vmware-archive / kubeless

Kubernetes Native Serverless Framework
https://kubeless.io
Apache License 2.0

Function configured with kafka trigger not receiving all messages #1082

Open gemanilkashyap opened 5 years ago

gemanilkashyap commented 5 years ago

BUG REPORT

What happened: A Python function configured with a Kafka trigger was not receiving all the messages (roughly 1 out of 5 received).

What you expected to happen: The function to receive all messages.

How to reproduce it (as minimally and precisely as possible): I can't suggest anything beyond leaving the functions running for a few days (the behaviour is random).

Anything else we need to know?: I deleted and redeployed the function, after which it started behaving normally.

Environment:

andresmgot commented 5 years ago

There are two possible points of failure:

Can you check whether you can see any errors in those logs?

gemanilkashyap commented 5 years ago

@andresmgot thanks for the reply and sorry for not responding sooner. I hit the issue again yesterday and found no errors in the Kafka controller logs. I suspect it could be because I also produce a Kafka message within the function call. Do you have any best practices for performing Kafka and database operations inside a function configured with a Kafka trigger?

andresmgot commented 5 years ago

There is no specific good practice I can point you to. I would recommend logging in your function whether it is able to reach Kafka and publish the message successfully. That way you can narrow down the possible causes of the issue. If you have a timestamp for the failed message, it will be easier to debug what happened.
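A minimal sketch of that kind of instrumentation, assuming the Kubeless Python handler signature `handler(event, context)` with the message body under `event["data"]` (the `publish_to_kafka` helper is hypothetical, standing in for whatever producer call the function makes):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)

def handler(event, context):
    """Log receipt and publish outcome with timestamps to correlate failures."""
    received_at = time.time()
    payload = event.get("data")  # Kubeless delivers the message body under "data"
    logging.info("received message at %.3f: %r", received_at, payload)
    try:
        # publish_to_kafka(payload)  # hypothetical helper wrapping your producer
        logging.info("published downstream message at %.3f", time.time())
    except Exception:
        logging.exception("failed to publish, received at %.3f", received_at)
        raise
    return "OK"
```

With timestamps on both the receive and publish sides, a gap for a given message narrows the failure to either delivery into the function or the outbound publish.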

gemanilkashyap commented 5 years ago

I was looking for any specific way of handling connections to Kafka or the database when working with Kubeless. Currently I am closing the Kafka connection on each function call.

andresmgot commented 5 years ago

Depending on the runtime you may be able to open the connection outside the function code. This improves performance, but it keeps the connection open for as long as the pod is running.
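A minimal sketch of that pattern, assuming the runtime imports the module once per pod and then calls `handler` per event. Here `sqlite3` stands in for a real Kafka or database client; with `kafka-python`, for example, you would similarly create the `KafkaProducer` at module level rather than inside the handler:

```python
import sqlite3

# Module-level setup runs once, when the runtime imports the module.
# A real function would create its Kafka producer / DB client here instead.
_conn = sqlite3.connect(":memory:", check_same_thread=False)
_conn.execute("CREATE TABLE IF NOT EXISTS events (body TEXT)")

def handler(event, context):
    # The connection is reused across invocations instead of being
    # opened and closed on every call.
    with _conn:
        _conn.execute("INSERT INTO events VALUES (?)", (event.get("data"),))
    return _conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The trade-off is exactly as described above: setup cost is paid once, but the connection stays open for the pod's lifetime, so a long-idle pod may need reconnect logic if the server drops it.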

gemanilkashyap commented 5 years ago

Sure, I will give it a go.

gemanilkashyap commented 5 years ago

@andresmgot I created the database connection and the Kafka producer outside the function to reuse the connections, but it is not giving the performance I was expecting, and it also stops after inserting a few records. My runtime is Python.

andresmgot commented 5 years ago

There are other bottlenecks regarding Kafka, see https://github.com/kubeless/kubeless/issues/826 for more details. It's unresolved at this moment.

gemanilkashyap commented 5 years ago

Hmm, will that reuse the kafka and database connections across multiple function invocations?

andresmgot commented 5 years ago

In the case of Python, if I recall correctly, a new thread is created to execute the function every time it is invoked, so I don't think it reuses the database connection. You can double-check by printing a message in the code that creates the connection and checking whether that message is printed on every invocation.

gemanilkashyap commented 5 years ago

Ah ok, that seems costly and a performance hit. Have you found a way to share connections within Python functions in the past?

andresmgot commented 5 years ago

Unfortunately no, I have no personal experience trying to do so. Happy to hear your findings if you work on this!