vishal-kvn opened this issue 4 years ago
@vishal-kvn I use a k8s deployment to run the Faust workers. I have configured the Faust app to auto-discover the agents, and the workers run indefinitely. This setup works fine for me.
@afausti Thanks for the reply. I will try it out.
@afausti Setting autodiscover=True did not fix the above issue. Also, I noticed that you set replicaCount to 1 (https://github.com/lsst-sqre/charts/blob/master/charts/kafka-aggregator/values.yaml#L3) for your worker. Have you deployed with a replicaCount greater than 1? For my use case I have a replicaCount of 3, but I noticed that only 1 worker (pod) is consuming messages.
Please let me know if you came across this behavior.
A couple of questions:
How many partitions do you have on your topic? You need at least one partition per worker.
Have you run "kubectl describe" on the pod after it is killed to get the status/event information? That should tell you why K8S is killing the pod
Do you have a readinessProbe and/or livenessProbe configured?
Are you allocating enough memory for the pods? OOMKilled is a very common reason for pods to get killed
Kubernetes will tell you what it doesn't like; you just need to look hard for it.
Hope this helps
@bobh66 Thanks for the reply.
> How many partitions do you have on your topic? You need at least one partition per worker.

I have one topic that has 6 partitions.

> Have you run "kubectl describe" on the pod after it is killed to get the status/event information? That should tell you why K8S is killing the pod.

I will be looking into this and will share more info.

> Do you have a readinessProbe and/or livenessProbe configured?

Yes. The pods pass the livenessProbe check.

> Are you allocating enough memory for the pods? OOMKilled is a very common reason for pods to get killed.

I haven't seen an OOMKilled error in the logs, and I have provisioned sufficient memory for the deployment.

> Kubernetes will tell you what it doesn't like, you just need to look hard for it.

Ack! I will take a closer look at the logs to find the root cause.
@afausti I see you're using the memory storage for Tables. Do you think you'd need to use a StatefulSet instead of a Deployment if you switched to rocksdb?
@taybin have you tried implementing a StatefulSet for Faust when using Rocksdb?
@vishal-kvn My Faust app is also getting a SIGTERM (15), though I'm running via docker-compose, not k8s. I'm wondering if this ever went anywhere for you?
Checklist
- The issue exists against the master branch of Faust.

Steps to reproduce
I am trying to deploy a Faust agent to a production environment using 2 pods. The agent consumes from a topic that has 6 partitions. After the deploy, the agent runs until it receives a SIGTERM (15), at which point it shuts down and stops consuming messages.
I am wondering if there are any best practices around deploys using kubernetes.
Expected behavior
The agent gracefully handles the SIGTERM.
Actual behavior
The app shuts down and stops consuming messages.
Versions