Closed aintItPythonic closed 6 months ago
I use health checks.
```yaml
livenessProbe:
  exec:
    command:
      - arq
      - WorkerSettings
      - --check
```
Looks like @JonasKs has the right solution.
I should also add that you want a unique health check key for each pod. Add this to your worker settings (with `import socket` at the top): `health_check_key = socket.gethostname()`
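A minimal sketch of what that looks like in the settings class — the arq-specific attributes (`redis_settings`, `cron_jobs`, etc.) are left as a comment so the snippet stands alone:

```python
import socket


class WorkerSettings:
    # Each Kubernetes pod gets a unique hostname, so using it as the
    # health check key means every worker writes its own health record
    # in Redis instead of all pods sharing one key.
    health_check_key = socket.gethostname()
    # ...plus your usual redis_settings, cron_jobs, functions, etc.
```

With a shared key, one healthy pod can keep the record fresh and mask a stuck worker in another pod; per-pod keys let the `--check` liveness probe fail only on the pod that is actually stuck.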
We are running multiple pods in Kubernetes with FastAPI, and occasionally we get an interruption to the Redis connection.
After Redis comes back, the worker no longer processes jobs/tasks. We are only using arq for cron jobs.
Here is the traceback from the connection interruption; I don't quite know why the connection drops.
Rather than restarting the pod when the connection is interrupted, it would be better for the worker to somehow reset itself once the connection comes back online.
We are on arq 0.25.0, running under Kubernetes with FastAPI, but I don't think it has anything to do with FastAPI.
I also don't believe it is related to Kubernetes, as I have been able to replicate it locally.
The worker process seems to still be alive, but no tasks/jobs are going through it.