Workers go into restarting/crash cycle (WORKER TIMEOUT / signal 6)

lsmith77 commented 2 years ago

I am struggling to know which layer is the root cause here.

My app runs fine, but then suddenly it is unable to serve requests for a while and then "fixes itself". While it's unable to serve requests my logs show:

[2022-01-18 08:36:46 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1548)
[2022-01-18 08:36:46 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1575)
[2022-01-18 08:36:46 +0000] [1505] [WARNING] Worker with pid 1548 was terminated due to signal 6
[2022-01-18 08:36:46 +0000] [1505] [WARNING] Worker with pid 1575 was terminated due to signal 6
[2022-01-18 08:36:46 +0000] [1783] [INFO] Booting worker with pid: 1783
[2022-01-18 08:36:46 +0000] [1782] [INFO] Booting worker with pid: 1782
[2022-01-18 08:36:47 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1577)
[2022-01-18 08:36:47 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1578)
[2022-01-18 08:36:47 +0000] [1505] [WARNING] Worker with pid 1578 was terminated due to signal 6
[2022-01-18 08:36:47 +0000] [1784] [INFO] Booting worker with pid: 1784
[2022-01-18 08:36:47 +0000] [1505] [WARNING] Worker with pid 1577 was terminated due to signal 6
[2022-01-18 08:36:47 +0000] [1785] [INFO] Booting worker with pid: 1785
[2022-01-18 08:36:51 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1545)
[2022-01-18 08:36:51 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1551)
[2022-01-18 08:36:51 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1559)
[2022-01-18 08:36:52 +0000] [1505] [WARNING] Worker with pid 1551 was terminated due to signal 6

Initially, I thought it was related to load and resource limits, but it seems to also happen during "typical load" and when resources are nowhere near their limits.

lsmith77 commented 2 years ago

BTW I saw this ticket here https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker/issues/47 but I think its not the same issue.

udit-pandey commented 2 years ago

Facing the same issue. Whenever following code is executed with incorrect smtp_url, port, my worker crashes:

def validate_smtp(smtp_url: str):
     try:
         smtp = SMTP()
         smtp.connect(smtp_url)
         smtp.quit()
         return True
     except:
         return False

There is no crash if smtp_url or port is valid. Dependencies:

gunicorn: 20.1.0
uvicorn: 0.17.1
fastapi: 0.73.0

AnjaneyuluBatta505 commented 2 years ago

I'm also facing the same issue. any workarounds?

udit-pandey commented 2 years ago

i resolved this issue by adding worker timeout while initiating my gunicorn application.

gunicorn -k uvicorn.workers.UvicornWorker ${APP_MODULE} --bind 0.0.0.0:80 --timeout ${WORKER_TIMEOUT}

mudassirzr commented 2 years ago

Facing the same issue when running long processes on websockets and it ends up terminating the websocket connection. Any fixes?

yuanwu2017 commented 2 years ago

Facing the same issue when I use the haystack. I modifed the docker-compose.yml as following: command: "/bin/bash -c 'sleep 10 && gunicorn rest_api.application:app -b 0.0.0.0 -k uvicorn.workers.UvicornWorker -- workers 1 --timeout 600'" It can work.

ankitksharma commented 2 years ago

Facing this issue while using docker. Working perfectly fine if run directly with gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8080 main:app.

None of the following suggested solutions worked:

Assigning more memory
Changing worker class to gevent
Changing python version to 3.7 from 3.9
Adding timeout
Running directly with uvicorn without gunicorn

Can someone please point me in the right direction to resolve this issue?

komljenovicnikola commented 2 years ago

Facing this same issue (on both CentOS and Ubuntu VM's), it happens during typical load and all resources are not near limits.