FastAPI deployed with hypercorn in GCP Cloud Run returning 503 sporadically

pgjones / hypercorn

Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn.

FastAPI deployed with hypercorn in GCP Cloud Run returning 503 sporadically #221

Open bgregoinductiva opened 2 months ago

bgregoinductiva commented 2 months ago

I have a FastAPI project deployed in Cloud Run using the hypercorn server. I'm using uvloop as the event loop and leaving all other settings at their default values:

hypercorn app.main:app --bind 0.0.0.0:80 --worker-class uvloop

Here are the Cloud Run configurations:

Memory: 1 GiB
CPU: 1
Maximum concurrent requests per instance: 80
CPU is only allocated during request processing
Minimum number of instances: 1
Maximum number of instances: 30
Startup CPU boost
Use HTTP/2 end-to-end

When concurrent requests peak at about 30 during integration testing, I usually get a 503, and then a new instance is started.
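For anyone trying to reproduce this, here is a minimal sketch of a burst test that sends about 30 requests at once and tallies the status codes. It uses only the standard library; the function names `fetch_status` and `burst` are made up for illustration, and `urllib` speaks HTTP/1.1 on the client side, so this only approximates the integration-test traffic described above.

```python
import collections
import concurrent.futures
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for one GET request (error codes included)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we want to count.
        return error.code


def burst(url: str, concurrency: int = 30) -> collections.Counter:
    """Send `concurrency` GET requests in parallel and tally the status codes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = pool.map(fetch_status, [url] * concurrency)
        return collections.Counter(statuses)
```

Calling `burst("https://<your-service-url>/", 30)` (the URL is a placeholder) returns a `Counter` mapping each status code to how many times it was seen, so any 503s in the burst show up as a `503` key.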

Has anyone faced a similar problem before?

Thanks in advance.

nabheet commented 2 months ago

Yes. Based on what I have learned so far, your instance was terminated because it used more memory than its defined limit.

Even though the Cloud Run documentation says that it will return a 500, in my testing I was able to show that it actually returns a 503. Their documentation leaves a lot to be desired.

Hope this helps.
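One rough way to check the memory theory from inside the container is to log the process's peak resident set size against the configured limit. This is only a sketch: the helper names are made up, the 1024 MiB default is taken from the 1 GiB setting in the original post, and it measures only the calling process, not everything else that counts toward the instance limit.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def memory_headroom(limit_mib: float = 1024.0) -> float:
    """Fraction of the limit still unused at this process's peak usage."""
    # Assumption: 1024 MiB mirrors the 1 GiB Cloud Run setting quoted above.
    return 1.0 - peak_rss_mib() / limit_mib
```

Logging `memory_headroom()` periodically, or just before the integration tests finish, would show whether the process ever gets close to the limit.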

nielsbox commented 2 months ago

We have the same issue, but at only 40% memory usage at the 99th percentile.

nielsbox commented 2 months ago

Update: we isolated the issue to only HTTP/2. HTTP/1 seems to be fine.

pgjones commented 1 month ago

> Update: we isolated the issue to only HTTP/2. HTTP/1 seems to be fine.

Is the HTTP/1 traffic encrypted? There seems to be an asyncio memory leak with SSL

nielsbox commented 1 month ago

> > Update: we isolated the issue to only HTTP/2. HTTP/1 seems to be fine.
>
> Is the HTTP/1 traffic encrypted? There seems to be an asyncio memory leak with SSL

Cloud Run terminates TLS: https://cloud.google.com/run/docs/container-contract#tls

nabheet commented 1 month ago

Also, I hate to admit this in public, but I wasn't closing SQL connections in the health check endpoint, so it was leaking file descriptors. This caused our Cloud Run containers to crash without any log events, and the Cloud Run LB returned a 503.

So another thing to check would be your file descriptor count.
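As a rough illustration of both points, the sketch below counts the process's open file descriptors and shows a health check that always closes its connection. The function names are hypothetical, `sqlite3` only stands in for whatever SQL driver the service actually uses, and the `/proc/self/fd` approach assumes Linux (which is what Cloud Run containers run on).

```python
import contextlib
import os
import sqlite3


def open_fd_count() -> int:
    """Number of file descriptors currently open in this process (Linux only)."""
    # Listing the directory briefly opens one descriptor itself, so the
    # result is a constant one higher than the "real" count.
    return len(os.listdir("/proc/self/fd"))


def health_check(database_path: str) -> bool:
    """Run a trivial query and always close the connection afterwards."""
    # Assumption: sqlite3 is a stand-in for the real database driver.
    with contextlib.closing(sqlite3.connect(database_path)) as connection:
        return connection.execute("SELECT 1").fetchone() == (1,)
```

If `open_fd_count()` keeps climbing as the health check is called repeatedly, something in that path is not being closed.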