Closed: rileyhun closed this issue 2 years ago
Hi there!
We looked into the tracing and found one of the bottlenecks is the prometheus-fastapi-instrumentator.
This is interesting, and I would definitely recommend raising an issue in the GitHub repository for that library so the maintainer is aware of it. However, it's not clear whether the performance issue is with prometheus-fastapi-instrumentator or with the underlying prometheus_client library; the former is really just a convenient, lightweight wrapper around the latter.
So it seems that under high usage, the /metrics endpoint doesn't get enough resources, and hence all /metrics requests wait for some time and eventually fail. So having a separate container for metrics could help.
According to the docs, you should be able to instrument your main application and then expose a second application which returns the metrics. However, both of these server processes would be running in the same container.
Good luck digging into this!
Thanks @jeremyjordan. I'll raise an issue on the prometheus-fastapi-instrumentator GitHub repository, though it doesn't look like the maintainers are particularly active. Still, worth a try.
Would it help to increase the number of gunicorn workers and then use prometheus_client in multiprocess mode?
Yes, I'd recommend reading through https://fastapi.tiangolo.com/deployment/server-workers/ if you haven't already. The FastAPI docs are great 🎉
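For reference, a sketch of prometheus_client's multiprocess pattern (this mirrors the library's documented usage; `render_metrics` is a hypothetical helper name, and `PROMETHEUS_MULTIPROC_DIR` must point at a writable directory shared by all gunicorn workers before they start):

```python
from prometheus_client import CollectorRegistry, generate_latest, multiprocess


def render_metrics() -> bytes:
    # Each gunicorn worker writes its metric values to files under
    # PROMETHEUS_MULTIPROC_DIR; the collector aggregates them at scrape time.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)
```

The returned bytes are what a /metrics route handler would send back in the Prometheus exposition format.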
Hello again @jeremyjordan,
We are trying to decrease the latency of our BERT model prediction service that is deployed using FastAPI. The predictions are called through the /predict endpoint. We looked into the tracing and found one of the bottlenecks is the prometheus-fastapi-instrumentator. About 1% of the requests time out because they exceed 10s. We also discovered that some metrics are not getting reported at 4 requests/second. Some requests took 30-50 seconds, with starlette/fastapi taking a long time. So it seems that under high usage, the /metrics endpoint doesn't get enough resources, and hence all /metrics requests wait for some time and eventually fail. So having a separate container for metrics could help. Or, if possible, metrics could be delayed/paused under high load. Any insight/guidance would be much appreciated.