jeremyjordan / ml-monitoring

A demo of Prometheus+Grafana for monitoring an ML model served with FastAPI.
https://www.jeremyjordan.me/ml-monitoring/
MIT License

Adding prometheus instrumentation package is resulting in some requests taking a long amount of time #9

Closed (rileyhun closed this issue 2 years ago)

rileyhun commented 2 years ago

Hello again @jeremyjordan,

We are trying to decrease the latency of our BERT model prediction service, which is deployed using FastAPI. Predictions are served through the /predict endpoint. We looked into the tracing and found that one of the bottlenecks is prometheus-fastapi-instrumentator. About 1% of requests time out because they exceed 10s.

We also discovered that some metrics are not reported at 4 requests/second. Some requests took 30-50 seconds, with the Starlette/FastAPI layers taking a long time. It seems that under high load the /metrics endpoint doesn't get enough resources, so all /metrics requests wait for some time and eventually fail. Having a separate container for metrics could help, or, if possible, delaying or pausing metrics under high load. Any insight or guidance would be much appreciated.

[Three screenshots attached, dated 2021-12-03]

jeremyjordan commented 2 years ago

Hi there!

> We looked into the tracing and found one of the bottlenecks is the prometheus-fastapi-instrumentator.

This is interesting, and I would definitely recommend raising an issue in the GitHub repository for that library so the maintainer is aware of the problem. However, it's not clear whether the performance issue lies in prometheus-fastapi-instrumentator or in the underlying prometheus_client library. The former is really just a convenient, lightweight wrapper around the latter.

> So it seems that under high usage, the /metrics endpoint doesn't get enough resources, and hence all /metrics requests wait for some time and fail eventually. So having separate container for metrics could help.

According to the docs, you should be able to instrument your main application and then expose a second application which returns the metrics. However, both of these server processes would be running in the same container.
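
For illustration, here is a minimal sketch of the "second application" idea. It uses prometheus_client directly rather than the instrumentator's own API, and the metric name, the port 9100, and the `handle_predict` placeholder are all assumptions made up for this example. Note that `start_http_server` runs in a background thread of the same process, so this separates the port but does not by itself give the metrics endpoint its own CPU resources.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    generate_latest,
    start_http_server,
)

# A dedicated registry keeps this sketch isolated from the global default one.
registry = CollectorRegistry()

# Hypothetical metric; the client exposes it as "predictions_total".
PREDICTIONS = Counter(
    "predictions", "Number of /predict calls handled", registry=registry
)


def handle_predict(text: str) -> str:
    """Stand-in for the real /predict handler (the BERT call is omitted)."""
    PREDICTIONS.inc()
    return text.upper()


def serve_metrics(port: int = 9100) -> None:
    """Expose the registry on its own port, apart from the main API port."""
    start_http_server(port, registry=registry)
```

Calling `serve_metrics()` once at startup would let Prometheus scrape port 9100 while the main application keeps serving /predict on its usual port.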

Good luck digging into this!

rileyhun commented 2 years ago

Thanks @jeremyjordan. I'll raise an issue on the prometheus-fastapi-instrumentator GitHub repository. It doesn't look like the maintainers are particularly active, but it's worth a try.

Would it help to increase the number of gunicorn workers and then use prometheus_client multiprocess mode?

jeremyjordan commented 2 years ago

Yes, I'd recommend reading through https://fastapi.tiangolo.com/deployment/server-workers/ if you haven't already. The FastAPI docs are great 🎉
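
For reference, a minimal sketch of what prometheus_client's multiprocess mode looks like. The temporary directory and the metric name are assumptions for illustration; in a real gunicorn deployment `PROMETHEUS_MULTIPROC_DIR` would normally be set in the environment before the server starts, pointing at a directory that is emptied between runs.

```python
import os
import tempfile

# The variable must be set before prometheus_client is imported, because the
# library decides at import time whether to store values in shared files.
# A temp dir is used here only so the sketch runs standalone (assumption).
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", tempfile.mkdtemp())

from prometheus_client import (  # noqa: E402
    CollectorRegistry,
    Counter,
    generate_latest,
    multiprocess,
)

# Hypothetical metric; each worker process writes its own file in the dir.
REQUESTS = Counter("predict_requests", "Requests handled by /predict")


def collect_all_workers() -> bytes:
    """Aggregate the per-process files into one scrape payload."""
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)
```

A /metrics handler would return `collect_all_workers()`, so a scrape that lands on any one worker still reports totals across all of them. The prometheus_client docs also suggest calling `multiprocess.mark_process_dead(worker.pid)` from gunicorn's `child_exit` hook so that exited workers are cleaned up.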