Closed: shubh0508 closed this issue 2 years ago
Closing this request. I found a way to resolve it with the preload feature in Gunicorn.
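For anyone landing here later, a minimal sketch of what that can look like, assuming a FastAPI app exposed as `app.main:app` (the module path is hypothetical). Gunicorn's config file is plain Python, so the preload setting can live there:

```python
# gunicorn.conf.py -- run with: gunicorn -c gunicorn.conf.py app.main:app
# preload_app loads the application (and any model compiled at import time)
# in the master process before forking, so the workers share those read-only
# pages via copy-on-write instead of each holding its own copy.
workers = 3
worker_class = "uvicorn.workers.UvicornWorker"  # typical worker class for FastAPI
preload_app = True
```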
Just a brief note here: the compile() function has a cache= parameter that takes a filepath for caching a previously compiled model. More info in the docs.
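A minimal sketch of how that can be used, assuming the usual lleaves.Model entry point (the model and cache paths are placeholders): the first run compiles the model and writes the cache file, and later runs load the compiled binary from that file instead of recompiling.

```python
import lleaves

# Paths are placeholders for illustration.
llvm_model = lleaves.Model(model_file="model.txt")

# First run: compiles the model and stores the result at the given path.
# Later runs (e.g. after an application restart): loads the previously
# compiled model from that file instead of recompiling.
llvm_model.compile(cache="lleaves_model.bin")
```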
I am using FastAPI with Gunicorn for my Python application. Inference with the standard Python lightgbm package takes around 1.5-2 GB for 3 Gunicorn processes, but lleaves is taking around 9 GB for a single Gunicorn worker. If I want to use 3 Gunicorn workers, will I have to increase the instance size to 32 GB?
Can I reduce the memory use somehow, or can I share a compiled model among multiple Gunicorn workers or Python processes?
Also, from what I understand, it seems that we are required to compile the model every time the application restarts?
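For reference, one way to combine the two points above, sketched under the assumption of a FastAPI app living in app/main.py (file names and paths are hypothetical): compile the model once at module import time with cache=, so a restart reuses the cached binary, and start Gunicorn with --preload so the workers are forked after the model is in memory and share it copy-on-write.

```python
# app/main.py (hypothetical layout)
import lleaves
import numpy as np
from fastapi import FastAPI

app = FastAPI()

# Module-level load: with gunicorn --preload this runs once in the master
# process, and the forked workers share the compiled model's memory pages.
# cache= means a restart reloads the compiled binary instead of recompiling.
MODEL = lleaves.Model(model_file="model.txt")
MODEL.compile(cache="lleaves_model.bin")

@app.post("/predict")
def predict(features: list[float]):
    # lleaves expects a 2D array of shape (n_samples, n_features).
    data = np.asarray([features], dtype=np.float64)
    return {"prediction": MODEL.predict(data).tolist()}
```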