pplonski / my_ml_service

My Machine Learning Web Service
MIT License

Deploying for long running endpoint #31

Closed dev-svk-flbs closed 1 year ago

dev-svk-flbs commented 1 year ago

Hi,

Thank you so much for making this repo available. I had a question on how to deploy this kind of API framework for ML inferences that might take very long (say 1 minute).

Typically, when the predict endpoint is hit by a request and the inference model is light, it runs fast. In my case, however, there is no way to reduce the inference time below 1 minute. While the request is being processed, the whole endpoint is locked and won't accept any further inference requests. This is a very common situation in ML inferencing, so I am wondering whether there is an easy way to handle it.

I did some reading on DRF async, but I'm not sure if that's the right approach.

Can you please provide some directions?

pplonski commented 1 year ago

Hi,

I would suggest using Celery to compute the long-running task in the background and using long polling in the frontend. The workflow (see the sketch after this list):

  1. Client sends a request to the server.
  2. Server starts the task in the background.
  3. Server returns the task ID as the response.
  4. Client sends the task ID to get the task status.
  5. If the task is still running, the server returns a "running" status.
  6. If the task is finished, the server returns the result.
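A minimal sketch of how this could look with Celery and Django REST Framework. All names here (predict_task, start_prediction, prediction_status, the URLs) are illustrative assumptions, not part of this repo; the inference call is a placeholder for the project's own predict logic:

```python
# tasks.py -- hypothetical Celery task wrapping the slow (~1 minute) inference
from celery import shared_task

@shared_task
def predict_task(input_data):
    # placeholder: replace with the project's actual model prediction call
    prediction = {"label": "example", "probability": 0.9}
    return prediction


# views.py -- DRF views implementing the start-task / poll-status workflow
from celery.result import AsyncResult
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["POST"])
def start_prediction(request):
    # steps 2-3: launch the background task and return its ID immediately
    task = predict_task.delay(request.data)
    return Response({"task_id": task.id}, status=202)

@api_view(["GET"])
def prediction_status(request, task_id):
    # steps 4-6: client polls with the task ID until the result is ready
    result = AsyncResult(task_id)
    if result.ready():
        return Response({"status": "finished", "result": result.get()})
    return Response({"status": "running"})
```

On the client side, long polling is just repeating the status request until the task finishes, for example:

```python
# client-side long polling sketch (endpoint URLs are assumptions)
import time
import requests

resp = requests.post("http://localhost:8000/api/v1/predict_async", json={"age": 37})
task_id = resp.json()["task_id"]

while True:
    status = requests.get(f"http://localhost:8000/api/v1/predict_async/{task_id}").json()
    if status["status"] == "finished":
        print(status["result"])
        break
    time.sleep(2)  # wait a bit before polling again
```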

I have a feedback form running with https://deploymachinelearning.com, and long-running tasks with Celery have been the most frequently requested topic. I will write a course about this topic, but it will be paid. I'm planning to write it this year.