pplonski / my_ml_service

My Machine Learning Web Service
MIT License

Deploying for long running endpoint #31

Closed dev-svk-flbs closed 1 year ago

dev-svk-flbs commented 1 year ago

Hi,

Thank you so much for making this repo available. I had a question on how to deploy this kind of API framework for ML inferences that might take very long (say 1 minute).

Typically, when the predict endpoint is hit by a request and the inference model is light, it runs fast. In my case, however, there is no way to reduce the inference time below 1 minute. While the request is being processed, the whole endpoint is locked and won't accept any further inference requests. This is a very common situation in ML inferencing, so I am wondering whether there is an easy way to handle it.

I did some reading on DRF async, but I'm not sure if that's the right approach.

Can you please provide some directions?

pplonski commented 1 year ago

Hi,

I would suggest using Celery to compute the long-running task in the background and using long polling in the frontend. The workflow (see the sketch after this list):

  1. Client sends a request to the server.
  2. Server starts the task in the background.
  3. Server returns the task ID as the response.
  4. Client sends the task ID to get the task status.
  5. If the task is still running, the server returns a "running" status.
  6. If the task is finished, the server returns the result.
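A minimal sketch of how this could look with Celery and Django REST Framework. All names here (predict_task, start_prediction, prediction_status, the URLs) are illustrative assumptions, not part of this repo; the inference call is a placeholder for the project's own predict logic:

```python
# tasks.py -- hypothetical Celery task wrapping the slow (~1 minute) inference
from celery import shared_task

@shared_task
def predict_task(input_data):
    # placeholder: replace with the project's actual model prediction call
    prediction = {"label": "example", "probability": 0.9}
    return prediction


# views.py -- DRF views implementing the start-task / poll-status workflow
from celery.result import AsyncResult
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["POST"])
def start_prediction(request):
    # steps 2-3: launch the background task and return its ID immediately
    task = predict_task.delay(request.data)
    return Response({"task_id": task.id}, status=202)

@api_view(["GET"])
def prediction_status(request, task_id):
    # steps 4-6: client polls with the task ID until the result is ready
    result = AsyncResult(task_id)
    if result.ready():
        return Response({"status": "finished", "result": result.get()})
    return Response({"status": "running"})
```

On the client side, long polling is just repeating the status request until the task finishes, for example:

```python
# client-side long polling sketch (endpoint URLs are assumptions)
import time
import requests

resp = requests.post("http://localhost:8000/api/v1/predict_async", json={"age": 37})
task_id = resp.json()["task_id"]

while True:
    status = requests.get(f"http://localhost:8000/api/v1/predict_async/{task_id}").json()
    if status["status"] == "finished":
        print(status["result"])
        break
    time.sleep(2)  # wait a bit before polling again
```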

I have a feedback form running with https://deploymachinelearning.com, and long-running tasks with Celery have been the most frequently requested topic. I will write a course about this topic, but it will be paid. I'm planning to write it this year.