labdao / plex

Platform for running comp bio applications on distributed compute and storage infrastructure
https://lab.bio
MIT License

LAB-1482 queue, request tracking, concurrent job processing #979

Closed supraja-968 closed 2 months ago

supraja-968 commented 3 months ago

What type of PR is this?

Description

Request tracking is introduced as an append-only table, the request tracker. It records every attempt of a job, including all retries, whereas the job table tracks only the latest attempt.
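To illustrate the append-only idea, here is a minimal sketch in Python using an in-memory SQLite database. It is not the gateway's actual schema or code: the table name `request_tracker`, its columns, and the helper functions are all hypothetical. The point is only that each attempt inserts a new row and nothing is ever updated in place.

```python
import sqlite3
from datetime import datetime, timezone


def open_tracker() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical request_tracker table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE request_tracker (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            job_id      INTEGER NOT NULL,
            attempt     INTEGER NOT NULL,
            status_code INTEGER NOT NULL,
            created_at  TEXT NOT NULL
        )
        """
    )
    return conn


def record_attempt(conn: sqlite3.Connection, job_id: int, status_code: int) -> int:
    """Append one row per attempt; existing rows are never updated."""
    row = conn.execute(
        "SELECT COALESCE(MAX(attempt), 0) FROM request_tracker WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    attempt = row[0] + 1
    conn.execute(
        "INSERT INTO request_tracker (job_id, attempt, status_code, created_at) "
        "VALUES (?, ?, ?, ?)",
        (job_id, attempt, status_code, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return attempt


def history(conn: sqlite3.Connection, job_id: int) -> list:
    """Return every recorded attempt for a job, oldest first."""
    return conn.execute(
        "SELECT attempt, status_code FROM request_tracker "
        "WHERE job_id = ? ORDER BY attempt",
        (job_id,),
    ).fetchall()
```

With this layout, a job that fails twice with 500 and then succeeds leaves three rows behind, so the full retry history stays queryable.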

The retry logic is implemented as below:

- 200 - OK
- 404 - immediate retry, once
- 500 - exponential backoff, retry twice
- 504 - no retry
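The policy above can be sketched roughly as follows. This is an illustrative Python sketch, not the gateway's implementation. The names `MAX_RETRIES`, `retry_delay`, and `submit_with_retries`, the 1-second base delay, and the doubling factor are assumptions. It also assumes that status codes not listed above are not retried, and that the retry budget is looked up from the most recent status code.

```python
import time
from typing import Callable, List

# Retries allowed per status code, taken from the list in the description.
MAX_RETRIES = {200: 0, 404: 1, 500: 2, 504: 0}


def retry_delay(status_code: int, retry_number: int, base_delay: float = 1.0) -> float:
    """Seconds to wait before retry `retry_number` (1-based).

    Only 500 backs off exponentially (base, 2*base, ...); 404 retries immediately.
    The base delay and the factor of 2 are assumptions.
    """
    if status_code == 500:
        return base_delay * 2 ** (retry_number - 1)
    return 0.0


def submit_with_retries(
    send: Callable[[], int],
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Call `send` until it succeeds or the retry budget is used up.

    Returns the status code of every attempt, in order.
    """
    statuses: List[int] = []
    retries_used = 0
    while True:
        status = send()
        statuses.append(status)
        if status == 200:
            return statuses
        # Unknown status codes get no retries (assumption).
        if retries_used >= MAX_RETRIES.get(status, 0):
            return statuses
        retries_used += 1
        delay = retry_delay(status, retries_used, base_delay)
        if delay > 0:
            sleep(delay)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller can substitute a list's `append` to record the delays instead.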

A MAX_WORKER environment variable is introduced, set to 4 by default. It caps the number of jobs the gateway picks from the queue and submits to Ray at any given time, so by default at most 4 jobs are in flight.
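As a rough illustration of a worker cap driven by `MAX_WORKER`, the Python sketch below bounds concurrency with a thread pool. It is not the gateway's code. `max_workers_from_env`, `run_queue`, and `ConcurrencyProbe` are hypothetical names, and the probe only stands in for a real submission to Ray so that the peak concurrency can be observed.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def max_workers_from_env(default: int = 4) -> int:
    """Read MAX_WORKER from the environment, falling back to 4."""
    return int(os.environ.get("MAX_WORKER", default))


def run_queue(jobs: Iterable, submit: Callable, max_workers: int) -> List:
    """Submit every queued job, with at most `max_workers` in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input queue in its results.
        return list(pool.map(submit, jobs))


class ConcurrencyProbe:
    """Stand-in for a job submission that records peak concurrency."""

    def __init__(self, duration: float = 0.05) -> None:
        self.duration = duration
        self.lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, job):
        with self.lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(self.duration)  # pretend the job is running
        with self.lock:
            self.active -= 1
        return job
```

Running ten jobs through `run_queue` with `max_workers_from_env()` and a `ConcurrencyProbe` should leave `probe.peak` at or below the configured cap.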

In addition to these changes, I also changed the timezone everywhere explicitly to UTC.
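For illustration, explicit UTC handling might look like the Python sketch below. The helper names `utc_now` and `to_utc` are hypothetical, and the gateway's own timestamp code is not shown here. The sketch assumes naive timestamps should be rejected, since their timezone is ambiguous.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def to_utc(ts: datetime) -> datetime:
    """Convert an aware datetime to UTC; reject naive ones as ambiguous."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: timezone is ambiguous")
    return ts.astimezone(timezone.utc)
```

Storing aware UTC values everywhere means timestamps from different hosts compare correctly without knowing each host's local zone.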

Related Tickets & Documents

https://github.com/convexitylabs/convexity/pull/129

Steps to Test

Run a sample RF Diffusion experiment on the blade (2 GPUs) or on your AWS GPU instance (1 GPU). For testing, I introduced a labsay-equivalent endpoint within the RF Diffusion service: it takes the same input as colabdesign/rf diffusion, sleeps for 5 seconds, and returns 200. With the deployment config set to 0.5 GPUs per job, you should see 4 jobs (on the blade) or 2 jobs (on the single-GPU instance) run and complete at the same time, after a short buffer at the beginning while jobs are picked up to run concurrently. Here's an example of the time taken with and without concurrency: [image] [image]

localhost:8080/queue-summary

Note: on the plex side, MAX_WORKER is set to 4 by default. If you are running your jobs on a single GPU, you might have to edit it while testing to see accurate results.
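The expected concurrency in the test above can be estimated with a small calculation. The Python sketch below is a back-of-the-envelope estimate, not part of plex. `expected_concurrency` is a hypothetical helper, and it assumes that the only limits are the fractional GPU request per job and the MAX_WORKER cap.

```python
from fractions import Fraction


def expected_concurrency(total_gpus, gpus_per_job, max_worker: int = 4) -> int:
    """Estimate how many jobs can run at once.

    The GPU side allows floor(total_gpus / gpus_per_job) jobs; the gateway
    side allows at most `max_worker`. The smaller of the two wins.
    """
    if gpus_per_job <= 0:
        raise ValueError("gpus_per_job must be positive")
    # Fractions avoid float rounding surprises for values such as 0.1.
    gpu_slots = Fraction(str(total_gpus)) // Fraction(str(gpus_per_job))
    return min(max_worker, int(gpu_slots))


blade = expected_concurrency(total_gpus=2, gpus_per_job=0.5)       # 2 GPUs
single_gpu = expected_concurrency(total_gpus=1, gpus_per_job=0.5)  # 1 GPU
```

Under these assumptions the blade gives 4 concurrent jobs and a single-GPU instance gives 2, which matches the numbers in the test steps. On a single GPU the GPU side is the bottleneck, so the default MAX_WORKER of 4 would let the gateway pick up more jobs than can run at once.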

linear[bot] commented 3 months ago

LAB-1482 request tracker

vercel[bot] commented 3 months ago

The latest updates on your projects.

1 Ignored Deployment

| Name | Status | Preview | Comments | Updated (UTC) |
| :--- | :----- | :------ | :------- | :------ |
| **docs** | ⬜️ Ignored ([Inspect](https://vercel.com/convexitylabs/docs/mToBsZTPZGnipKKhLY6zCWr5t2Bd)) | [Visit Preview](https://docs-git-lab-1482-request-tracker-convexitylabs.vercel.app) | | Jun 20, 2024 6:30am |