in03 / proxima

Transcode source media directly from DaVinci Resolve using multiple machines for encoding. Great for creating proxies quickly.
MIT License

feat: Redis PubSub for Queuer side progress #225

Closed: github-actions[bot] closed this issue 1 year ago

github-actions[bot] commented 2 years ago

Alright. Here's the dealio.

Proxima uses Celery with Redis to queue video transcodes as tasks across multiple workers. I want to use Redis pubsub to keep track of these long-running tasks and Celery worker metrics. We're well within the use case, but the desire to get an MVP for queuer-side progress took precedence at the time. Polling the database manually is undesirable because:

Ideally I'd like:

How can we do this with pubsub?

We gave up on pubsub initially because of issues keeping the subscriber in sync with the publisher. Redis implicitly queues messages at the TCP buffer level when the subscriber is too slow to receive them. The fix is to properly implement asyncio handling to keep the subscriber thread unblocked. Acknowledge messages when they come through and dump them in a variable for the main loop to handle. For our use case, it's okay for us to drop messages. Guaranteed delivery is not important. Staying in sync is far more important for the sake of our progress accuracy. Better to drop frames than slow down the video!
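
Something like this is the shape I'm picturing: a minimal sketch using redis-py's asyncio API, with the channel name and payload layout as placeholders rather than Proxima's actual schema.

```python
import asyncio
import json

import redis.asyncio as aioredis  # redis-py >= 4.2 ships an asyncio client

# The subscriber coroutine does no work per message: it just overwrites a shared
# snapshot with the latest payload, so the TCP buffer never backs up and we never
# fall behind the publisher. The main loop reads whatever the newest snapshot is;
# anything in between is dropped, which is fine for progress display.

latest_progress: dict = {}

async def subscribe_progress(channel: str = "task-progress") -> None:
    r = aioredis.Redis()
    pubsub = r.pubsub()
    await pubsub.subscribe(channel)
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        latest_progress.update(json.loads(message["data"]))  # assumes a JSON dict payload

async def main() -> None:
    listener = asyncio.create_task(subscribe_progress())  # keep a reference so it isn't GC'd
    while True:
        print(latest_progress)  # update progress bars from the snapshot
        await asyncio.sleep(0.5)

if __name__ == "__main__":
    asyncio.run(main())
```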

Most Recent Worker Event

The most recent worker event is easy: we just need to check that the pubsub payload includes our group ID in its metadata.

Average Progress

The current implementation does a lot of gymnastics with a lot of spaghetti: task matching and additional polling are needed to get the average progress bar working. Workers are publishing individually, and we need to aggregate and calculate the average queuer-side. That's what's dragging down the loop's performance.
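
The aggregation itself is simple in shape; the pain is all the matching and polling around it. Roughly (names are illustrative, not the actual implementation):

```python
# Queuer-side aggregation in its simplest form: remember the latest percent per
# task in our group and average across the known task count.
task_progress: dict[str, float] = {}

def on_task_progress(task_id: str, percent: float) -> None:
    task_progress[task_id] = percent

def average_progress(total_tasks: int) -> float:
    # Tasks we haven't heard from yet count as 0%, so the bar doesn't jump around.
    return sum(task_progress.values()) / max(total_tasks, 1)
```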

Discrete Progress

The discrete one is easy, since we're just increasing the ratio of complete tasks out of known queued tasks whenever we get a success message from a task that belongs to us. That handler works a treat.
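
In handler form, it's basically this (illustrative names again):

```python
# Discrete progress: count success messages from tasks in our group against the
# known number of queued tasks. No per-task percentages needed.
completed_tasks = 0
queued_tasks = 0  # set when the group is dispatched

def on_task_success(payload: dict, our_group_id: str) -> None:
    global completed_tasks
    if payload.get("group_id") == our_group_id:
        completed_tasks += 1

def discrete_progress() -> float:
    return completed_tasks / max(queued_tasks, 1)
```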

Dramas

What probably doesn't help is the cross-reference gymnastics we're using to check whether a task progress message belongs to the queuer's task group. If we can get the Celery group ID on the worker side once we receive a new task, we can pass that as an argument into ffmpeg and set the channel pattern dynamically:

```yaml
channel: task-progress-gh84d

data:
  task-id: "jk39d"
  percent: 20
```

That's great! That means we only need one pubsub handler for task-progress.
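
With that in place, the queuer-side listener boils down to something like this (a redis-py sketch; the channel naming follows the example above, the rest is assumed):

```python
import json

import redis

def listen_for_progress(group_id: str) -> None:
    # Subscribe only to our own group's channel, so every message that arrives
    # already belongs to us and no cross-referencing is needed.
    r = redis.Redis()
    pubsub = r.pubsub()
    pubsub.subscribe(f"task-progress-{group_id}")

    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        data = json.loads(message["data"])
        handle_task_progress(data["task-id"], data["percent"])

def handle_task_progress(task_id: str, percent: int) -> None:
    print(f"{task_id}: {percent}%")  # e.g. update the per-task progress bar
```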

Do the percentage calculations worker-side too! Away with this advance / seconds-processed vs. total-duration juggling... This makes our calculations way easier. Everything is out of 100%.

Closes #224
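
Worker-side, that calculation would look roughly like this (the helper name and payload keys mirror the example above; otherwise it's an assumption, not Proxima's actual code):

```python
import json

import redis

def publish_percent(r: redis.Redis, group_id: str, task_id: str,
                    seconds_processed: float, total_duration: float) -> None:
    # The worker already knows the clip's total duration, so it converts ffmpeg's
    # "seconds processed" into a 0-100 integer before publishing. The queuer never
    # has to touch duration math again.
    percent = min(round(seconds_processed / total_duration * 100), 100)
    r.publish(
        f"task-progress-{group_id}",
        json.dumps({"task-id": task_id, "percent": percent}),
    )
```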

in03 commented 1 year ago

Ironically, revisiting PubSub made me realise the mechanism for deriving encoding progress and task status queuer-side has actually been available in Celery's AsyncResult this whole time... Literally reinvented the wheel.
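
For anyone following along, the pattern is roughly this: the worker reports progress with update_state, and the queuer reads it straight back off the result backend via AsyncResult. A sketch, with the meta keys and the averaging as illustrations rather than exactly what landed in Proxima:

```python
from celery.result import AsyncResult  # assumes a configured Celery app

# Worker side, inside a bound task (bind=True), per progress tick:
#   self.update_state(state="PROGRESS", meta={"percent": percent})

def group_progress(task_ids: list[str]) -> float:
    """Average percent across a group of transcode tasks, queuer-side."""
    percents = []
    for task_id in task_ids:
        result = AsyncResult(task_id)
        if result.successful():
            percents.append(100)
        elif result.state == "PROGRESS" and isinstance(result.info, dict):
            percents.append(result.info.get("percent", 0))
        else:
            percents.append(0)  # pending, retrying, or failed
    return sum(percents) / max(len(percents), 1)
```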

Anyway, it works! No more Redis-py, no more PubSub, no more polling the database, storing SHAs in dictionaries or researching bloom filters. It's siqque.

I've also changed the worker names to use unique IDs instead of sequential numbering, so additional workers can be started at any time. Hopefully this paves the way for automatic scaling support.