Proxima uses Celery with Redis to queue video transcodes as tasks across multiple workers.
I want to use Redis pubsub to keep track of these long-running tasks and Celery worker metrics. We're well within the use case, but the desire to get an MVP for queuer-side progress took precedence at the time. Polling the database manually is undesirable because:
- it reinvents the wheel, and I'm not as good a wheel designer as the Redis team
- added complexity reduces readability and maintainability
- polling opens a new TCP connection for every poll, whereas pubsub keeps a single socket open
Ideally I'd like (rough sketch after this list):
- a console log of the most recent Celery worker event ('started', 'finished', 'failed')
- an average progress bar across all tasks
- discrete task completion, as a progress bar or a fraction of text at the end of the average progress bar
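Roughly this layout, as a sketch. It assumes the queuer renders with rich; the column layout and the callback names are illustrative only, not what Proxima currently has:

```python
from rich.console import Console
from rich.progress import BarColumn, Progress, TextColumn

console = Console()

# Overall bar: average percentage across all tasks, with the discrete
# "done/queued" fraction tacked onto the end of the same line.
progress = Progress(
    TextColumn("[bold]Encoding"),
    BarColumn(),
    TextColumn("{task.percentage:>5.1f}%"),
    TextColumn("({task.fields[done]}/{task.fields[queued]} complete)"),
    console=console,
)
overall = progress.add_task("overall", total=100, done=0, queued=8)


def on_worker_event(event: str, task_id: str) -> None:
    # Most recent worker event, logged above the live bar.
    console.log(f"worker {event}: {task_id}")


def on_progress(average_percent: float, done: int, queued: int) -> None:
    progress.update(overall, completed=average_percent, done=done, queued=queued)
```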
How can we do this with pubsub?
We gave up on pubsub initially because of issues keeping the subscriber in sync with the publisher. Redis implicitly queues messages at the TCP buffer level when the subscriber is too slow to receive them. The fix is to properly implement asyncio handling to keep the subscriber thread unblocked. Acknowledge messages when they come through and dump them in a variable for the main loop to handle. For our use case, it's okay for us to drop messages. Guaranteed delivery is not important. Staying in sync is far more important for the sake of our progress accuracy. Better to drop frames than slow down the video!
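Something like this is what I have in mind for the subscriber side. It's a rough sketch assuming redis-py's asyncio client; the `task-progress:*` channel pattern and the payload shape are assumptions:

```python
import asyncio
import json

import redis.asyncio as redis

latest_progress = {}  # task_id -> most recent payload; stale values just get overwritten


async def drain_pubsub(channel_pattern: str = "task-progress:*") -> None:
    """Keep the subscriber unblocked: read every message as it arrives and
    keep only the newest per task, so out-of-date progress is simply dropped."""
    r = redis.Redis()
    pubsub = r.pubsub()
    await pubsub.psubscribe(channel_pattern)

    async for message in pubsub.listen():
        if message["type"] != "pmessage":
            continue  # ignore subscribe confirmations
        payload = json.loads(message["data"])
        latest_progress[payload["task_id"]] = payload  # last write wins


async def main_loop() -> None:
    reader = asyncio.create_task(drain_pubsub())
    while True:
        # Render progress bars from latest_progress here; no backlog to chew through.
        await asyncio.sleep(0.1)


if __name__ == "__main__":
    asyncio.run(main_loop())
```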
Most Recent Worker Event
The most recent worker event is easy: we just need to check that the pubsub payload includes our group ID in its metadata.
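Something like this, as a sketch. The payload keys (`group_id`, `event`, `task_id`) are assumptions about what the workers publish:

```python
import json


def latest_worker_event(message: dict, our_group_id: str):
    """Return '<task_id> <event>' if the payload belongs to our group, else None."""
    payload = json.loads(message["data"])
    if payload.get("group_id") != our_group_id:
        return None  # someone else's queue; ignore it
    return f"{payload['task_id']} {payload['event']}"
```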
Average Progress
The current implementation does a lot of gymnastics with a lot of spaghetti: task matching and additional polling are needed to get the average progress bar working. Workers publish individually, and we need to aggregate and calculate the average queuer-side. That's what's slowing down the loop.
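Stripped of the group matching and the polling, the aggregation itself is only this much (a sketch using the current seconds-processed vs. total-duration numbers):

```python
task_fraction = {}  # task_id -> fraction complete (0.0-1.0)


def update_average(task_id: str, seconds_done: float, duration: float) -> float:
    """Record one task's seconds-processed and return the group average (0-100)."""
    task_fraction[task_id] = min(1.0, seconds_done / duration) if duration else 0.0
    return 100 * sum(task_fraction.values()) / len(task_fraction)
```

The spaghetti comes from everything around it: figuring out which task IDs are ours and chasing down each task's total duration.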
Discrete Progress
The discrete one is easy, since we just bump the ratio of completed tasks out of known queued tasks whenever we get a success message from a task that belongs to us. That handler works a treat.
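It really is just bookkeeping; a sketch, with the `group_id` payload key again being an assumption:

```python
def on_task_success(payload: dict, our_group_id: str, done: int, queued: int):
    """Bump the completed count when a success message belongs to our group."""
    if payload.get("group_id") == our_group_id:
        done += 1
    return done, f"{done}/{queued}"
```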
Dramas
What probably doesn't help is the cross-reference gymnastics we're using to check whether a task-progress message belongs to the queuer's task group. If we can get the Celery group ID on the worker side once we receive a new task, we can pass that as an argument into ffmpeg and set the channel pattern dynamically:
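A sketch of the worker side. Celery does expose the group ID on the task's request context (`self.request.group`); the task name, the `job` argument, and the `task-progress:<group>` channel naming are illustrative assumptions:

```python
import json

import redis
from celery import Celery

app = Celery("proxima", broker="redis://localhost:6379/0")
r = redis.Redis()


@app.task(bind=True)
def encode_proxy(self, job: dict) -> None:
    # The group ID is set on the request context when the task was queued as part of a group.
    group_id = self.request.group
    channel = f"task-progress:{group_id}"

    def publish_progress(percent: float) -> None:
        r.publish(channel, json.dumps({
            "task_id": self.request.id,
            "group_id": group_id,
            "percent": percent,
        }))

    # ...hand publish_progress to the ffmpeg progress parser...
```

Queuer-side, that collapses to a single subscribe on `task-progress:<our group ID>`, since we already know our own group ID.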
That's great! That means we only need one pubsub handler for task-progress.
Do the percentage calculations worker-side too! Away with this advance / seconds-processed vs. total-duration business... This makes our calculations way easier: everything is out of 100%.
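Worker-side that's one line against whatever we parse out of ffmpeg's `-progress` output; reading the `out_time_us` field is how I'd do it, but treat the parsing details as an assumption:

```python
def percent_complete(out_time_us: int, duration_s: float) -> float:
    """Turn ffmpeg's out_time_us progress field (microseconds) into a 0-100 percentage."""
    if duration_s <= 0:
        return 0.0
    return min(100.0, (out_time_us / 1_000_000) / duration_s * 100)
```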