reanahub / reana-server

REANA API server
http://reana-server.readthedocs.io/
MIT License

scheduler: assess requeueing procedure #118

Closed dinosk closed 3 years ago

dinosk commented 5 years ago

The scheduler.on_message method is executed for every incoming message. The conditions set in config.py are checked and, if all are met, RWC is called to start the workflow run. If not, the message is requeued so that it is retried later. The kombu docs mention that this is not the proper way to select which messages to process: http://docs.celeryproject.org/projects/kombu/en/latest/reference/kombu.message.html#kombu.message.Message.requeue . The current procedure works, but more testing could be done with a larger number of workflow submissions to ensure that problems like starvation or head-of-line blocking (which shouldn't occur with this approach, but is relevant in others) are avoided.
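To make the procedure concrete, here is a minimal sketch of the check-then-requeue pattern, not the actual scheduler code. `FakeMessage` is a hypothetical stand-in that only borrows the `ack()` and `requeue()` method names from `kombu.Message`; the capacity check (`max_running`) stands in for the real conditions in config.py, and placing a requeued message at the back of the queue is a modelling assumption, not a statement about broker behaviour.

```python
from collections import deque


class FakeMessage:
    """Hypothetical stand-in with the same ack()/requeue() names as kombu.Message."""

    def __init__(self, body, queue):
        self.body = body
        self._queue = queue
        self.acked = False

    def ack(self):
        self.acked = True

    def requeue(self):
        # Model assumption: a requeued message goes to the back of the queue.
        self._queue.append(self)


def on_message(message, running, max_running=1):
    """Start the workflow if the (hypothetical) capacity condition holds, else requeue."""
    if len(running) < max_running:
        running.append(message.body)  # stands in for the call to RWC
        message.ack()
    else:
        message.requeue()


queue = deque()
queue.extend(FakeMessage(name, queue) for name in ("W1", "W2"))
running = []
for _ in range(2):
    on_message(queue.popleft(), running)
```

With room for a single running workflow, W1 is started and acknowledged while W2 goes back to the queue to be retried on a later pass.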

Stemmed from https://github.com/reanahub/reana-server/pull/116#discussion_r248757108

diegodelemos commented 3 years ago

Moreover, this requeueing mechanism doesn't respect order: workflows are arbitrarily sent back to the queue. For example, if W1 is submitted before W2, there is a chance that W2 gets executed first.

This is because one has to "acknowledge" a workflow to be able to inspect it and decide whether to run it or not (source here). This requires RabbitMQ/Kombu investigation.
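The ordering problem can be reproduced with a toy model. This is only a sketch under stated assumptions: the queue is a plain `deque`, a rejected workflow is appended to the back, and `free_slot_at_step` is a hypothetical parameter meaning "capacity becomes available from this check onwards". It does not model how RabbitMQ actually positions requeued messages.

```python
from collections import deque


def run_order(submitted, free_slot_at_step):
    """Return the order in which workflows start under the toy requeue model."""
    queue = deque(submitted)
    started = []
    step = 0
    while queue:
        workflow = queue.popleft()
        if step >= free_slot_at_step:
            started.append(workflow)  # conditions met: the workflow starts
        else:
            queue.append(workflow)  # conditions not met: back of the queue
        step += 1
    return started


inverted = run_order(["W1", "W2"], free_slot_at_step=1)
in_order = run_order(["W1", "W2"], free_slot_at_step=0)
```

Here `inverted` is `["W2", "W1"]`: W1 is checked while no capacity is available and is requeued behind W2, which is then checked once capacity has appeared. When capacity is available from the start, `in_order` keeps the submission order.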

audrium commented 3 years ago

Priority scheduling was introduced in https://github.com/reanahub/reana-server/issues/358, so to improve the requeueing procedure it was decided to use a message delay mechanism (with the help of the RabbitMQ Delayed Message Plugin) and to count how many times a message has been requeued. This could later be improved by transferring messages that have been looping for a long time to a dead letter exchange and informing the user that the workflow has expired.
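A rough sketch of how the delay and the requeue counter could fit together is shown below. It is not the code that was merged. The `x-delay` header (in milliseconds) is the one read by the RabbitMQ Delayed Message Plugin; everything else is an assumption made for illustration: the `requeue_count` header name, the exponential backoff, the default limits, and the `"dead-letter"` outcome, which only marks where a future dead letter exchange could be used.

```python
def next_delivery(headers, base_delay_ms=1000, max_delay_ms=60000, max_requeues=5):
    """Decide what to do with a message whose workflow cannot start yet.

    Returns a (action, new_headers) pair, where action is "requeue" or
    "dead-letter". All names except "x-delay" are hypothetical.
    """
    count = headers.get("requeue_count", 0) + 1
    if count > max_requeues:
        # Looping for too long: candidate for a dead letter exchange.
        return "dead-letter", {**headers, "requeue_count": count}
    # Assumed policy: double the delay on every requeue, up to a cap.
    delay_ms = min(base_delay_ms * 2 ** (count - 1), max_delay_ms)
    return "requeue", {**headers, "requeue_count": count, "x-delay": delay_ms}


first = next_delivery({})
third = next_delivery({"requeue_count": 2, "x-delay": 2000})
expired = next_delivery({"requeue_count": 5})
```

The returned headers would be attached when the message is republished to the delayed exchange, so that the counter survives each round trip.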