linagora / james-project

Mirror of Apache James Project
Apache License 2.0

Distributed task manager: yet again improve timeout #5203

Closed: chibenwa closed this issue 3 months ago

chibenwa commented 4 months ago

RabbitMQWorkQueue applies the consumer timeout on the queue (great!)

However, the task itself is never wrapped in a Reactor timeout that fires before the consumer timeout does.

This means that if I submit a 2-day-long task, it will effectively KILL the consumer => bug.
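A minimal sketch of the fix being discussed, under stated assumptions. The issue refers to a Reactor timeout (for example `Mono.timeout`), but to stay dependency-free this example swaps in plain `java.util.concurrent`. It bounds the task with `Future.get(timeout)` and explicitly cancels it when the deadline passes. The class `TaskTimeoutSketch`, its methods and the `margin` parameter are hypothetical, not James APIs. The one hard constraint is that the task deadline must expire strictly before the RabbitMQ consumer timeout.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not James code: bound a task by a deadline shorter than
// the broker's consumer timeout and cancel it explicitly when that deadline passes.
final class TaskTimeoutSketch {
    enum Outcome { COMPLETED, CANCELLED }

    // Hypothetical helper: derive a task deadline that fires strictly before the
    // consumer timeout, keeping `margin` to cancel the task and report it.
    static Duration taskDeadline(Duration consumerTimeout, Duration margin) {
        if (margin.isNegative() || margin.isZero() || margin.compareTo(consumerTimeout) >= 0) {
            throw new IllegalArgumentException("margin must be in (0, consumerTimeout)");
        }
        return consumerTimeout.minus(margin);
    }

    // Run the task on the executor and wait at most `deadline` for it.
    static Outcome runWithTimeout(Callable<?> task, Duration deadline, ExecutorService executor) {
        Future<?> future = executor.submit(task);
        try {
            future.get(deadline.toMillis(), TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // Deadline reached: explicitly cancel, interrupting the worker thread,
            // instead of letting the broker kill the whole consumer later.
            future.cancel(true);
            return Outcome.CANCELLED;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }
}
```

For example, with a 30-minute consumer timeout and a 5-minute margin, `taskDeadline` yields 25 minutes. Note that `cancel(true)` only interrupts the worker thread. A task that ignores interruption keeps running, so long tasks would need to check for it.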

A nicer behaviour would be:

Arsnael commented 4 months ago

"Catch the reactor timeout error and explicitly CANCEL the task."

That I'm not too sure about. We have tasks whose duration can grow exponentially with the amount of data on the platform. Some could take days in some cases and would be critical. Is cancelling them the right action here?

chibenwa commented 4 months ago

Cancellation is IMO better than crashing the task manager consumer.

As discussed in JAMES-4032, tasks that take extremely long can be marked as AsyncSafe (or at least offer an option to run them that way) to work around that limitation.

This would IMO be a cleaner fix.

For such Async tasks I am thinking of...

Arsnael commented 4 months ago

Yeah, OK. TBH I read about the AsyncSafe stuff after reading this, but I'm OK with it then.

chibenwa commented 4 months ago

Yes, the idea I have is to keep those tasks synchronous by default but give the option to run them asynchronously.
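A minimal sketch of "synchronous by default, asynchronous on request", assuming the consumer acknowledges the RabbitMQ delivery itself. RabbitMQ's consumer timeout applies to deliveries that stay unacknowledged. Acknowledging before a long task starts therefore takes that task out of the timeout's reach. `DispatchSketch`, `ExecutionMode` and `Ack` are hypothetical names for illustration, not the AsyncSafe mechanism from JAMES-4032.

```java
import java.util.concurrent.Executor;

// Hypothetical sketch, not James code: choose when to acknowledge the delivery
// depending on whether a task runs synchronously or asynchronously.
final class DispatchSketch {
    // SYNCHRONOUS is the default; ASYNCHRONOUS is opt-in per task.
    enum ExecutionMode { SYNCHRONOUS, ASYNCHRONOUS }

    // Hypothetical hook standing in for acking the RabbitMQ delivery.
    interface Ack { void ack(); }

    // Default path: run synchronously on the calling thread.
    static void dispatch(Runnable task, Ack ack) {
        dispatch(task, ExecutionMode.SYNCHRONOUS, ack, Runnable::run);
    }

    static void dispatch(Runnable task, ExecutionMode mode, Ack ack, Executor executor) {
        if (mode == ExecutionMode.ASYNCHRONOUS) {
            // Ack first: the broker stops tracking this delivery, so a task
            // running for days cannot trip the consumer timeout.
            ack.ack();
            executor.execute(task);
        } else {
            // The delivery stays unacknowledged while the task runs, so the
            // consumer timeout still bounds how long the task may take.
            task.run();
            ack.ack();
        }
    }
}
```

The trade-off of the asynchronous path is that once the delivery is acknowledged, the broker will not redeliver it if the node crashes mid-task. This is one reason to keep synchronous execution as the default.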

chibenwa commented 4 months ago

https://github.com/apache/james-project/pull/2284

chibenwa commented 3 months ago

https://github.com/apache/james-project/pull/2284