justanhduc / task-spooler

A scheduler for GPU/CPU tasks
https://justanhduc.github.io/2021/02/03/Task-Spooler.html
GNU General Public License v2.0

Advice on how to cancel (kill or remove) task #20

Closed powelleric closed 2 years ago

powelleric commented 2 years ago

First off, thank you so much for forking/maintaining this project!

I want to know the best way to cancel a task when I do not know whether it is running or queued.

I see that -r fails with an error if the task is running, and -k fails with an error if it is not (e.g. it is still queued).

Based on this, I came up with the command: ts -r ${taskid} || ts -k ${taskid}. Does that seem like the best approach?

justanhduc commented 2 years ago

Hey @powelleric. Thanks for using ts. Seems like you are using bash. How about checking the state of the job first? For example:

# Kill the job if it is running; otherwise remove it from the queue.
if [[ "$(ts -s "${taskid}")" == "running" ]]; then
    ts -k "${taskid}"
else
    ts -r "${taskid}"
fi

Let me know if there's anything else.

powelleric commented 2 years ago

I was being a bit pedantic, I guess. The issue is that your suggestion is vulnerable to a race condition: the job may not be running when you check its status, but it may have started by the time you run ts -r, in which case the removal fails.
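To make the race concrete, here is a minimal sketch of the state-check approach with a fallback for the case where the job starts between the check and the removal. The function name cancel_by_state is hypothetical, and the sketch assumes, as described in this thread, that ts -s prints "running" for a running job and that ts -r exits with a non-zero status when the job is running.

```shell
# cancel_by_state is a hypothetical helper, not part of ts itself.
cancel_by_state() {
    local taskid
    local state
    taskid="$1"
    # First server request: ask for the current state of the job.
    state="$(ts -s "$taskid")"
    if [ "$state" = "running" ]; then
        ts -k "$taskid"
    else
        # The job may have started after the state check. In that case
        # -r fails (per the behavior described in this thread), so fall
        # back to killing it.
        ts -r "$taskid" || ts -k "$taskid"
    fi
}
```

Calling cancel_by_state "${taskid}" still costs at least two server requests, and three when the race actually occurs.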

justanhduc commented 2 years ago

Yeah you are right. To kill or remove a job, the ts client has to make a request to the server first. So unless I make a new feature that makes the server return the PID if the job is running or remove it if it's queued/finished, the next best thing is a try...catch or if...else, which requires two server requests. Could you please elaborate more on your use case why it needs to be such fractionally perfect?

powelleric commented 2 years ago

"Could you please elaborate more on your use case why it needs to be such fractionally perfect?"

It does not. I mistakenly thought that I was hitting a race condition when trying to kill a job the other day but realized that actually the jobs had not even started, so I was completely confused. But ever since then I have been thinking about race conditions and thought I would ask.

I guess a second reason I ask is that I use the Django web framework a bit, and on the Django subreddit people always recommend RabbitMQ for simple job queuing. I think that is overkill, and I am preparing to advise people to consider ts instead, but I want to be sure about little details like this. (I wonder if RabbitMQ has a similar race condition consideration, actually.)

Thanks a bunch for your help.

P.S. I might actually revise my technique to: ts -k ${jobid} || ts -r ${jobid} || ts -k ${jobid}. The first command tries to kill because (at least in my queue) the job is running in the vast majority of cases, so that is usually what is needed. Rarely, the job has not started yet, which the second command handles. The third command (which will probably never be reached) handles the theoretical race condition I described, where the job starts between the first two commands. So this three-command form should be more efficient than the two-command form in the common case and will still handle the race condition. But I have to admit that it will probably confuse people unless I leave a lengthy comment explaining it.
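One way to keep the three-command form readable is to wrap it in a small function with the explanation as comments. This is only a sketch: the name cancel_task is hypothetical, and it relies on the behavior described in this thread, namely that ts -k fails when the job is not running and ts -r fails when it is.

```shell
# cancel_task is a hypothetical helper, not part of ts itself.
cancel_task() {
    local jobid
    jobid="$1"
    # 1. Most common case: the job is running, so kill it.
    ts -k "$jobid" 2>/dev/null && return 0
    # 2. The job is not running (queued or finished): remove it.
    ts -r "$jobid" 2>/dev/null && return 0
    # 3. The job started between steps 1 and 2: kill it now.
    #    The exit status of this last attempt is the function's status.
    ts -k "$jobid" 2>/dev/null
}
```

Usage would be cancel_task "${jobid}". Error output from the failed attempts is discarded here, so a caller that needs to know why all three attempts failed would have to drop the redirections.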

justanhduc commented 2 years ago

Great to hear it's all been figured out. I want to point out, just in case you missed it, that -r removes finished jobs too. But I think your three-part command already covers all the transitions. Thanks a lot for recommending ts to your colleagues 🙏. Let me know if there's any problem.