lanl / BEE

Cancel Workflows not working #955

Open pagrubel opened 2 hours ago

pagrubel commented 2 hours ago

Cancelling a workflow is not working. At one point, we let running or scheduled tasks complete and just did not schedule any new ones, but now all tasks continue. When testing this, I ran beeflow-dag several times and it only archived the final one, so this will also need to be tested once cancel is fixed.

The workflow is not actually being cancelled; instead, it runs to completion:

$ beeflow query 0a
Archived/Cancelled
clamr--RUNNING
ffmpeg--WAITING

$ beeflow query 0a
Archived/Cancelled
clamr--COMPLETED
ffmpeg--PENDING

$ beeflow query 0a
Archived
clamr--COMPLETED
ffmpeg--COMPLETED

Leahh02 commented 2 hours ago

I realized that whenever I cancel a workflow I always pause it beforehand, so I think maybe that's why I've only seen this be a problem once.

I looked into it: beeflow cancel uses conn.delete(_resource(long_wf_id), json={'option': 'cancel'}, timeout=60). In beeflow/wf_manager/resources/wf_actions.py, the docstring for the delete method says "For cancel, current tasks finish running." That shouldn't mean that tasks that haven't started yet should still start, though.

pagrubel commented 1 hour ago

So, currently scheduled jobs should be allowed to complete, but no others should be scheduled. We have discussed an option to cancel all jobs too, but for sure tasks that are waiting should not run.
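The intended behaviour described above could be sketched roughly as follows. This is only a toy model, not BEE's actual code: the `TaskState` enum, the `CANCELLED` task state, and the `cancel_workflow` and `schedulable` helpers are all hypothetical names, and it assumes that WAITING means "not yet submitted" while PENDING and RUNNING mean "already scheduled".

```python
from enum import Enum


class TaskState(Enum):
    # State names mirror the query output in this issue; their exact
    # meaning here is an assumption.
    WAITING = "WAITING"      # assumed: not yet submitted to the scheduler
    PENDING = "PENDING"      # assumed: already submitted, not yet running
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    CANCELLED = "CANCELLED"  # hypothetical state for tasks dropped by cancel


def cancel_workflow(tasks):
    """Return a copy of tasks with every WAITING task marked CANCELLED.

    PENDING, RUNNING, and COMPLETED tasks are left untouched so that
    already-scheduled jobs can finish.
    """
    return {
        name: TaskState.CANCELLED if state is TaskState.WAITING else state
        for name, state in tasks.items()
    }


def schedulable(tasks, cancelled):
    """Return the names of tasks that may still be submitted."""
    if cancelled:
        # Once a workflow is cancelled, nothing new is scheduled.
        return []
    return [name for name, state in tasks.items() if state is TaskState.WAITING]
```

With the states from the first query above, `cancel_workflow({"clamr": TaskState.RUNNING, "ffmpeg": TaskState.WAITING})` would leave clamr running and mark ffmpeg as cancelled, so ffmpeg never reaches PENDING.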

Leahh02 commented 48 minutes ago

> So, currently scheduled jobs should be allowed to complete, but no others should be scheduled. We have discussed an option to cancel all jobs too, but for sure tasks that are waiting should not run.

That makes sense

pagrubel commented 46 minutes ago

Oh, and the Archived/Cancelled state should not change to Archived, and of course all DAGs that were generated should be in the archive.
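One way to express the state rule above is a small helper that decides which state to record when archiving finishes. This is a minimal sketch under assumptions: `state_after_archive` is a hypothetical function, and "Running" and "Cancelled" are assumed names for pre-archive states; only "Archived" and "Archived/Cancelled" appear in the query output in this issue.

```python
ARCHIVED = "Archived"
ARCHIVED_CANCELLED = "Archived/Cancelled"


def state_after_archive(current_state):
    """Pick the workflow state to record once archiving finishes.

    A workflow that was cancelled keeps its cancelled marker instead of
    being overwritten with plain "Archived".
    """
    if "Cancelled" in current_state:
        return ARCHIVED_CANCELLED
    return ARCHIVED
```

Under this rule, the third query above would still report Archived/Cancelled rather than Archived.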