Closed pagrubel closed 3 months ago
Instead of stopping a reset when there are initializing workflows, I think it'd be better to just kill them outright, but ask the user first.
How should I do that? What needs to be killed?
I will try to modify this to search for any Running or Initializing workflows and give the user a chance to stop the reset process. If they want to continue, I will attempt to cancel the workflows, then do the stop and delete the directory. I may put in a longer wait too, just to get around the Initializing problem.
Oh sorry, I missed this. We want to just kill all the currently initializing or running workflows, exactly as you described.
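For reference, the flow described above could be sketched roughly like this. This is only an illustration, not the code in this branch: `Workflow`, `find_active`, `reset_with_active_check`, and the `confirm`, `cancel`, and `stop_and_delete` callables are all hypothetical names standing in for whatever the beeflow client actually provides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Statuses that should trigger the extra prompt (taken from the discussion above).
ACTIVE_STATES = {"Running", "Initializing"}


@dataclass
class Workflow:
    """Hypothetical record mirroring one row of `beeflow list`."""
    name: str
    wf_id: str
    status: str


def find_active(workflows: Iterable[Workflow]) -> List[Workflow]:
    """Return the workflows that are still Running or Initializing."""
    return [wf for wf in workflows if wf.status in ACTIVE_STATES]


def reset_with_active_check(
    workflows: Iterable[Workflow],
    confirm: Callable[[str], bool],
    cancel: Callable[[str], None],
    stop_and_delete: Callable[[], None],
) -> bool:
    """Ask before cancelling active workflows, then stop and delete.

    All three callables are assumptions: `confirm` asks the user a yes/no
    question, `cancel` cancels one workflow by id, and `stop_and_delete`
    shuts the components down and removes the work directory.
    Returns False if the user backs out, True if the reset went ahead.
    """
    active = find_active(workflows)
    if active:
        ids = ", ".join(wf.wf_id for wf in active)
        question = f"Active workflows: {ids}. Cancel them and continue the reset?"
        if not confirm(question):
            return False  # user chose to stop the reset
        for wf in active:
            cancel(wf.wf_id)
    stop_and_delete()
    return True
```

Passing the prompt and the cancel/stop steps in as callables keeps the decision logic easy to test without a running beeflow instance.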
This is what I get when I have an initializing workflow and try to run beeflow core reset:
(base) [kchilleri@darwin-fe3 BEE]$ git checkout issue757/fix_reset_error
branch 'issue757/fix_reset_error' set up to track 'origin/issue757/fix_reset_error'.
Switched to a new branch 'issue757/fix_reset_error'
(base) [kchilleri@darwin-fe3 BEE]$ git status
On branch issue757/fix_reset_error
Your branch is up to date with 'origin/issue757/fix_reset_error'.
(base) [kchilleri@darwin-fe3 BEE]$ cd workdir
(base) [kchilleri@darwin-fe3 workdir]$ cp /vast/home/kchilleri/BEE/examples/cat-grep-tar/lorem.txt .
(base) [kchilleri@darwin-fe3 workdir]$ poetry shell
Spawning shell within /vast/home/kchilleri/.cache/pypoetry/virtualenvs/hpc-beeflow-PIafEbRq-py3.9
. /vast/home/kchilleri/.cache/pypoetry/virtualenvs/hpc-beeflow-PIafEbRq-py3.9/bin/activate
(base) [kchilleri@darwin-fe3 workdir]$ . /vast/home/kchilleri/.cache/pypoetry/virtualenvs/hpc-beeflow-PIafEbRq-py3.9/bin/activate
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core start
Checking dependencies...
Found Charliecloud 0.37
Starting beeflow...
Run `beeflow core status` for more information.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core status
beeflow components:
redis ... RUNNING
scheduler ... RUNNING
celery ... RUNNING
slurmrestd ... RUNNING
wf_manager ... RUNNING
task_manager ... RUNNING
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
There are currently no workflows.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow package /vast/home/kchilleri/BEE/examples/cat-grep-tar .
Package cat-grep-tar.tgz created successfully
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow submit wf1 ./cat-grep-tar.tgz workflow.cwl input.yml /vast/home/kchilleri/BEE/workdir
Package cat-grep-tar.tgz unpackaged successfully
Workflow submitted! Your workflow id is 67122d.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
Name ID Status
wf1 67122d Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow query 67122d
Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
Name ID Status
wf1 67122d Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core reset
There are 'Initializing' workflows. Reset may fail. Check 'beeflow list'i
A reset will remove this directory: /vast/home/kchilleri/.beeflow
Are you sure you want to reset?
Please ensure all workflows are complete before running a reset
Check the status of workflows by running 'beeflow list'
A reset will shutdown beeflow and its components.
A reset will delete the bee_workdir directory which results in:
Removing the archive of workflows executed.
Removing the archive of workflow containers.
Reset all databases associated with the beeflow app.
Removing all beeflow logs.
Beeflow configuration files from bee_cfg will remain.
Respond with yes(y)/no(n): y
Beeflow has been shutdown.
Waiting for components to cleanly stop.
Unable to remove /vast/home/kchilleri/.beeflow.
[Errno 39] Directory not empty: 'x86_64-linux-gnu'
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core status
Cannot connect to the beeflow daemon, is it running? Check the log at "/vast/home/kchilleri/.beeflow/logs/beeflow.log".
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core start
Checking dependencies...
Found Charliecloud 0.37
Starting beeflow...
Run `beeflow core status` for more information.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core status
beeflow components:
redis ... RUNNING
scheduler ... RUNNING
celery ... RUNNING
slurmrestd ... RUNNING
wf_manager ... RUNNING
task_manager ... RUNNING
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
Name ID Status
wf1 67122d Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow cancel 67122d
Workflow is Initializing cannot cancel.
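The `[Errno 39] Directory not empty` failure in the log might come from something still writing under the directory while it is being removed. That is an assumption; the log does not confirm it. If so, the longer wait mentioned earlier could look something like the sketch below, where `remove_dir_with_retry` and its `attempts` and `wait` parameters are hypothetical and not part of beeflow.

```python
import shutil
import time
from pathlib import Path


def remove_dir_with_retry(path, attempts: int = 5, wait: float = 1.0) -> bool:
    """Try to remove a directory tree, pausing between failed attempts.

    Returns True once the directory no longer exists, False if it is
    still present after the last attempt.
    """
    path = Path(path)
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
        except OSError:
            # Covers "Directory not empty" as well as an already-missing path.
            pass
        if not path.exists():
            return True
        if attempt < attempts - 1:
            time.sleep(wait)  # give stopping components time to finish writing
    return False
```

A caller could report the failure only when this returns False, rather than after a single removal attempt.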
I am going to close this pull request and open another, since I apparently still have some rebasing problems with it. I will add some information about how I handle Running and Initializing workflows, as well as other active workflows.
Resolves: issue #757