lanl / BEE

Reset: error when a workflow's state is either Running or Waiting #757

Closed pagrubel closed 3 months ago

pagrubel commented 6 months ago

beeflow core reset fails when workflows have been submitted but not yet started, and when a workflow is running. In both cases the error is: OSError: [Errno 39] Directory not empty: 'x86_64-linux-gnu'

This is caused by the neo4j instance still running. We can either (1) check for Running or Waiting workflows, tell the user to cancel them, and refuse the reset, or (2) cancel the workflows automatically and continue with the reset.

I like the first option.
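
A minimal sketch of what the first option could look like, not the actual beeflow code. It assumes the client can already get a list of (name, ID, status) entries like the ones 'beeflow list' prints. The function names, the tuple layout, and the exact status strings are hypothetical.

```python
"""Sketch: refuse a reset while any workflow is still active."""

# Statuses that should block a reset. 'Running' and 'Waiting' come from the
# issue title; the exact status strings used by beeflow are an assumption.
BLOCKING_STATUSES = {"Running", "Waiting"}


def find_blocking_workflows(workflows, blocking=BLOCKING_STATUSES):
    """Return the (name, wf_id, status) entries whose status blocks a reset."""
    return [wf for wf in workflows if wf[2] in blocking]


def check_reset_allowed(workflows):
    """Return (allowed, message) for a proposed reset.

    `workflows` is a hypothetical list of (name, wf_id, status) tuples,
    for example whatever the client already gathers for 'beeflow list'.
    """
    blocking = find_blocking_workflows(workflows)
    if not blocking:
        return True, "No active workflows; reset may proceed."
    lines = ["Reset not allowed while workflows are active:"]
    lines += [f"  {name} {wf_id} {status}" for name, wf_id, status in blocking]
    lines.append("Cancel these workflows, then run the reset again.")
    return False, "\n".join(lines)


if __name__ == "__main__":
    # Sample data shaped like 'beeflow list' output (name, ID, status).
    sample = [("cgt", "44ff1d", "Archived"), ("cgt", "ff83f8", "Running")]
    allowed, message = check_reset_allowed(sample)
    print(message)
```

The check would have to run before the shutdown step, since once beeflow is shut down the workflow list can no longer be queried. Other in-progress states, such as Initializing, may also need to be added to the blocking set.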

pagrubel commented 4 months ago

This is the current result (on develop) of running a reset while a workflow is either Initializing or Running.

beeflow core reset
A reset will remove this directory: /vast/home/pagrubel/.beeflow

Are you sure you want to reset?

Please ensure all workflows are complete before running a reset
Check the status of workflows by running 'beeflow list'

A reset will shutdown beeflow and its components.

A reset will delete the bee_workdir directory which results in:
Removing the archive of workflows executed.
Removing the archive of workflow containers.
Reset all databases associated with the beeflow app.
Removing all beeflow logs.

Beeflow configuration files from bee_cfg will remain.

Respond with yes(y)/no(n):  y
Beeflow has been shutdown.
Waiting for components to cleanly stop.
Traceback (most recent call last):

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/bin/beeflow", line 6, in <module>
    sys.exit(main())
             ^^^^^^

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/bee_client.py", line 610, in main
    app()

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/typer/main.py", line 289, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/typer/main.py", line 280, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1157, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1078, in main

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1434, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 783, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/typer/main.py", line 607, in wrapper

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/core.py", line 513, in reset
    shutil.rmtree(directory_to_delete)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 732, in rmtree
    _rmtree_safe_fd(fd, path, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 666, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 664, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)

OSError: [Errno 39] Directory not empty: 'x86_64-linux-gnu'

(hpc-beeflow-py3.11) pagrubel@darwin-fe2 beeworkdir$ beeflow list
List_workflows: Could not reach WF Manager.
(hpc-beeflow-py3.11) pagrubel@darwin-fe2 beeworkdir$ beeflow core start
Checking dependencies...
Found Charliecloud 0.37
Starting beeflow...
Run `beeflow core status` for more information.
(hpc-beeflow-py3.11) pagrubel@darwin-fe2 beeworkdir$ beeflow list
Name    ID  Status
cgt 44ff1d  Archived
cgt ff83f8  Running
(hpc-beeflow-py3.11) pagrubel@darwin-fe2 beeworkdir$ beeflow core reset
A reset will remove this directory: /vast/home/pagrubel/.beeflow

Are you sure you want to reset?

Please ensure all workflows are complete before running a reset
Check the status of workflows by running 'beeflow list'

A reset will shutdown beeflow and its components.

A reset will delete the bee_workdir directory which results in:
Removing the archive of workflows executed.
Removing the archive of workflow containers.
Reset all databases associated with the beeflow app.
Removing all beeflow logs.

Beeflow configuration files from bee_cfg will remain.

Respond with yes(y)/no(n):  y
Beeflow has been shutdown.
Waiting for components to cleanly stop.
Traceback (most recent call last):

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/bin/beeflow", line 6, in <module>
    sys.exit(main())
             ^^^^^^

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/bee_client.py", line 610, in main
    app()

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/typer/main.py", line 289, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/typer/main.py", line 280, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1157, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1078, in main

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 1434, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/click/core.py", line 783, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.11/lib/python3.11/site-packages/typer/main.py", line 607, in wrapper

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/core.py", line 513, in reset
    shutil.rmtree(directory_to_delete)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 732, in rmtree
    _rmtree_safe_fd(fd, path, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  [Previous line repeated 1 more time]

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 683, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())

  File "/projects/opt/centos8/x86_64/miniconda3/py311_23.11.0/lib/python3.11/shutil.py", line 681, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)

OSError: [Errno 16] Device or resource busy: '.nfs020ebcd4adb47f8e00001fd0'

pagrubel commented 4 months ago

The first error above came from a reset while the workflow was Initializing; the second came from a reset while it was Running.