lanl / BEE

Can we use files that are not in the directory for the workflow such as a different yml file? #711

Closed pagrubel closed 7 months ago

pagrubel commented 11 months ago

We need to clarify how this works. The documentation says: "Additionally, if the main_cwl and yaml files are not in the workflow directory, they will be copied into a temporary copy of the workflow directory before packaging. Compare this with the previous example."

When I tried to use a main_cwl file and a yml file that aren't in the directory specified on the submit line, it couldn't find the step cwl files. I am going to change the documentation so the example works, but we should decide what is happening here and show a use case.

pagrubel commented 11 months ago

I tested this out. Apparently we can use a yml input file that is not in the workflow directory, but the main cwl file has to be the one in the main workflow dir. Trying to use a different main cwl gives the error below. That usage may not make sense anyway, but we should handle the error better. Notice that I'm pointing at the clamr-ffmpeg-build directory, which has all the files for the workflow, but selecting the clamr_wf.cwl in the current directory.

ll
total 23
drwxrwxr-x 2 pagrubel pagrubel  4096 Aug 18 17:02 clamr-ffmpeg-build
-rw-rw-r-- 1 pagrubel pagrubel   404 Aug 21 13:56 clamr_job.yml
-rw-rw-r-- 1 pagrubel pagrubel  1882 Aug 21 13:55 clamr_wf.cwl
-rw-r--r-- 1 pagrubel pagrubel  4791 Aug 21 15:03 ffmpeg_stderr.txt
drwxrwxr-x 2 pagrubel pagrubel  4096 Aug 21 15:03 graphics_output
-rw-rw-r-- 1 pagrubel pagrubel  3215 Aug 18 16:49 lorem.txt
-rw-r--r-- 1 pagrubel pagrubel   302 Aug 21 13:52 occur0.txt
-rw-r--r-- 1 pagrubel pagrubel   229 Aug 21 13:52 occur1.txt
-rw-rw-r-- 1 pagrubel pagrubel 10240 Aug 21 13:53 out.tgz
-rw-rw-r-- 1 pagrubel pagrubel    64 Aug 21 15:03 total_execution_time.log
(hpc-beeflow-YDRVf3zF-py3.9) (base) pagrubel@darwin-fe1 beeworkdir2$ beeflow submit clamrb clamr-ffmpeg-build clamr_wf.cwl clamr_job.yml ~/beeworkdir2
Detected directory instead of packaged workflow. Packaging Directory...
Traceback (most recent call last):

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/schema_salad/fetcher.py", line 98, in fetch_text
    with open(urllib.request.url2pathname(str(path)), encoding="utf-8") as fp:

FileNotFoundError: [Errno 2] No such file or directory: '/vast/home/pagrubel/beeworkdir2/clamr.cwl'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/bin/beeflow", line 6, in <module>
    sys.exit(main())

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/bee_client.py", line 543, in main
    app()

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/typer/main.py", line 289, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/typer/main.py", line 280, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1157, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1078, in main

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1434, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 783, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/typer/main.py", line 607, in wrapper

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/bee_client.py", line 206, in submit
    workflow, tasks = parser.parse_workflow(workflow_id, str(main_cwl_path),

  File "/vast/home/pagrubel/BEE/BEE/beeflow/common/parser/parser.py", line 120, in parse_workflow
    tasks = [self.parse_step(step, workflow_id) for step in self.cwl.steps]

  File "/vast/home/pagrubel/BEE/BEE/beeflow/common/parser/parser.py", line 120, in <listcomp>
    tasks = [self.parse_step(step, workflow_id) for step in self.cwl.steps]

  File "/vast/home/pagrubel/BEE/BEE/beeflow/common/parser/parser.py", line 139, in parse_step
    step_cwl = cwl_parser.load_document(step_run)

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/cwl_utils/parser/cwl_v1_2.py", line 15494, in load_document
    return _document_load(

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/cwl_utils/parser/cwl_v1_2.py", line 605, in _document_load
    return _document_load_by_url(

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/cwl_utils/parser/cwl_v1_2.py", line 637, in _document_load_by_url
    text = loadingOptions.fetcher.fetch_text(url)

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/schema_salad/fetcher.py", line 103, in fetch_text
    raise ValidationException(str(err)) from err

schema_salad.exceptions.ValidationException: [Errno 2] No such file or directory: '/vast/home/pagrubel/beeworkdir2/clamr.cwl'
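
One way the error could be handled better is a check, run before parsing, that every step file referenced by the main cwl exists next to it. The sketch below is only an illustration, not BEE's code: `missing_step_files` is a hypothetical helper, and it uses a simple line-based scan for `run:` entries rather than a full CWL parser. It resolves relative references against the directory of the main cwl, which is why `clamr.cwl` was looked up in `beeworkdir2` in the traceback above.

```python
import re
from pathlib import Path

# Matches lines such as "    run: clamr.cwl" (a single-token file reference).
# Inline step definitions (run: followed by a mapping) are deliberately not matched.
RUN_LINE = re.compile(r"^\s*run:\s*(\S+)\s*$")


def missing_step_files(main_cwl):
    """Return the `run:` targets of main_cwl that do not exist on disk.

    Hypothetical helper: relative targets are resolved against the
    directory that contains main_cwl.
    """
    main_cwl = Path(main_cwl).resolve()
    missing = []
    for line in main_cwl.read_text(encoding="utf-8").splitlines():
        match = RUN_LINE.match(line)
        if not match:
            continue
        target = match.group(1).strip("'\"")
        # Skip URLs and in-document references; only local files are checked.
        if "://" in target or target.startswith("#"):
            continue
        resolved = main_cwl.parent / target
        if not resolved.is_file():
            missing.append(resolved)
    return missing
```

A client could call this before parsing and print the missing paths with a short hint (for example, that the step files must sit beside the chosen main cwl) instead of surfacing the full traceback.
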
pagrubel commented 11 months ago

Selecting a different yml file does work:

beeflow submit clamrb clamr-ffmpeg-build clamr-ffmpeg-build/clamr_wf.cwl clamr_job.yml ~/beeworkdir2
Detected directory instead of packaged workflow. Packaging Directory...
Package clamr-ffmpeg-build.tgz created successfully
Workflow submitted! Your workflow id is b183d0.

I changed time_steps to 500 in the yml file and got the expected results: fewer files in graphics_output and a smaller movie. I also verified the yml file in ~/.beeflow/workflows.

pagrubel commented 10 months ago

So the use case should be: if the user wants to use a main cwl or yml different from what is in the workflow dir, it should be copied to the temporary workflow dir and should end up in the archive.
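
A minimal sketch of that use case, assuming the packaging step works on a temporary copy of the workflow dir. `package_with_overrides` and its parameters are hypothetical names for illustration, not the actual beeflow implementation.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def package_with_overrides(workflow_dir, main_cwl, job_yaml, out_dir):
    """Stage workflow_dir in a temp dir, add outside files, and archive it.

    Hypothetical helper. Files that already live inside workflow_dir are
    left alone; files from elsewhere are copied into the staged copy and
    replace any file of the same name there.
    """
    workflow_dir = Path(workflow_dir).resolve()
    archive = Path(out_dir) / f"{workflow_dir.name}.tgz"
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / workflow_dir.name
        # Work on a copy so the user's workflow dir is never modified.
        shutil.copytree(workflow_dir, staged)
        for extra in (Path(main_cwl).resolve(), Path(job_yaml).resolve()):
            if workflow_dir not in extra.parents:
                shutil.copy2(extra, staged / extra.name)
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(staged, arcname=workflow_dir.name)
    return archive
```

With this ordering, the alternate main cwl sits beside the step cwl files in the staged copy, so anything that parses the staged main cwl can resolve its relative step references.
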

pagrubel commented 8 months ago

I looked at this a bit more. The problem with just using a different main cwl is that the entire CWL specification is parsed before the temporary dir is made with the new main cwl, so the other files are missing. We need to discuss whether we want all the cwl files to be in the dir with the alternate main cwl. If so, this will work and we just need to modify the documentation. parsing order

pagrubel commented 7 months ago

Addressed in #743