workflux / workflUX

An open-source, cloud-ready web application for simplified deployment of big data workflows.
Apache License 2.0

Replace cwltool by cwl-utils for parsing #38

Open KerstenBreuer opened 4 years ago

KerstenBreuer commented 4 years ago

Replace cwltool by cwl-utils for parsing of CWL documents as soon as common-workflow-language/cwl-utils#10 is resolved.

Athanaseus commented 4 years ago

Hi @KerstenBreuer any updates on this since https://github.com/common-workflow-language/cwl-utils/issues/10 is resolved?

KerstenBreuer commented 4 years ago

Hi @Athanaseus,

Thanks for the reminder. I hope it is OK if I implement it next week?

May I ask if you experienced any problems importing CWL documents? Cwltool should currently work just fine in any of the versions listed here: https://github.com/CompEpigen/CWLab/blob/master/setup.py#L40

The only reason for deciding to replace cwltool with cwl-utils was that cwltool's API changes frequently between releases.

Thanks and best wishes, Kersten

Athanaseus commented 4 years ago

@KerstenBreuer my issue was that I was trying out cwlVersion: v1.1 and I got the following error when loading my workflows:

15:30:12 - The provided CWL document is not valid, the error was:
https://raw.githubusercontent.com/Athanaseus/2gc-pipeline/master/cwlfiles/bdsf_fits2lsm.cwl:1:1: Field `cwlVersion` contains undefined reference to `https://raw.githubusercontent.com/Athanaseus/2gc-pipeline/master/cwlfiles/v1.1`
https://raw.githubusercontent.com/Athanaseus/2gc-pipeline/master/cwlfiles/bdsf_fits2lsm.cwl:4:1: checking field `requirements`
https://raw.githubusercontent.com/Athanaseus/2gc-pipeline/master/cwlfiles/bdsf_fits2lsm.cwl:15:3: checking item Field `class` contains undefined reference to `https://raw.githubusercontent.com/Athanaseus/2gc-pipeline/master/cwlfiles/InplaceUpdateRequirement`

This runs in my environment but I think the error is due to the cwlab validator.
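As an illustration of the kind of pre-check that would make this failure easier to read, here is a minimal sketch that extracts the top-level `cwlVersion` from a YAML CWL document and compares it with the versions a validator is assumed to understand. The names `SUPPORTED_VERSIONS`, `declared_cwl_version` and `is_supported` are hypothetical and are not part of CWLab or cwltool; the set of supported versions is an assumption chosen only for the example.

```python
import re
from typing import Optional

# Hypothetical: the versions the installed validator is assumed to understand.
SUPPORTED_VERSIONS = {"v1.0"}

# Matches an unindented `cwlVersion: <value>` line, optionally quoted,
# optionally followed by a YAML comment.
_VERSION_RE = re.compile(
    r"^cwlVersion:[ \t]*['\"]?([\w.\-]+)['\"]?[ \t]*(?:#.*)?$",
    re.MULTILINE,
)


def declared_cwl_version(text: str) -> Optional[str]:
    """Return the top-level cwlVersion declared in the document, or None."""
    match = _VERSION_RE.search(text)
    return match.group(1) if match else None


def is_supported(text: str) -> bool:
    """True if the declared version is in the assumed supported set."""
    return declared_cwl_version(text) in SUPPORTED_VERSIONS
```

A caller could run `is_supported(document_text)` before validation and report an "unsupported cwlVersion" message instead of the schema error above.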

Athanaseus commented 4 years ago

Just to add to this discussion, here is my GitHub repo, which shows the structure: https://github.com/Athanaseus/2gc-pipeline

Somehow I was able to load and run this individual task (https://github.com/Athanaseus/2gc-pipeline/blob/master/cwlfiles/simms.cwl) which is also referenced in my main workflow.

KerstenBreuer commented 4 years ago

I tested your workflow and, indeed, cwltool has problems loading it.

For instance, running cwltool --pack --debug selfcal_simulation.cwl > /dev/null gives me:

INFO /home/kersten/.local/bin/cwltool 1.0.20191022103248
INFO Resolved 'selfcal_simulation.cwl' to 'file:///mnt/c/Users/kerst/OneDrive/home/2gc-pipeline/selfcal_simulation.cwl'
ERROR I'm sorry, I couldn't load this CWL file.
The error was:
Traceback (most recent call last):
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/main.py", line 767, in main
    stdout.write(print_pack(loadingContext.loader, processobj, uri, metadata))
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/main.py", line 467, in print_pack
    packed = pack(document_loader, processobj, uri, metadata)
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/pack.py", line 210, in pack
    import_embed(packed, set())
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/pack.py", line 108, in import_embed
    import_embed(d[k], seen)
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/pack.py", line 94, in import_embed
    import_embed(v, seen)
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/pack.py", line 108, in import_embed
    import_embed(d[k], seen)
  File "/home/kersten/.local/lib/python3.6/site-packages/cwltool/pack.py", line 98, in import_embed
    if d[n] in seen:
TypeError: unhashable type: 'CommentedMap'

I would guess there is some error in the CWL; however, it could of course also be a bug in cwltool. The error message is just very uninformative, but I will try to track it down.

Have you already successfully tested your workflow with any CWL runner?

Best wishes, Kersten
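For readers unfamiliar with this kind of TypeError: the last frame of the traceback is a set-membership test (`if d[n] in seen:`), which needs the value to be hashable. The sketch below reproduces the same class of error with a stand-in class; it is not cwltool's code, and the `CommentedMap` here is only a plain `dict` subclass standing in for the ruamel.yaml mapping named in the traceback. The guard in `safe_already_seen` is a hypothetical illustration, not the fix that cwltool adopted.

```python
class CommentedMap(dict):
    """Stand-in for a dict-like YAML mapping; dict subclasses are unhashable."""


def already_seen(value, seen):
    # Same shape as `if d[n] in seen:` in the traceback: hashing `value`
    # fails when it is a mapping instead of a string reference.
    return value in seen


def safe_already_seen(value, seen):
    # Hypothetical guard: only strings can be references that were seen before.
    return isinstance(value, str) and value in seen


seen = {"tool.cwl"}
embedded = CommentedMap({"class": "CommandLineTool"})

try:
    already_seen(embedded, seen)
    error_message = ""
except TypeError as err:
    error_message = str(err)
```

In this sketch the error appears as soon as a field that is expected to hold a file reference holds an embedded mapping instead, which is consistent with either an unusual document or a packing bug.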

KerstenBreuer commented 4 years ago

Oh one more thing. I found that you're using the CommandInputEnumSchema feature.

Unfortunately, CWLab currently only supports standard CWLTypes.

But I see that this is an extremely useful feature that could be visualized relatively easily in a GUI.
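To make the GUI idea concrete, here is a minimal sketch of how an enum input could be turned into the options of a dropdown. It assumes the input type has already been parsed into a dictionary with `type: enum` and a `symbols` list, as described for enum schemas in the CWL specification. The function name `enum_choices` and the example symbols other than `meerkat` are hypothetical, and this is not how CWLab implements it.

```python
from typing import Any, Dict, List


def enum_choices(input_type: Dict[str, Any]) -> List[str]:
    """Return display labels for the symbols of a parsed CWL enum type."""
    if input_type.get("type") != "enum":
        raise ValueError("not an enum schema")
    labels = []
    for symbol in input_type.get("symbols", []):
        # Resolved documents may expand symbols into URIs; keep only the
        # last path or fragment segment as the label shown to the user.
        labels.append(str(symbol).replace("#", "/").rsplit("/", 1)[-1])
    return labels


telescope_type = {
    "type": "enum",
    "symbols": ["meerkat", "vla", "file:///x.cwl#tel/kat-7"],
}
choices = enum_choices(telescope_type)
```

The resulting list can be fed to any select widget; the value submitted by the form would then be written back into the job file as a plain string.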

Athanaseus commented 4 years ago

Thanks for your effort @KerstenBreuer.

This seems to run for me when using cwltool. I had to install the latest cwltool (2.0.20200107113851) because the version that had been installed was cwltool-1.0.

(cwlab) athanaseus@athanaseus-E5550:~/Documents/WORKS/cwlab/2gc-pipeline$ cwltool selfcal_simulation.cwl selfcal_simulation.yml
INFO /home/athanaseus/.virtualenvs/cwlab/bin/cwltool 2.0.20200107113851
INFO Resolved 'selfcal_simulation.cwl' to 'file:///home/athanaseus/Documents/WORKS/cwlab/2gc-pipeline/selfcal_simulation.cwl'
INFO [workflow ] start
INFO [workflow ] starting step simms
INFO [step simms] start
INFO [job simms] /tmp/1057vj_x$ docker \
    run \
    -i \
    --volume=/tmp/1057vj_x:/uRfwdb:rw \
    --volume=/tmp/fft_o2lc:/tmp:rw \
    --workdir=/uRfwdb \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/uRfwdb \
    --cidfile=/tmp/7d7uatn_/20200117172046-498357.cid \
    --env=USER=root \
    stimela/simms:1.2.0 \
    simms \
    --dfreq \
    1000000.0 \
    --direction \
    J2000,0deg,-30deg \
    --dtime \
    5 \
    --freq0 \
    1420000000.0 \
    --name \
    meerkat_SourceRecovery.ms \
    --nchan \
    4 \
    --synthesis-time \
    0.5 \
    --tel \
    meerkat

=========================================
The start-up time of CASA may vary
depending on whether the shared libraries
are cached or not.
=========================================

2020-01-17 15:20:50 INFO    ::casa  CASA Version  5.6.0-60  
IPython 5.1.0 -- An enhanced Interactive Python.

CASA 5.6.0-60   -- Common Astronomy Software Applications

Creating a new telemetry file
Telemetry initialized. Telemetry will send anonymized usage statistics to NRAO.
You can disable telemetry by adding the following line to your ~/.casarc file:
...

Athanaseus commented 4 years ago

@sphemakh come see this

KerstenBreuer commented 4 years ago

Hi @Athanaseus,

Sorry for not replying. I am currently super busy.

I did some further testing and the error is thrown by a cwltool function called print_pack.

You can use the following code to reproduce it:

import json

from cwltool.load_tool import fetch_document, resolve_and_validate_document
from cwltool.main import print_pack

cwl_path = "https://raw.githubusercontent.com/Athanaseus/2gc-pipeline/master/selfcal_simulation.cwl"

# Fetch the raw document; references are not resolved yet.
loadingContext, workflowobj, uri = fetch_document(cwl_path)
# Do not let cwltool update the document to a newer CWL version.
loadingContext.do_update = False
# Resolve references and validate the document against the schema.
loadingContext, uri = resolve_and_validate_document(loadingContext, workflowobj, uri)
processobj = loadingContext.loader.resolve_ref(uri)[0]
# Packing is the call that throws the error.
packed_cwl = json.loads(print_pack(loadingContext.loader, processobj, uri, loadingContext.metadata))

I think we should move this discussion to cwltool's github.

Best wishes, Kersten

KerstenBreuer commented 4 years ago

@Athanaseus, OK, that issue is solved. I will implement #39 on the weekend. Then the import should work just fine.

Best wishes, Kersten

SpheMakh commented 4 years ago

That's great, thanks Kersten.