rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0
74 stars 28 forks source link

Workflows from cwltool --pack aren't loaded #159

Open buchanae opened 7 years ago

buchanae commented 7 years ago

Seems like bunny has an issue with workflows packed by cwltool, possibly this is a bug with $graph handling.

I'll try to describe some steps to reproduce

git clone https://github.com/common-workflow-language/workflows.git cwl-workflows

cwltool --debug --pack cwl-workflows/workflows/hello/hello-param.cwl#main > cwl-workflows/workflows/hello/hello-param-packed.cwl

java -jar rabix-backend-local/target/rabix-backend-local-1.0.0-rc3.jar cwl-workflows/workflows/hello/hello-param-packed.cwl#main
14:07:53.347 [main] DEBUG o.r.backend.local.BackendCommandLine - Config path: /Users/buchanae/projects/bunny/rabix-backend-local/config
14:07:53.349 [main] DEBUG o.r.backend.local.BackendCommandLine - Configuration directory found localy.
14:07:54.003 [main] ERROR o.r.backend.local.BackendCommandLine - Error: file://cwl-workflows/workflows/hello/hello-param-packed.cwl#main is not a valid app!

Building with mvn package -P cwl and adding a couple debug logs, I get:

RJHB552 bunny (tes-fixes)
java -jar rabix-backend-local/target/rabix-backend-local-1.0.0-rc3.jar cwl-workflows/workflows/hello/hello-param-packed.cwl
14:15:40.917 [main] DEBUG o.r.backend.local.BackendCommandLine - Config path: /Users/buchanae/projects/bunny/rabix-backend-local/config
14:15:40.919 [main] DEBUG o.r.backend.local.BackendCommandLine - Configuration directory found localy.
14:15:41.525 [main] DEBUG org.rabix.bindings.BindingsFactory - {}
java.lang.NullPointerException: current node id is null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:228) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.traverse(CWLDocumentResolver.java:270) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.traverse(CWLDocumentResolver.java:317) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.traverse(CWLDocumentResolver.java:336) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.traverse(CWLDocumentResolver.java:336) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.traverse(CWLDocumentResolver.java:336) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.traverse(CWLDocumentResolver.java:336) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.resolve(CWLDocumentResolver.java:142) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.CWLAppProcessor.loadApp(CWLAppProcessor.java:20) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.CWLAppProcessor.loadAppObject(CWLAppProcessor.java:25) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.cwl.CWLBindings.loadAppObject(CWLBindings.java:63) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.bindings.BindingsFactory.create(BindingsFactory.java:47) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.backend.local.BackendCommandLine.main(BackendCommandLine.java:280) [rabix-backend-local-1.0.0-rc3.jar:na]
14:15:41.525 [main] ERROR o.r.backend.local.BackendCommandLine - Error: file://cwl-workflows/workflows/hello/hello-param-packed.cwl is not a valid app!
14:15:41.525 [main] ERROR o.r.backend.local.BackendCommandLine - {}
org.rabix.bindings.BindingException: Cannot find binding for the payload.
    at org.rabix.bindings.BindingsFactory.create(BindingsFactory.java:66) ~[rabix-backend-local-1.0.0-rc3.jar:na]
    at org.rabix.backend.local.BackendCommandLine.main(BackendCommandLine.java:280) ~[rabix-backend-local-1.0.0-rc3.jar:na]

Not sure yet if that exposes the root cause. I tried to track down the (alleged) bug in traverse(), but didn't succeed.

StarvingMarvin commented 7 years ago

It appears that --pack mangles ids, for example outputSource: step0/echo-out-filename became "outputSource": "#main/step0/echo-out-filename" which is, I guess by the look or it, RDF processing done on ids, and bunny doesn't do RDF (nor I'm aware of the spec interpretation that would make understanding such id's necessary from the conformance standpoint). So I would treat it as a bug in cwltool, because it does more than it promisses (Combine components into single document and print).

buchanae commented 7 years ago

@tetron Can you confirm that cwltool is doing something special here?

mr-c commented 7 years ago

FYI: http://www.commonwl.org/v1.0/Workflow.html#Syntax reminds CWL implementations that the preprocessing steps described in schema salad must be applied: http://www.commonwl.org/v1.0/SchemaSalad.html#Document_preprocessing

mr-c commented 7 years ago

The addition of #main is done to disambiguate between the objects within the $graph of https://github.com/common-workflow-language/workflows/blob/master/workflows/hello/hello-param.cwl

StarvingMarvin commented 7 years ago

@mr-c So you are saying that resolved document is valid CWL1? I understood that "document should be preprocessed in this way" means that it should be comprehended/resolved by stated rules, not that the result of such processing is still a valid app. Finally there is no conformance test using form "#id" for identifiers, let alone one with prepended additional identifiers, so I'm not sure how should anyone implementing the spec figure out that they are supposed to expect such ids.

mr-c commented 7 years ago

Hey @StarvingMarvin

I found a conformance test that uses $graph and prepending the id to a step name (like #main/index/result): https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/conformance_test_v1.0.yaml#L370 https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/search.cwl

mr-c commented 7 years ago

Looks like Rabix passed this test?

http://ci.commonwl.org/job/rabix-conformance/119/testReport/conformance_test_v1/0/Test_InitialWorkDirRequirement_linking_input_files_and_capturing_secondaryFiles_on_input_and_output_/

mr-c commented 7 years ago

@StarvingMarvin I agree that this is an area in need of clarification in the spec

StarvingMarvin commented 7 years ago

@mr-c, @buchanae this was quite a blunder on my side. Since we are passing conformance tests, and since the "hello" example started working when I've returned ids to their original values, I jumped to the wrong conclusions. @sinisa88 should be able to give better answer to what is going on, since he implemented most of the features needed to pass the tests.

mr-c commented 7 years ago

@StarvingMarvin No worries, it was a useful conversation

buchanae commented 7 years ago

No worries, thanks for taking the time to look into it everyone!

StarvingMarvin commented 7 years ago

It seems we have a conflict between specifications. Here is quote from the spec:

A WorkflowStepInput object must contain an id field in the form #fieldname or #stepname.fieldname. When the id field contains a period . the field name consists of the characters following the final period. This defines a field of the workflow step input object with the value of the source parameter(s).

and pack produces id in form #main/step0/echo-in-message which has additional prefix, and is separated with slashes instead of a dot.

If we look at the mentioned conformance test, it has id's as specified by the CWL spec, not in the form that pack command outputs.

tetron commented 7 years ago

What I intended to happen was to switch from dot to slash as separators between draft-3 and v1.0. So the spec text should have been updated, but was not.

The scoped identifier rules in schema salad specify slashes for constructing the concrete identifiers, which is where those ids in --pack come from.

If that's going to be a problem and not just a clarification, we should talk about it.

You're right that there needs to be additional conformance tests using concrete identifiers (including dots and slashes).

tetron commented 7 years ago

I'm working on a v1.0.1 errata spec update here: https://github.com/common-workflow-language/common-workflow-language/pull/410

StarvingMarvin commented 7 years ago

@tetron A silly question, assuming this cwl snippet:

cwlVersion: v1.0
id: toplevel
$graph:
  - id: wf
    steps:
      step_name:
        out: [<ID GOES HERE>]
     other_step_name:
     # ...

which of these (if any) would not be a valid at ID GOES HERE place?

  1. outfile
  2. outfile

  3. step_name/outfile
  4. step_name/outfile

  5. wf/step_name/outfile
  6. wf/stepname/outfile

  7. toplevel/wf/step_name/outfile
  8. toplevel/wf/step_name/outfile

  9. /toplevel/wf/step_name/outfile
  10. /toplevel/wf/step_name/outfile

  11. file:///path/to/file.cwl#toplevel/wf/step_name/outfile
  12. other_step_name/outfile
  13. something/completely/unrelated/outfile

  14. http://example.com/#outfile
  15. ou%74file
  16. toplevel/wf/step_name/ou%74file

Followup question: which would be invalid as a source on other's step input?