yuch7 / cwlexec

A new open source tool to run CWL workflows on LSF
Other
36 stars 8 forks source link

Undefined file on scatter #34

Closed drjrm3 closed 5 years ago

drjrm3 commented 5 years ago

In an attempt to overcome #20 (using the same general example as in #33) I have moved the scatter down to the lowest level. However, I found that when building a filename with valueFrom inside the scatter that it returns undefined.

steps:
  foo:
    run: foo.cwl
    scatter: input_file
    in: 
      input_file: input_files
      output_filename:
        valueFrom: ${return inputs.input_file.nameroot + ".out";}
    out:
      [output_file]

However, this is not reproducible in cwltool where the filename gets correctly built. This seems like a cwlexec specific issue, but it could also be that I am not using best practices when building a string from within a scatter.

UndefinedFile.tar.gz

mr-c commented 5 years ago

As a workaround, did you try valueFrom: $(inputs.input_file.nameroot).out ? That is also valid, and simpler.

mr-c commented 5 years ago

Okay, that doesn't work. Sticking with your original example and running with --debug we see that the scatter isn't being applied prior to evaluating the valueFrom as per the spec:

https://www.commonwl.org/v1.0/Workflow.html#WorkflowStepInput

The value of inputs in the parameter reference or expression must be the input object to the workflow step after assigning the source values, applying default, and then scattering.

(emphasis added)

08:08:35.583 default [main] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluate js expression "${return inputs.input_file.nameroot + ".out";}" with c
ontext
[var runtime={"outdirSize":"52709240832","tmpdir":"/home/jenkins/cwl-workdir/7fa42f5f-ce85-4cca-a77e-cfe2f63ea3f1/scatter_flow","tmpdirSize":"527092408
32","cores":"1","outdir":"/home/jenkins/cwl-workdir/7fa42f5f-ce85-4cca-a77e-cfe2f63ea3f1/scatter_flow","ram":"1"};, var inputs={"input_file":[{"locatio
n":"/home/jenkins/cwl-workdir/UndefinedFile/data/file3.txt","path":"/home/jenkins/cwl-workdir/UndefinedFile/data/file3.txt","basename":"file3.txt","dir
name":"/home/jenkins/cwl-workdir/UndefinedFile/data","nameroot":"file3","nameext":".txt","size":2,"secondaryFiles":[],"class":"File"},{"location":"/hom
e/jenkins/cwl-workdir/UndefinedFile/data/file4.txt","path":"/home/jenkins/cwl-workdir/UndefinedFile/data/file4.txt","basename":"file4.txt","dirname":"/
home/jenkins/cwl-workdir/UndefinedFile/data","nameroot":"file4","nameext":".txt","size":2,"secondaryFiles":[],"class":"File"}],"output_filename":null};
, var self=null;]
08:08:36.616 default [main] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "${return inputs.input_file.nameroot + ".out";}" to un
defined.out

Here we see the value of inputs.input_file is the original array of inputs, not the post-scatter single item that it should be.

mr-c commented 5 years ago

Even running inner.cwl directly (after adding the missing ScatterFeatureRequirement) the same failure to scatter before evaluating the valueFrom is observed

mr-c commented 5 years ago

@drjrm3 Here is a workaround, as self is correctly set inside a scattered valueFrom:

inner.cwl


#!/usr/bin/env cwl-tool

cwlVersion: v1.0 class: Workflow

requirements: StepInputExpressionRequirement: {} ScatterFeatureRequirement: {}

inputs: input_files: File[]

steps: foo: run: foo.cwl scatter: [ input_file, output_filename ] scatterMethod: dotproduct in: input_file: input_files output_filename: source: input_files valueFrom: $(self.nameroot).out out: [output_file]

outputs: output_files: type: File[] outputSource: foo/output_file

adamsla commented 5 years ago

If you guys find the fix, don't forget to create a pull request and sign off per the instructions. Thanks!

skeeey commented 5 years ago

@drjrm3, I test the @mr-c method, it works. you can use it as a workaround, and to make cwlexec more compatibility, we will try to fix this problem

@mr-c my understanding, for ${return inputs.input_file.nameroot + ".out";}, the input_file should be an array in its context, for scatter case, can we introduce an implicit variable: scatter_index (like self) for each scattered item on specific level, so we can use it like ${return inputs.input_file[scatter_index].nameroot + ".out";}, I think this more clear for users, what is your opinion?

drjrm3 commented 5 years ago

@skeeey - Thanks. This does resolved the current issue I am experiencing. However, I use this example in a larger workflow and now the next step is having problem. I will work on a minimal example and open a new ticket. For now, however, I think this can be considered a good workaround.

@mr-c - Comparing your scatter with my original example, which is best practice?