pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

Add support for pre-submit hooks and CWL workflows #292

Closed nsheff closed 3 years ago

nsheff commented 4 years ago

Wait for:

stolarczyk commented 3 years ago

We need to update the tests and the piface schema to accommodate the pre-submission hooks before we merge this.

stolarczyk commented 3 years ago

Before we implemented the pre-submission hooks system, the sample YAML path could be changed with the pipeline.sample_yaml_path key. Now we want to take advantage of the new pipeline.var_templates section. Therefore, I've decided to make the looper.write_sample_yaml plugin parameterizable with pipeline.var_templates.sample_yaml_path, but kept the old way (with lower priority) for backwards compatibility. However, this release will make big changes to the way we save sample YAMLs etc., so maybe we should switch completely to the new system?
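The priority order described here can be sketched as follows. This is only an illustration, not looper's actual implementation: the function name `resolve_sample_yaml_path` and the plain-dict stand-in for the pipeline namespace are assumptions; only the `var_templates.sample_yaml_path` and `sample_yaml_path` keys come from the comment above.

```python
def resolve_sample_yaml_path(pipeline, default=None):
    """Pick the sample YAML path template, preferring the new location.

    `pipeline` is a plain dict standing in for the pipeline namespace
    (an assumption made for this sketch).
    """
    # New way: pipeline.var_templates.sample_yaml_path (higher priority)
    var_templates = pipeline.get("var_templates") or {}
    if "sample_yaml_path" in var_templates:
        return var_templates["sample_yaml_path"]
    # Old way: pipeline.sample_yaml_path (kept for backwards compatibility)
    if "sample_yaml_path" in pipeline:
        return pipeline["sample_yaml_path"]
    # Neither key is set: fall back to whatever default the caller supplies
    return default
```

Dropping the second lookup would be the "completely switch to the new system" option raised in the comment.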

nsheff commented 3 years ago

I was able to make this work with both CWL demos here: https://github.com/pepkit/pep-cwl

@xuebingjie1990 would you mind giving those a run through to see if they work for you to test? If they work I think we can release this.

xuebingjie1990 commented 3 years ago

> @xuebingjie1990 would you mind giving those a run through to see if they work for you to test? If they work I think we can release this.

sure, I'll do that first thing tomorrow

stolarczyk commented 3 years ago

I have the peppy and eido releases prepared and am writing some final looper tests. Should we hold the releases until @xuebingjie1990 has finished the CWL test runs?

nsheff commented 3 years ago

> Do we wait with the releases until @xuebingjie1990 has finished the CWL test runs?

yes let's do that.

xuebingjie1990 commented 3 years ago

For the CWL simple-demo: I got the pre-submission working (sample_yaml_cwl generated), but the actual jobs failed with the following error:

Workflow error, try again with --debug for more information:
Invalid job input record:
pipeline_results/submission/frog_1_cwl.yaml:3:1: the `file` field is not valid because
                                                   is not a dict

similar issue with the bioinformatic-demo:

Resolved '/home/bx2ur/Documents/GitHubRepo/pep-cwl/bioinformatics_demo/bowtie2-tool.cwl' to 'file:///home/bx2ur/Documents/GitHubRepo/pep-cwl/bioinformatics_demo/bowtie2-tool.cwl'
bioinformatics_demo/bowtie2-tool.cwl:61:7:   Field `path` contains undefined reference to `file:///home/nsheff/code/refgenie_sandbox/hg38/bowtie2_index/default/hg38.fa`
Workflow error, try again with --debug for more information:
Invalid job input record:
pipeline_results/submission/sample2_cwl.yaml:4:1: * the `read1` field is not valid because
                                                      is not a dict
pipeline_results/submission/sample2_cwl.yaml:5:1: * the `read2` field is not valid because
                                                      is not a dict

nsheff commented 3 years ago

that seems to indicate that the correct yaml file isn't getting written, can you look into that?

xuebingjie1990 commented 3 years ago

> that seems to indicate that the correct yaml file isn't getting written, can you look into that?

this is the current yaml output:

library: anySampleType
file: data/frog1_data.txt
pipeline_interfaces: cwl_interface.yaml
sample_yaml_cwl: pipeline_results/submission/frog_1_cwl.yaml

Should file be a dict? What is the correct format for the CWL YAML? Is it like this:

file:
  class: File
  path: hello_looper/data/frog1_data.txt

The wc-job.yaml is no longer available. I'm not sure this is everything...
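For reference, CWL job input records represent file inputs as objects with `class: File` and a `path` (or `location`), which is why a bare string is rejected as "not a dict". A minimal sketch of that conversion follows, assuming the sample is a plain dict and the caller already knows which attributes hold file paths. The function name `to_cwl_inputs` and the `file_attrs` parameter are hypothetical and are not looper's API.

```python
def to_cwl_inputs(sample, file_attrs):
    """Return a copy of `sample` with the named attributes wrapped as CWL File objects."""
    out = dict(sample)
    for attr in file_attrs:
        value = out.get(attr)
        # Only wrap plain strings; values that are already dicts are left alone
        if isinstance(value, str):
            out[attr] = {"class": "File", "path": value}
    return out


# Values taken from the YAML output shown above
sample = {"library": "anySampleType", "file": "data/frog1_data.txt"}
print(to_cwl_inputs(sample, ["file"]))
```

Dumped as YAML, the converted `file` entry has the `class: File` / `path:` shape asked about above.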

xuebingjie1990 commented 3 years ago

@nsheff The sample_yaml_cwl was written correctly. I think the problem was that I have both the released looper and the dev version installed, so it was running the released looper instead. Now I'm having trouble with the bioinformatic-demo at this line: https://github.com/pepkit/pep-cwl/blob/ccc24370a1605ed565a9f94dcf497a1398a84952/bioinformatics_demo/bowtie2-tool.cwl#L61 I don't have the hg38/bowtie2_index, so I'm building it now.
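When a released and a development copy of a package coexist, one quick way to see which copy a given interpreter would import is to look up its version and file location. The sketch below uses only the standard library (`importlib.metadata` needs Python 3.8 or later); the helper name `describe_install` is made up for illustration.

```python
import importlib.util
from importlib import metadata


def describe_install(package):
    """Return (version, location) for an importable package, or None if it is not found."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # importable, but no distribution metadata was found
    return version, spec.origin


# Prints None if looper is not importable in this environment
print(describe_install("looper"))
```

Run it with the same interpreter that the `looper` command uses, since each environment may resolve the package differently.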

nsheff commented 3 years ago

yes, that makes sense -- I had hard-coded that to my path. I will need to change that. To test, you can just point that to your location.
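A hard-coded path like this can be caught before running by checking that every File object in a parsed CWL document or job record points at something that exists. This is a rough sketch under stated assumptions: the helper name `missing_file_paths` is hypothetical, the record is assumed to be already loaded into Python dicts and lists, and relative paths are checked against the current working directory rather than the document's own location.

```python
import os


def missing_file_paths(record):
    """Collect `path` values of CWL File objects that do not exist locally."""
    missing = []
    if isinstance(record, dict):
        if record.get("class") == "File" and "path" in record:
            if not os.path.exists(record["path"]):
                missing.append(record["path"])
        # Recurse into nested values (e.g. defaults or secondary files)
        for value in record.values():
            missing.extend(missing_file_paths(value))
    elif isinstance(record, list):
        for item in record:
            missing.extend(missing_file_paths(item))
    return missing
```

An empty list means every referenced path was found on the machine doing the check.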

xuebingjie1990 commented 3 years ago

I edited that line to point to my hg38/bowtie2_index, but still got the same error. I have the v0.4 version, so I tried the path with both the data and alias dirs:

Calling pre-submit function: looper.write_sample_yaml_cwl
Writing sample yaml to pipeline_results/submission/sample1_sample_cwl.yaml
Writing script to /home/bx2ur/Documents/GitHubRepo/pep-cwl/pipeline_results/submission/bowtie2_alignment_sample1.sub
Job script (n=1; 0.00Gb): pipeline_results/submission/bowtie2_alignment_sample1.sub
Compute node: cphg-dlw6ch2-sheffieldlab
Start time: 2020-10-06 15:33:21
/usr/bin/cwl-runner 1.0.20180302231433
Resolved '/home/bx2ur/Documents/GitHubRepo/pep-cwl/bioinformatics_demo/bowtie2-tool.cwl' to 'file:///home/bx2ur/Documents/GitHubRepo/pep-cwl/bioinformatics_demo/bowtie2-tool.cwl'
Got workflow error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cwltool/executors.py", line 98, in run_jobs
    for r in jobiter:
  File "/usr/lib/python2.7/dist-packages/cwltool/command_line_tool.py", line 379, in job
    visit_class([builder.files, builder.bindings], ("File", "Directory"), _check_adjust)
  File "/usr/lib/python2.7/dist-packages/cwltool/pathmapper.py", line 51, in visit_class
    visit_class(d, cls, op)
  File "/usr/lib/python2.7/dist-packages/cwltool/pathmapper.py", line 51, in visit_class
    visit_class(d, cls, op)
  File "/usr/lib/python2.7/dist-packages/cwltool/pathmapper.py", line 48, in visit_class
    visit_class(rec[d], cls, op)
  File "/usr/lib/python2.7/dist-packages/cwltool/pathmapper.py", line 46, in visit_class
    op(rec)
  File "/usr/lib/python2.7/dist-packages/cwltool/command_line_tool.py", line 181, in check_adjust
    f["path"] = docker_windows_path_adjust(builder.pathmapper.mapper(f["location"])[1])
  File "/usr/lib/python2.7/dist-packages/cwltool/pathmapper.py", line 290, in mapper
    return self._pathmap[src]
KeyError: 'file:///home/bx2ur/Documents/Testing/refgenie/data/58de7f33a36ccd9d6e3b1b3afe6b9f37cd5b2867bbfb929a/bowtie2_index/default/58de7f33a36ccd9d6e3b1b3afe6b9f37cd5b2867bbfb929a.fa'
Workflow error, try again with --debug for more information:
'file:///home/bx2ur/Documents/Testing/refgenie/data/58de7f33a36ccd9d6e3b1b3afe6b9f37cd5b2867bbfb929a/bowtie2_index/default/58de7f33a36ccd9d6e3b1b3afe6b9f37cd5b2867bbfb929a.fa'

stolarczyk commented 3 years ago

It looks like this is a problem with the pipeline, not the pre-submission hooks system. According to @xuebingjie1990 it is working as expected in this case.

Can we proceed with the release or will fixing the pipeline require some adjustments in looper, @nsheff?

nsheff commented 3 years ago

agreed -- I will need to fix the pipeline. I think it's safe to proceed as you suggest.