rabix / bunny

[Legacy] Executor for CWL workflows. Executes sbg:draft-2 and CWL 1.0
http://rabix.io
Apache License 2.0

Bunny Bcbio support #94

Closed StarvingMarvin closed 7 years ago

StarvingMarvin commented 7 years ago

To continue discussion from #92

chapmanb commented 7 years ago

Luka and Stefan; Thanks much for starting this separate thread. I pushed the full bcbio testing repository here:

https://github.com/bcbio/test_bcbio_cwl

If you clone that and do bash run_bunny.sh you should be able to replicate the issue without needing bcbio installed. I believe the inputs are correctly nested in the input definition:

https://github.com/bcbio/test_bcbio_cwl/blob/566151a82a31d406c2613677f0eaff93b9447823/run_info-cwl-workflow/main-run_info-cwl.cwl#L8

and here is the input sample:

https://github.com/bcbio/test_bcbio_cwl/blob/566151a82a31d406c2613677f0eaff93b9447823/run_info-cwl-workflow/main-run_info-cwl-samples.json#L6

Here is the scatter:

https://github.com/bcbio/test_bcbio_cwl/blob/566151a82a31d406c2613677f0eaff93b9447823/run_info-cwl-workflow/main-run_info-cwl.cwl#L305

and this is the error I get:

org.rabix.engine.processor.handler.EventHandlerException: Port config__algorithm__align_split_size for root.alignment.1.prep_align_inputs and rootId abc9f1f9-e5f4-4efe-9fcc-70d6403ae4c6 is not a list and therefore cannot be scattered.

Happy to provide anything else that would help. Thanks again for the help working through this.

StarvingMarvin commented 7 years ago

That's definitely a bug in bunny. I've opened a separate issue with a small test case: #97

chapmanb commented 7 years ago

Thanks much, let me know if I can help with anything else. I'll definitely keep testing once this is squashed, looking forward to getting bcbio working.

chapmanb commented 7 years ago

Thanks for the latest bunny release, I'm excited to keep moving bcbio CWL support forward. I tested with my small self-contained test case (https://github.com/bcbio/test_bcbio_cwl), which you can run with the run_bunny.sh shell script. I'm running into a different null pointer error:

[2017-02-15 05:59:33.874] [ERROR] EventProcessor failed to process event OutputUpdateEvent [jobId=root.prep_samples.2, contextId=6509ef2d-1198-499a-a31d-80137c12560c, portId=config__algorithm__variant_regions, value=FileValue [size=65, path=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/main-run_info-cwl-20170215055916194/root/prep_samples/2/bedprep/variant_regions-bam.bed, location=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/main-run_info-cwl-20170215055916194/root/prep_samples/2/bedprep/variant_regions-bam.bed, checksum=sha1$02cb0626caa3c5d7db51b0ff867553d7ff5422d6, secondaryFiles=[], properties={sbg:metadata=null}], fromScatter=false, numberOfScattered=null].
java.lang.NullPointerException: null
        at org.rabix.engine.model.VariableRecord.expand(VariableRecord.java:101) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.model.VariableRecord.addValue(VariableRecord.java:80) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:73) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:30) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.dispatcher.impl.SyncEventDispatcher.send(SyncEventDispatcher.java:21) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:132) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:55) ~[rabix-backend-local.jar:na]
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
        at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-backend-local.jar:na]
        at com.sun.proxy.$Proxy37.send(Unknown Source) ~[na:na]
        at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:198) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:43) ~[rabix-backend-local.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl$1.run(EventProcessorImpl.java:79) ~[rabix-backend-local.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]

From the logs all of the launched tasks appear to complete but then there is some issue merging the outputs. Happy to work more on this from the bcbio side with any details about what triggers this. Thank you again for all the work.

StarvingMarvin commented 7 years ago

Sorry for the late reply. I've been playing around with bcbio for the last couple of days, but I'm having a hard time pinpointing the issues. First off, I was having issues running tools because of what seem to be command line escaping problems, but I couldn't reproduce them on a minimal example. Here is the fork where I manually escaped things in order to even get to your exception: https://github.com/StarvingMarvin/test_bcbio_cwl

The null pointer issue itself was easy to fix, because we already had a similar problem on another branch which will be merged to develop soon, but then I hit what seems to be another scattering issue. So... work is still in progress, I'll link to the relevant issues when we isolate them / create them.

chapmanb commented 7 years ago

Luka; Thanks much for looking at this. Apologies about the quoting problems, I just updated the bcbio CWL generation, test data and Docker container to use a non-JSON way of passing these values. This avoids the different quoting between inside and outside of Docker runs. I hope the new version will make it easier to test and evaluate. Let me know if there is anything else I can do on our side to make things run smoother.
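
To illustrate the kind of quoting problem described here, a minimal sketch. It is not bcbio or bunny code, and the JSON form of the argument is an assumption made up for illustration (only the plain `sentinel_runtime=cores,2,ram,4096` form appears in this thread). A JSON-encoded value carries double quotes and spaces that a shell interprets, while the plain form has nothing to escape:

```python
import json
import shlex

# Hypothetical JSON-style value; the real format bcbio used is not shown in this thread.
value = {"cores": 2, "ram": 4096}
json_arg = "sentinel_runtime=" + json.dumps(value)

# Plain form, as it appears in the command lines in this thread.
plain_arg = "sentinel_runtime=cores,2,ram,4096"

# Without quoting, POSIX-style word splitting strips the inner double quotes
# and breaks the argument apart at each space.
unquoted = shlex.split(json_arg)

# Quoted with shlex.quote, the argument survives word splitting intact.
quoted = shlex.split(shlex.quote(json_arg))

# The plain form contains no quotes or spaces, so it passes through unchanged.
plain = shlex.split(plain_arg)

print(unquoted)
print(quoted)
print(plain)
```

If the quoting is applied at one layer (outside Docker) but not another (inside), the two runs see different argument lists, which is the mismatch the non-JSON format avoids.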

StarvingMarvin commented 7 years ago

As I've explained in a comment on #165, shell quoting was a bit messier to fix than I expected, but it should hopefully work more reliably now. I'll start digging into the other issues next.

StarvingMarvin commented 7 years ago

Hey @chapmanb, sorry for the bit of delay, we got sidetracked with other issues. Anyway, I've tried to run the latest version of bcbio, but it seems to be broken:

  File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
    runfn.process(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 28, in process
    raise AttributeError("Did not find exposed function in bcbio.distributed.multitasks named '%s'" % args.name)
AttributeError: Did not find exposed function in bcbio.distributed.multitasks named 'alignment_to_rec'

chapmanb commented 7 years ago

Luka; Thanks so much for continuing to look at this. I'm giving a talk as part of the CGC course in early May and was hoping to be able to demonstrate bcbio working with bunny so definitely happy to help from my side to keep pushing this along.

The CWL has changed recently to support conversion into WDL so will need a sync of the test workflow with bcbio (https://github.com/bcbio/test_bcbio_cwl). Are you running from a Docker container or bcbio locally? From Docker Hub, the latest bcbio/bcbio should have this. If locally you need the development version of bcbio (bcbio_nextgen.py upgrade -u development). Let me know if you're still stuck and happy to look into it more. Thanks again.

StarvingMarvin commented 7 years ago

You're right, I had a stale Docker image. Unfortunately, even with the latest image, I haven't progressed much further. The process_alignment step within the alignment subworkflow fails with this error:

ValueError: No 'files' specified for input sample: None

Backtracking from there, I think there is an issue with the results of the first step, alignment_to_rec, where out of 12 records, only some have the files field set:

{
  "alignment_rec" : [ {
    "config__algorithm__align_split_size" : 25000,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : 25000,
    "config__algorithm__aligner" : "snap",
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : "bwa",
    "config__algorithm__mark_duplicates" : "True",
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : "True",
    "description" : "Test1",
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : "Test1"
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "description" : "Test2",
    "files" : [ {
      "checksum" : "sha1$629a1fcf47dfc625e4d6aed097aedc563c9efdbd",
      "class" : "File",
      "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam",
      "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam",
      "secondaryFiles" : [ {
        "checksum" : "sha1$1a29d9009bc278ca6fbca7fba0a61061a4a1c1b2",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai",
        "size" : 232
      } ],
      "size" : 3363919
    } ],
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : "Test2"
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "files" : [ {
      "checksum" : "sha1$629a1fcf47dfc625e4d6aed097aedc563c9efdbd",
      "class" : "File",
      "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam",
      "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam",
      "secondaryFiles" : [ {
        "checksum" : "sha1$639d954149591413cdcfba6e887b86c447d34b87",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam.bai",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam.bai",
        "size" : 184
      } ],
      "size" : 3363919
    } ],
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : {
      "checksum" : "sha1$22ec87b283d1a354733eb33eaa632ee04542bbad",
      "class" : "File",
      "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.amb",
      "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.amb",
      "secondaryFiles" : [ {
        "checksum" : "sha1$6db9e9d10fd79f63c7f4be4f514244052c65b628",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.pac",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.pac",
        "size" : 9145
      }, {
        "checksum" : "sha1$2b6fb1bb71c10ffda852f3e4988cb0bdd24ebf9c",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.ann",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.ann",
        "size" : 64
      }, {
        "checksum" : "sha1$41af3bc76675641360d4817ffc77b99213126938",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.sa",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.sa",
        "size" : 18336
      }, {
        "checksum" : "sha1$c4068635c6b5d33b2e64931ef056c26cca88f660",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.bwt",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.bwt",
        "size" : 36664
      } ],
      "size" : 10
    },
    "reference__fasta__base" : {
      "checksum" : "sha1$e2ca54abb52ba4013b16f3f31d4083b8bf6de054",
      "class" : "File",
      "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
      "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
      "secondaryFiles" : [ {
        "checksum" : "sha1$f2e30d7e4f304ffd45ddd3cc26441434df8bf5fe",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
        "size" : 43
      }, {
        "checksum" : "sha1$d8584a6cb5bcdc476b4577bf89a25e215ca61449",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
        "size" : 292
      } ],
      "size" : 37196
    },
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__fasta__base" : {
      "checksum" : "sha1$e2ca54abb52ba4013b16f3f31d4083b8bf6de054",
      "class" : "File",
      "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
      "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
      "secondaryFiles" : [ {
        "checksum" : "sha1$f2e30d7e4f304ffd45ddd3cc26441434df8bf5fe",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
        "size" : 43
      }, {
        "checksum" : "sha1$d8584a6cb5bcdc476b4577bf89a25e215ca61449",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
        "size" : 292
      } ],
      "size" : 37196
    },
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : {
      "checksum" : "sha1$a712a1af9110f9f81e5c9063622b799b5e1ca366",
      "class" : "File",
      "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/Genome",
      "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/Genome",
      "secondaryFiles" : [ {
        "checksum" : "sha1$0644efe12369c8d54d80388ad14576f3da1f3ccd",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndex",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndex",
        "size" : 31
      }, {
        "checksum" : "sha1$2502aab068aa30174da61b7c99728b41b590c47e",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/OverflowTable",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/OverflowTable",
        "size" : 2016
      }, {
        "checksum" : "sha1$2ad35f253c653dca1f08eaf0b2f365e9a8f9f15f",
        "class" : "File",
        "location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndexHash",
        "path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndexHash",
        "size" : 400544
      } ],
      "size" : 38101
    },
    "rgnames__lane" : "Test1",
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : "Test2",
    "rgnames__lb" : null,
    "rgnames__pl" : "illumina",
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : "illumina",
    "rgnames__pu" : "Test1",
    "rgnames__rg" : null,
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : "Test2",
    "rgnames__rg" : "Test1",
    "rgnames__sample" : null
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : "Test2",
    "rgnames__sample" : "Test1"
  }, {
    "config__algorithm__align_split_size" : null,
    "config__algorithm__aligner" : null,
    "config__algorithm__mark_duplicates" : null,
    "reference__bowtie2__indexes" : null,
    "reference__bwa__indexes" : null,
    "reference__novoalign__indexes" : null,
    "reference__snap__indexes" : null,
    "rgnames__lane" : null,
    "rgnames__lb" : null,
    "rgnames__pl" : null,
    "rgnames__pu" : null,
    "rgnames__rg" : null,
    "rgnames__sample" : "Test2"
  } ]
}

It seems to me that the scattered instances that got a file on input propagated it correctly, but I'm not sure why these values are the way they are in the first place.
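
A quick way to check output like this is to list, per record, which fields are actually set. A minimal sketch, assuming the step output is available as a JSON string; the two-record sample below is a trimmed, made-up stand-in for the dump above, and `non_null_fields` is a hypothetical helper:

```python
import json

# Trimmed stand-in for the alignment_rec output; values are illustrative only.
sample_output = """
{"alignment_rec": [
  {"config__algorithm__aligner": "snap", "rgnames__sample": null},
  {"description": "Test2",
   "files": [{"class": "File", "path": "7_100326_FC6107FAAXX.bam"}],
   "rgnames__sample": "Test2"}
]}
"""

def non_null_fields(record):
    """Return the sorted names of fields whose value is not null."""
    return sorted(key for key, value in record.items() if value is not None)

records = json.loads(sample_output)["alignment_rec"]

# Per-record list of populated fields, plus the indexes of records that carry files.
summary = [non_null_fields(record) for record in records]
with_files = [i for i, record in enumerate(records) if record.get("files")]

for i, fields in enumerate(summary):
    print(i, fields)
print("records with files:", with_files)
```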

chapmanb commented 7 years ago

Luka; Thanks for digging into this further. It looks like bcbio is going wrong there -- it should return two records with all of the attributes rather than this mess of individual records with a single attribute in each. I can't reproduce this with cwltool or Toil runs so there must be something problematic about how bcbio interacts with bunny here. Which branch/release of bunny are you testing this off of? I can try to reproduce to see if I can identify the underlying issue and fix. Thanks again.

StarvingMarvin commented 7 years ago

I'm running off develop branch.

StarvingMarvin commented 7 years ago

I'm really confused: there is no scattering and there are no expressions, so I'm really not sure what can go wrong. These should be the inputs with which the tool is invoked:

{
    "config__algorithm__align_split_size" : [ 25000, 25000 ],
    "config__algorithm__aligner" : [ "snap", "bwa" ],
    "config__algorithm__mark_duplicates" : [ "True", "True" ],
    "description" : [ "Test1", "Test2" ],
    "files" : [ [ {
      "path" : "../testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam",
      "name" : "7_100326_FC6107FAAXX.bam",
      "dirname" : "../testdata/100326_FC6107FAAXX",
      "nameroot" : "7_100326_FC6107FAAXX",
      "nameext" : "bam",
      "secondaryFiles" : [ {
        "class" : "File",
        "path" : "../testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai",
        "name" : "7_100326_FC6107FAAXX.bam.bai",
        "dirname" : "../testdata/100326_FC6107FAAXX",
        "nameroot" : "7_100326_FC6107FAAXX.bam",
        "nameext" : "bai",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      } ],
      "properties" : {
        "sbg:metadata" : null
      },
      "class" : "File"
    } ], [ {
      "path" : "../testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam",
      "name" : "6_100326_FC6107FAAXX.bam",
      "dirname" : "../testdata/100326_FC6107FAAXX",
      "nameroot" : "6_100326_FC6107FAAXX",
      "nameext" : "bam",
      "secondaryFiles" : [ {
        "class" : "File",
        "path" : "../testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam.bai",
        "name" : "6_100326_FC6107FAAXX.bam.bai",
        "dirname" : "../testdata/100326_FC6107FAAXX",
        "nameroot" : "6_100326_FC6107FAAXX.bam",
        "nameext" : "bai",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      } ],
      "properties" : {
        "sbg:metadata" : null
      },
      "class" : "File"
    } ] ],
    "reference__bwa__indexes" : [ null, {
      "path" : "../testdata/genomes/hg19/bwa/hg19.fa.amb",
      "name" : "hg19.fa.amb",
      "dirname" : "../testdata/genomes/hg19/bwa",
      "nameroot" : "hg19.fa",
      "nameext" : "amb",
      "secondaryFiles" : [ {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/bwa/hg19.fa.ann",
        "name" : "hg19.fa.ann",
        "dirname" : "../testdata/genomes/hg19/bwa",
        "nameroot" : "hg19.fa",
        "nameext" : "ann",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/bwa/hg19.fa.bwt",
        "name" : "hg19.fa.bwt",
        "dirname" : "../testdata/genomes/hg19/bwa",
        "nameroot" : "hg19.fa",
        "nameext" : "bwt",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/bwa/hg19.fa.pac",
        "name" : "hg19.fa.pac",
        "dirname" : "../testdata/genomes/hg19/bwa",
        "nameroot" : "hg19.fa",
        "nameext" : "pac",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/bwa/hg19.fa.sa",
        "name" : "hg19.fa.sa",
        "dirname" : "../testdata/genomes/hg19/bwa",
        "nameroot" : "hg19.fa",
        "nameext" : "sa",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      } ],
      "properties" : {
        "sbg:metadata" : null
      },
      "class" : "File"
    } ],
    "reference__fasta__base" : [ {
      "path" : "../testdata/genomes/hg19/seq/hg19.fa",
      "name" : "hg19.fa",
      "dirname" : "../testdata/genomes/hg19/seq",
      "nameroot" : "hg19",
      "nameext" : "fa",
      "secondaryFiles" : [ {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/seq/hg19.fa.fai",
        "name" : "hg19.fa.fai",
        "dirname" : "../testdata/genomes/hg19/seq",
        "nameroot" : "hg19.fa",
        "nameext" : "fai",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/seq/hg19.dict",
        "name" : "hg19.dict",
        "dirname" : "../testdata/genomes/hg19/seq",
        "nameroot" : "hg19",
        "nameext" : "dict",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      } ],
      "properties" : {
        "sbg:metadata" : null
      },
      "class" : "File"
    }, {
      "path" : "../testdata/genomes/hg19/seq/hg19.fa",
      "name" : "hg19.fa",
      "dirname" : "../testdata/genomes/hg19/seq",
      "nameroot" : "hg19",
      "nameext" : "fa",
      "secondaryFiles" : [ {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/seq/hg19.fa.fai",
        "name" : "hg19.fa.fai",
        "dirname" : "../testdata/genomes/hg19/seq",
        "nameroot" : "hg19.fa",
        "nameext" : "fai",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/seq/hg19.dict",
        "name" : "hg19.dict",
        "dirname" : "../testdata/genomes/hg19/seq",
        "nameroot" : "hg19",
        "nameext" : "dict",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      } ],
      "properties" : {
        "sbg:metadata" : null
      },
      "class" : "File"
    } ],
    "reference__snap__indexes" : [ {
      "path" : "../testdata/genomes/hg19/snap/Genome",
      "name" : "Genome",
      "dirname" : "../testdata/genomes/hg19/snap",
      "nameroot" : "Genome",
      "secondaryFiles" : [ {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/snap/GenomeIndex",
        "name" : "GenomeIndex",
        "dirname" : "../testdata/genomes/hg19/snap",
        "nameroot" : "GenomeIndex",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/snap/GenomeIndexHash",
        "name" : "GenomeIndexHash",
        "dirname" : "../testdata/genomes/hg19/snap",
        "nameroot" : "GenomeIndexHash",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      }, {
        "class" : "File",
        "path" : "../testdata/genomes/hg19/snap/OverflowTable",
        "name" : "OverflowTable",
        "dirname" : "../testdata/genomes/hg19/snap",
        "nameroot" : "OverflowTable",
        "secondaryFiles" : [ ],
        "properties" : {
          "sbg:metadata" : null
        },
        "class" : "File"
      } ],
      "properties" : {
        "sbg:metadata" : null
      },
      "class" : "File"
    }, null ],
    "rgnames__lane" : [ "Test1", "Test2" ],
    "rgnames__lb" : [ null, null ],
    "rgnames__pl" : [ "illumina", "illumina" ],
    "rgnames__pu" : [ "Test1", "Test2" ],
    "rgnames__rg" : [ "Test1", "Test2" ],
    "rgnames__sample" : [ "Test1", "Test2" ],
    "sentinel_outputs" : "alignment_rec",
    "sentinel_parallel" : "multi-combined"
  }

and this is the resulting command line:

bcbio_nextgen.py runfn alignment_to_rec cwl sentinel_runtime=cores,2,ram,4096 \
config__algorithm__align_split_size=25000 \
config__algorithm__align_split_size=25000 \
config__algorithm__aligner=snap \
config__algorithm__aligner=bwa \
config__algorithm__mark_duplicates=True \
config__algorithm__mark_duplicates=True \
description=Test1 description=Test2 \
files=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam \
files=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam \
reference__bwa__indexes=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.amb \
reference__fasta__base=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa \
reference__fasta__base=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa \
reference__snap__indexes=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/Genome \
rgnames__lane=Test1 rgnames__lane=Test2 \
rgnames__pl=illumina rgnames__pl=illumina \
rgnames__pu=Test1 rgnames__pu=Test2 \
rgnames__rg=Test1 rgnames__rg=Test2 \
rgnames__sample=Test1 rgnames__sample=Test2 \
sentinel_parallel=multi-combined sentinel_outputs=alignment_rec

Actually, looking at the command line, it seems that additional files got lost, because the nested inputBinding got ignored. Will explore further tomorrow.

StarvingMarvin commented 7 years ago

Here is the command line generated by cwltool:

bcbio_nextgen.py \
    runfn \
    alignment_to_rec \
    cwl \
    sentinel_runtime=cores,2,ram,4096 \
    files=/var/lib/cwl/stg93d6aba6-4bef-4488-ad46-4a70cd2baf60/7_100326_FC6107FAAXX.bam \
    config__algorithm__align_split_size=25000 \
    reference__fasta__base=/var/lib/cwl/stgba4d1d78-f65b-4b98-96a2-84e57bba5b7e/hg19.fa \
    rgnames__pl=illumina \
    rgnames__sample=Test1 \
    rgnames__pu=Test1 \
    rgnames__lane=Test1 \
    rgnames__rg=Test1 \
    reference__snap__indexes=/var/lib/cwl/stg30f423b1-d0df-4707-bcc6-16ae120bbc96/Genome \
    config__algorithm__aligner=snap \
    config__algorithm__mark_duplicates=True \
    description=Test1 \
    sentinel_parallel=multi-combined \
    files=/var/lib/cwl/stgd8f8c10c-3a42-433d-8a7f-9885b1ce1417/6_100326_FC6107FAAXX.bam \
    config__algorithm__align_split_size=25000 \
    reference__fasta__base=/var/lib/cwl/stgba4d1d78-f65b-4b98-96a2-84e57bba5b7e/hg19.fa \
    rgnames__pl=illumina \
    rgnames__sample=Test2 \
    rgnames__pu=Test2 \
    rgnames__lane=Test2 \
    rgnames__rg=Test2 \
    reference__bwa__indexes=/var/lib/cwl/stgd90e6c3c-5c4f-4059-9042-ce955fb07023/hg19.fa.amb \
    config__algorithm__aligner=bwa \
    config__algorithm__mark_duplicates=True \
    description=Test2 \
    sentinel_outputs=alignment_rec

Only difference I can tell is ordering...

StarvingMarvin commented 7 years ago

That's actually probably it. cwltool has a "remove implicit 0 position prior to sorting" algorithm. Bunny treats all outer bindings (on list level, not item level) as position: 0 and sorts them alphabetically.

chapmanb commented 7 years ago

Luka; Thanks for digging. I got the development branch built and can replicate the problem. You've got it: the issue is with bcbio in that for multi-record inputs like this we're implicitly assuming a different argument order for arguments present multiple times. cwltool/Toil generate all of one set of arguments for an input, then generate the next set of arguments. In contrast, bunny generates these as groups (all of the config__algorithm__align_split_size, then all of the config__algorithm__aligner) so we're not splitting them into two records right on the back side.

It seems like both methods are valid approaches, although bunny here is being a bit more lax, since there is an explicit position specification in the tool description that the alphabetical ordering ignores.

I could work on bcbio supporting the alphabetical approach, although it does get complicated this way for things like reference__bwa__indexes, which are only present in a single sample and null in the other, so they do not get included on the command line.
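
To make the two orderings concrete, here is a minimal Python sketch. The function names and the toy records are hypothetical and are not taken from bunny, cwltool, or bcbio; the sketch only illustrates record-wise versus grouped emission of key=value arguments, and how a null value silently drops out of the grouped form so that the remaining value can no longer be tied to a sample by position.

```python
def record_wise(records):
    """Emit all key=value pairs of one record, then the next (cwltool/Toil style)."""
    return ["%s=%s" % (k, v) for rec in records for k, v in rec.items()
            if v is not None]


def grouped(records):
    """Emit every value of one key before the next key, keys sorted alphabetically."""
    keys = sorted({k for rec in records for k in rec})
    return ["%s=%s" % (k, rec[k]) for k in keys for rec in records
            if rec.get(k) is not None]


# Toy data (hypothetical): the bwa index is only set for the second sample.
records = [
    {"config__algorithm__aligner": "snap", "description": "Test1",
     "reference__bwa__indexes": None},
    {"config__algorithm__aligner": "bwa", "description": "Test2",
     "reference__bwa__indexes": "hg19.fa.amb"},
]
record_wise_args = record_wise(records)
grouped_args = grouped(records)
```

In the grouped output there is a single reference__bwa__indexes value and nothing on the command line says whether it belongs to Test1 or Test2.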

StarvingMarvin commented 7 years ago

But that's the thing: position isn't a property of an input, it's a property of an element of the list, because the input binding is nested inside a type field.

StarvingMarvin commented 7 years ago

A couple of questions: if you need an array of records, why don't you provide it as an input instead of reconstructing it from lists? If you really want to provide it as lists, you could transform it through a relatively simple JavaScript expression and sidestep any command line parsing issues. Finally, if you want to do it from Python, you can just dump the input job as JSON and consume it from your script:

requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: inputs.json
        entry: $(JSON.stringify(inputs))
      - entryname: runtime.json
        entry: $(JSON.stringify(runtime))

instead of, again, messing around with argument parsing
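
On the Python side, consuming that staged file could look like the sketch below. It assumes the tool runs in (or is pointed at) the working directory where the InitialWorkDirRequirement above wrote inputs.json; the function name load_cwl_inputs is hypothetical and is not part of bcbio.

```python
import json
import os


def load_cwl_inputs(work_dir="."):
    """Read the job inputs staged as inputs.json by InitialWorkDirRequirement."""
    with open(os.path.join(work_dir, "inputs.json")) as in_handle:
        return json.load(in_handle)
```

With this approach nulls arrive as None and lists stay lists, so no ordering or separator conventions are needed on the command line.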

chapmanb commented 7 years ago

Luka; Thanks for all the thoughts on this. Practically I pushed a fix to bcbio and have a new bcbio/bcbio Docker container that will avoid the problem. If you update, it gets past that step with the devel branch but then fails at a subsequent scatter step and I'm not able to diagnose what goes wrong.

More generally, I'm open to new ways for representing this as the current setup is problematic with null inputs when some records have null and others don't. Then we don't have a good way to know which sample the non-null records get assigned to. I'm thinking of converting nulls into empty strings to work around that.
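
As a rough illustration of the empty-string idea, the sketch below rebuilds per-sample records from grouped key=value arguments. It assumes nulls are passed as empty strings so that every key carries exactly one value per sample; split_grouped and its arguments are hypothetical names, not the actual bcbio implementation.

```python
from collections import defaultdict


def split_grouped(args, num_recs):
    """Rebuild per-sample records from grouped key=value arguments.

    Assumes nulls were passed as empty strings, so every key has exactly
    num_recs values and the position of a value identifies its sample.
    """
    by_key = defaultdict(list)
    for arg in args:
        key, val = arg.split("=", 1)
        # Map the empty-string placeholder back to a null.
        by_key[key].append(val if val != "" else None)
    records = [{} for _ in range(num_recs)]
    for key, vals in by_key.items():
        if len(vals) != num_recs:
            raise ValueError("Uneven values for %s: %s" % (key, vals))
        for rec, val in zip(records, vals):
            rec[key] = val
    return records
```

Without the placeholder, a key with fewer values than samples raises, since there is no way to tell which sample a value belongs to.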

The JSON dumping idea is really cool but I've been trying to build up a workflow representation that we can convert to WDL for portability. As soon as I venture into JSON manipulation it makes it hard to do this. Is there a way to replicate the JSON dumping in WDL?

Thanks again for all this help and discussion.

StarvingMarvin commented 7 years ago

@chapmanb it doesn't even fail for me, it just gets stuck... will investigate it further. Unfortunately I'm not acquainted enough with WDL to give you any suggestions in that regard.

StarvingMarvin commented 7 years ago

We are getting close, but now pipeline_summary is failing.

bcbio_nextgen.py runfn pipeline_summary cwl \
sentinel_runtime=cores,2,ram,4096 \
sentinel_parallel=multi-parallel \
sentinel_outputs=summary__qc,summary__metrics \
description=Test2 \
reference__fasta__base=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa \
config__algorithm__coverage_interval=regional genome_build=hg19 \
config__algorithm__coverage=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/cov-coverage_transcripts-bam.bed \
'config__algorithm__tools_off=gemini;;vqsr' \
config__algorithm__qc=variants \
analysis=variant2 \
'config__algorithm__tools_on=gvcf;;qualimap_full' \
config__algorithm__variant_regions=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/variant_regions-bam.bed \
align_bam=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam \
config__algorithm__variant_regions_merged=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/variant_regions-bam-merged.bed \
config__algorithm__coverage_merged=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/cov-coverage_transcripts-bam-merged.bed
[2017-04-10T13:46Z] b0031376c44b: QC: Test2 v, a, r, i, a, n, t, s
[2017-04-10T13:46Z] b0031376c44b: Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 53, in process
    out = fn(fnargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 181, in pipeline_summary
    return qcsummary.pipeline_summary(*args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 66, in pipeline_summary
    data["summary"] = _run_qc_tools(work_bam, data)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 134, in _run_qc_tools
    qc_fn = tools[program_name]
KeyError: 'v'
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
    runfn.process(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 53, in process
    out = fn(fnargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 181, in pipeline_summary
    return qcsummary.pipeline_summary(*args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 66, in pipeline_summary
    data["summary"] = _run_qc_tools(work_bam, data)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 134, in _run_qc_tools
    qc_fn = tools[program_name]
KeyError: 'v'

For reference, here is the command line from cwltool:

    bcbio_nextgen.py \
    runfn \
    pipeline_summary \
    cwl \
    sentinel_runtime=cores,2,ram,4096 \
    sentinel_parallel=multi-parallel \
    sentinel_outputs=summary__qc,summary__metrics \
    description=Test2 \
    reference__fasta__base=/var/lib/cwl/stg96269ba9-0025-4296-b9fb-ebe202cfc45d/hg19.fa \
    config__algorithm__coverage_interval=regional \
    genome_build=hg19 \
    config__algorithm__coverage=None \
    'config__algorithm__tools_off=[u'"'"'gemini'"'"', u'"'"'vqsr'"'"']' \
    'config__algorithm__qc=[u'"'"'fastqc'"'"']' \
    analysis=variant2 \
    'config__algorithm__tools_on=[u'"'"'gvcf'"'"', u'"'"'qualimap_full'"'"']' \
    config__algorithm__variant_regions=/var/lib/cwl/stgd07f7077-2537-47b5-ab08-525c41f9bcf2/variant_regions-bam.bed \
    align_bam=/var/lib/cwl/stg3d328270-d728-4d4e-bd4b-9dc262c59227/Test2-sort.bam \
    config__algorithm__variant_regions_merged=/var/lib/cwl/stgc7e6a862-99d0-4cfe-93d5-d20fe86550db/variant_regions-bam-merged.bed \
    config__algorithm__coverage_merged=None

(this tool is scattered to 15 sub-jobs, so these parameters might not be from the exact same invocation, but they kinda look like they are)

What bothers me is that cwltool seems to dump serialized Python lists on the command line and it works, whereas rabix actually joins strings as requested in the bindings, yet it's rabix that fails. It seems to me that there are complementary bugs in the workflow or cwltool or both that make things pass.

The Bunny code that runs this far is on the bug/cwl-links-processing branch.

chapmanb commented 7 years ago

Luka; Thanks for the update, that's brilliant progress. Apologies, this was a bug in bcbio since it should be checking for the config__algorithm__qc parameter to be either a list or single item and doing the right thing with the single item. I pushed a fix for this so if you update to the latest Docker instance (or development version), it should now work correctly.

Regarding the larger question of how to represent these, they are a bit confusing since they're nested lists of lists. I think the bunny behavior of separating with ;; is right rather than dumping json, but I have shims in bcbio to handle both cases:

https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/distributed/runfn.py#L289

The cwltool version worked because it was a list of a single item but I should have been treating your single item the same way.
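
For illustration only, a normalization step that accepts all three forms seen above (a serialized Python list from cwltool, a ;;-joined string from bunny, and a bare single item) could be sketched as follows. The name parse_cwl_list is hypothetical and this is not the actual shim in runfn.py.

```python
import ast


def parse_cwl_list(val):
    """Normalize a command-line value to a list of strings."""
    if isinstance(val, (list, tuple)):
        return list(val)
    if val.startswith("[") and val.endswith("]"):
        # cwltool-style: a serialized Python list such as "[u'fastqc']"
        return [str(x) for x in ast.literal_eval(val)]
    # bunny-style: items joined with ";;"; a single item has no separator
    return val.split(";;")
```

Wrapping the single item in a list avoids iterating over the characters of a bare string such as "variants", which is what produced the KeyError: 'v' above.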

Hope this gets it running cleanly now. Let me know if you run into anything else at all.

simonovic86 commented 7 years ago

@chapmanb

Now it's fixed. Thanks! multiqc_summary app is failing now. This is the stacktrace:

Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
    runfn.process(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 45, in process
    fnargs, parallel, out_keys = _world_from_cwl(fnargs[1:], work_dir)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 194, in _world_from_cwl
    out = _split_groups_finalize_cwl(dict(grouped_keys), data, work_dir, passed_keys, output_cwl_keys, runtime)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 237, in _split_groups_finalize_cwl
    val = _resolve_null_vals(key, vals, reci, num_recs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 258, in _resolve_null_vals
    raise ValueError("Unsure how to resolve uneven values for %s: %s" % (key, vals))
ValueError: Unsure how to resolve uneven values for summary__qc: ['/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/1/qc/Test1/fastqc/fastqc_report.html', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/2/qc/Test1/qualimap/genome_results.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/3/qc/Test1/samtools/Test1.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/5/qc/Test1/coverage/Test1_bcbio_coverage.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/9/qc/Test2/fastqc/fastqc_report.html', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/10/qc/Test2/qualimap/genome_results.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/11/qc/Test2/samtools/Test2.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/13/qc/Test2/coverage/Test2_bcbio_coverage.txt']

I'll investigate further.

StarvingMarvin commented 7 years ago

@chapmanb It is possible that there is also a bug in bunny, and that ;; join is on the wrong level considering list is nested, but then tools should probably be changed to specify both how inner and outer lists are joined (or not joined for that matter). It's possible that because inner list doesn't have an input binding, cwltool does something like str(value) and gets serialization of python value.

StarvingMarvin commented 7 years ago

The difference between the command lines that cwltool and rabix produce (other than the previous two issues with order and lists) is that the 8th summary__qc parameter with cwltool is "variant_regions-bam-merged-padded.bed" but with bunny it's "Test2_bcbio_coverage.txt". It's 225 parameters long, so I'm not sure if there are other differences.
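
With that many parameters, a small script is less error-prone than comparing by eye. The sketch below is a hypothetical helper, not part of either executor: it groups key=value arguments by key and compares the per-key value lists, reducing each value to its basename so that differing staging directories do not show up as differences.

```python
from collections import defaultdict
import os


def values_by_key(args):
    """Group key=value arguments by key, keeping only file basenames."""
    out = defaultdict(list)
    for arg in args:
        key, _, val = arg.partition("=")
        out[key].append(os.path.basename(val))
    return out


def diff_command_lines(args_a, args_b):
    """Return {key: (values_a, values_b)} for keys whose value lists differ."""
    a, b = values_by_key(args_a), values_by_key(args_b)
    return {k: (a.get(k, []), b.get(k, []))
            for k in sorted(set(a) | set(b))
            if a.get(k, []) != b.get(k, [])}
```

Because values are compared per key and in order, this ignores the grouped versus record-wise ordering of keys but still reports a value that appears in a different position within the same key.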

chapmanb commented 7 years ago

Luka and Janko; Thanks for digging into this. This is another tricky case where we have multiple records to collapse together. I don't think bcbio is handling this in the best way, but I pushed a fix to avoid the issue and keep us moving and will think more about how best to handle this longer term. With this in place I now get stuck with a scatter error:

[2017-04-11 06:42:10.764] [ERROR] EventProcessor failed to process event JobStatusEvent [jobId=root.variantcall.2.variantcall_batch_region.4, state=COMPLETED, contextId=f478b821-02fe-49d0-8bda-5e5d64e38fb7, result={region=chr22:15068-15500, vrn_file_region=FileValue [size=1770, path=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz, location=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz, checksum=sha1$20f84394ceed675bee3b8910c32a03aeacba006e, secondaryFiles=[FileValue [size=72, path=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz.tbi, location=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz.tbi, checksum=sha1$27a0f22e02be99c928cf13e818eb7523636c5929, secondaryFiles=[], properties={sbg:metadata=null}]], properties={sbg:metadata=null}]}].
org.rabix.engine.repository.TransactionHelper$TransactionException: org.rabix.engine.processor.handler.EventHandlerException: Port region for root.variantcall.2.variantcall_batch_region and rootId f478b821-02fe-49d0-8bda-5e5d64e38fb7 is not a list and therefore cannot be scattered.
        at org.rabix.engine.processor.impl.EventProcessorImpl.handle(EventProcessorImpl.java:144) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl.access$300(EventProcessorImpl.java:38) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl$1$1.call(EventProcessorImpl.java:93) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl$1$1.call(EventProcessorImpl.java:90) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.memory.InMemoryRepositoryRegistry.doInTransaction(InMemoryRepositoryRegistry.java:82) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl$1.run(EventProcessorImpl.java:90) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: org.rabix.engine.processor.handler.EventHandlerException: Port region for root.variantcall.2.variantcall_batch_region and rootId f478b821-02fe-49d0-8bda-5e5d64e38fb7 is not a list and therefore cannot be scattered.
        at org.rabix.engine.processor.handler.impl.ScatterHandler.scatterPort(ScatterHandler.java:98) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:83) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:30) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:175) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:54) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
        at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at com.sun.proxy.$Proxy37.send(Unknown Source) ~[na:na]
        at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:198) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:41) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        at org.rabix.engine.processor.impl.EventProcessorImpl.handle(EventProcessorImpl.java:142) ~[rabix-backend-local-1.0.0-rc3.jar:na]
        ... 8 common frames omitted

Thanks for all the help here, excited to be making so much progress.

chapmanb commented 7 years ago

Sorry I meant to mention that the fix is pushed to the latest bcbio/bcbio docker container so updating should get you past the multiqc problem. Thanks again.

simonovic86 commented 7 years ago

Thanks for the feedback! We've fixed that and now the whole workflow is passing. These are the results:

{
  "align_bam" : [ {
    "basename" : "Test1-sort.bam",
    "checksum" : "sha1$a3c2c1daca6b598a406a1a07a2d5377eae4695e6",
    "class" : "File",
    "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1",
    "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam",
    "nameext" : "bam",
    "nameroot" : "Test1-sort",
    "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam",
    "secondaryFiles" : [ {
      "basename" : "Test1-sort.bam.bai",
      "checksum" : "sha1$f2c5b96bef1cfe90c45d666567e0ef4f0adc1388",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam.bai",
      "nameext" : "bai",
      "nameroot" : "Test1-sort.bam",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam.bai",
      "secondaryFiles" : [ ],
      "size" : 232
    } ],
    "size" : 4056894
  }, {
    "basename" : "Test2-sort.bam",
    "checksum" : "sha1$093662882cd986a4f54ced68d5ef61a84e64c8bc",
    "class" : "File",
    "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2",
    "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam",
    "nameext" : "bam",
    "nameroot" : "Test2-sort",
    "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam",
    "secondaryFiles" : [ {
      "basename" : "Test2-sort.bam.bai",
      "checksum" : "sha1$fb55b6066798c62e45a04f04991eb96caf1adf6a",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam.bai",
      "nameext" : "bai",
      "nameroot" : "Test2-sort.bam",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam.bai",
      "secondaryFiles" : [ ],
      "size" : 232
    } ],
    "size" : 3882031
  } ],
  "summary__multiqc" : [ {
    "basename" : "multiqc_report.html",
    "checksum" : "sha1$e66e6dfaa2e84b05fd66578c4a944457f803120f",
    "class" : "File",
    "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc",
    "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_report.html",
    "nameext" : "html",
    "nameroot" : "multiqc_report",
    "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_report.html",
    "secondaryFiles" : [ {
      "basename" : "multiqc_bcbio_metrics.txt",
      "checksum" : "sha1$6061af0d6bdab4dc25c495796b1919a8dd0b9de4",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_bcbio_metrics.txt",
      "nameext" : "txt",
      "nameroot" : "multiqc_bcbio_metrics",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_bcbio_metrics.txt",
      "secondaryFiles" : [ ],
      "size" : 582
    }, {
      "basename" : "multiqc_fastqc.txt",
      "checksum" : "sha1$6df234655ec306e4145e15efcb7f23714f2c7226",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_fastqc.txt",
      "nameext" : "txt",
      "nameroot" : "multiqc_fastqc",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_fastqc.txt",
      "secondaryFiles" : [ ],
      "size" : 736
    }, {
      "basename" : "multiqc_general_stats.txt",
      "checksum" : "sha1$9421cf4a00698c5021204d9c78394e11bc647f6a",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_general_stats.txt",
      "nameext" : "txt",
      "nameroot" : "multiqc_general_stats",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_general_stats.txt",
      "secondaryFiles" : [ ],
      "size" : 1234
    }, {
      "basename" : "multiqc_samtools_stats.txt",
      "checksum" : "sha1$9e44459b7d4d77019724881fab09225067070ce4",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_samtools_stats.txt",
      "nameext" : "txt",
      "nameroot" : "multiqc_samtools_stats",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_samtools_stats.txt",
      "secondaryFiles" : [ ],
      "size" : 1352
    }, {
      "basename" : "multiqc_sources.txt",
      "checksum" : "sha1$9d4be4722cc6f1b680403139b96975c6ace52561",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_sources.txt",
      "nameext" : "txt",
      "nameroot" : "multiqc_sources",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_sources.txt",
      "secondaryFiles" : [ ],
      "size" : 2884
    }, {
      "basename" : "seqbuster_isomirs.txt",
      "checksum" : "sha1$ce200716f65cb44584bc8423b82cc3ad2fb13e3f",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_isomirs.txt",
      "nameext" : "txt",
      "nameroot" : "seqbuster_isomirs",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_isomirs.txt",
      "secondaryFiles" : [ ],
      "size" : 7
    }, {
      "basename" : "seqbuster_mirs.txt",
      "checksum" : "sha1$ce200716f65cb44584bc8423b82cc3ad2fb13e3f",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_mirs.txt",
      "nameext" : "txt",
      "nameroot" : "seqbuster_mirs",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_mirs.txt",
      "secondaryFiles" : [ ],
      "size" : 7
    }, {
      "basename" : "Test1_bcbio.txt",
      "checksum" : "sha1$1db1ae7ddb10ed15e74df868fafafdda5e78003e",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test1_bcbio.txt",
      "nameext" : "txt",
      "nameroot" : "Test1_bcbio",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test1_bcbio.txt",
      "secondaryFiles" : [ ],
      "size" : 505
    }, {
      "basename" : "Test2_bcbio.txt",
      "checksum" : "sha1$ab6d1bf16a8ea02343d6178ad0ae2f521fff5785",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test2_bcbio.txt",
      "nameext" : "txt",
      "nameroot" : "Test2_bcbio",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test2_bcbio.txt",
      "secondaryFiles" : [ ],
      "size" : 507
    }, {
      "basename" : "target_info.yaml",
      "checksum" : "sha1$040f44a4b8e907955769d261cb8c33bc3d2373c8",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/target_info.yaml",
      "nameext" : "yaml",
      "nameroot" : "target_info",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/target_info.yaml",
      "secondaryFiles" : [ ],
      "size" : 114
    }, {
      "basename" : "qc-coverage-report-run.R",
      "checksum" : "sha1$29a43161623d09d080ab1216934be40969c4373d",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/qc-coverage-report-run.R",
      "nameext" : "R",
      "nameroot" : "qc-coverage-report-run",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/qc-coverage-report-run.R",
      "secondaryFiles" : [ ],
      "size" : 180
    }, {
      "basename" : "report-ready.Rmd",
      "checksum" : "sha1$591462c9e329f1223ed339bc815f60d0f7a5e019",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/report-ready.Rmd",
      "nameext" : "Rmd",
      "nameroot" : "report-ready",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/report-ready.Rmd",
      "secondaryFiles" : [ ],
      "size" : 13768
    }, {
      "basename" : "list_files.txt",
      "checksum" : "sha1$e6dd1325bbe3b9501d5adccf150ad8c358d6ca45",
      "class" : "File",
      "dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc",
      "location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/list_files.txt",
      "nameext" : "txt",
      "nameroot" : "list_files",
      "path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/list_files.txt",
      "secondaryFiles" : [ ],
      "size" : 7271
    } ],
    "size" : 1309847
  }, null ]
}
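For anyone who wants to sanity-check the File entries in the output above: in CWL the checksum field is the literal prefix sha1$ followed by the hexadecimal SHA-1 digest of the file contents, and size is the length in bytes. Below is a minimal sketch of re-checking an entry against the file on disk. The helper names cwl_checksum and matches_entry are hypothetical and not part of bunny or bcbio.

```python
import hashlib
import os


def cwl_checksum(path, chunk_size=1 << 20):
    """Return 'sha1$' + hex SHA-1 of the file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha1$" + digest.hexdigest()


def matches_entry(entry):
    """Compare a File entry's recorded size and checksum with the file on disk.

    `entry` is assumed to be a dict shaped like the JSON objects above,
    with "path", "size" and "checksum" keys.
    """
    path = entry["path"]
    return (os.path.getsize(path) == entry["size"]
            and cwl_checksum(path) == entry["checksum"])
```

Calling matches_entry on each parsed File object returns True only when both the recorded size and the checksum agree with the local copy.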
chapmanb commented 7 years ago

Brilliant. That's awesome news, thank you so much for all the work on this. I'm excited to test this out more on real workflows now that it's working. Once you have a release ready with all these updates I'd like to ensure it works cleanly on the validation workflows we're putting together for the GA4GH workflow challenge:

https://github.com/bcbio/bcbio_validation_workflows

Is there a hope of running this directly on CGC in the near term? If not, I can test local runs until that's all in place. I'm excited to have this coming together, thank you again.

StarvingMarvin commented 7 years ago

The Bunny release will hopefully be out in a day or two. CGC availability is harder to predict: it might be as early as next week, but two weeks is more likely.

chapmanb commented 7 years ago

Thanks so much. I created a conda package of the latest 1.0.0-rc4 release and it works cleanly with the bcbio CWL test data. Brilliant. My next step is to run the GA4GH workflow challenge CWL:

https://github.com/bcbio/bcbio_validation_workflows

This is more of a real example so ideally I could run multicore on a single machine. It sounds like I should wait for #231 to do that.

We can close this issue, and I'm happy to reopen discussion in a separate one if we run into any problems. Thank you again for all the help.

simonovic86 commented 7 years ago

You're welcome! That's great news. We've started working on #231 and it will be done soon.

bogdang989 commented 7 years ago

Hi @chapmanb, I ran the latest test_bcbio_cwl with bunny and got stuck on the "concat_batch_variantcalls" step, most likely due to the same issue with the alphabetical ordering of command-line inputs. Command line with bunny:

bcbio_nextgen.py \
    runfn \
    concat_batch_variantcalls \
    cwl \
    sentinel_runtime=cores,1,ram,2048 \
    align_bam=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_1_s_merge_split_alignments/align/Test1/Test1-sort.bam \
    align_bam=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_2_s_merge_split_alignments/align/Test2/Test2-sort.bam \
    analysis=variant2 \
    analysis=variant2 \
    config__algorithm__callable_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_combine_sample_regions/regions/Test1-analysis_blocks.bed \
    config__algorithm__callable_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_combine_sample_regions/regions/Test2-analysis_blocks.bed \
    config__algorithm__coverage_interval=regional \
    config__algorithm__coverage_interval=regional \
    config__algorithm__tools_off=gemini \
    'config__algorithm__tools_off=gemini;;vqsr' \
    config__algorithm__tools_on=qualimap_full \
    'config__algorithm__tools_on=gvcf;;qualimap_full' \
    config__algorithm__validate=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/7_100326_FC6107FAAXX-grade.vcf \
    config__algorithm__validate=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/7_100326_FC6107FAAXX-grade.vcf \
    config__algorithm__validate_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/variant_regions-bam.bed \
    config__algorithm__validate_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/variant_regions-bam.bed \
    config__algorithm__variant_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_prep_samples_1_s/bedprep/variant_regions-bam.bed \
    config__algorithm__variant_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_prep_samples_2_s/bedprep/variant_regions-bam.bed \
    config__algorithm__variantcaller=freebayes \
    config__algorithm__variantcaller=freebayes \
    description=Test1 \
    description=Test2 \
    genome_build=hg19 \
    genome_build=hg19 \
    genome_resources__variation__cosmic=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/cosmic-v68-hg19.vcf.gz \
    genome_resources__variation__cosmic=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/cosmic-v68-hg19.vcf.gz \
    genome_resources__variation__dbsnp=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/dbsnp_132.vcf.gz \
    genome_resources__variation__dbsnp=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/dbsnp_132.vcf.gz \
    metadata__batch=b1 \
    metadata__batch=b1 \
    metadata__phenotype=tumor \
    metadata__phenotype=normal \
    reference__fasta__base=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/hg19.fa \
    reference__fasta__base=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/hg19.fa \
    reference__genome_context=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/test.bed.gz \
    reference__genome_context=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/test2.bed.gz \
    reference__rtg=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/mainIndex \
    reference__rtg=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/mainIndex \
    region=chrM:0-1000 \
    region=chrM:2000-5000 \
    region=chr22:0-14595 \
    region=chr22:15068-15500 \
    regions__callable=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_postprocess_alignment_1_s/align/Test1/Test1-coverage.callable-vrsubset.bed \
    regions__callable=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_postprocess_alignment_2_s/align/Test2/Test2-coverage.callable-vrsubset.bed \
    sentinel_parallel=batch-merge \
    vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_1_s/freebayes/chrM/b1-chrM_0_1000.vcf.gz \
    vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_2_s/freebayes/chrM/b1-chrM_2000_5000.vcf.gz \
    vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_3_s/freebayes/chr22/b1-chr22_0_14595.vcf.gz \
    vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_4_s/freebayes/chr22/b1-chr22_15068_15500.vcf.gz \
    sentinel_outputs=vrn_file

Command line with cwltool:

bcbio_nextgen.py \
    runfn \
    concat_batch_variantcalls \
    cwl \
    sentinel_runtime=cores,1,ram,2048 \
    description=Test1 \
    config__algorithm__validate=/var/lib/cwl/stg0aa7a2c9-86bc-48f3-80c8-71aaff066d42/7_100326_FC6107FAAXX-grade.vcf \
    reference__fasta__base=/var/lib/cwl/stgde1396cd-ac96-47b0-8c17-14b662bcb6b2/hg19.fa \
    reference__rtg=/var/lib/cwl/stg441389be-7263-429a-aed0-634cdd8b0f50/mainIndex \
    config__algorithm__variantcaller=freebayes \
    config__algorithm__coverage_interval=regional \
    metadata__batch=b1 \
    metadata__phenotype=tumor \
    'reference__genome_context=/var/lib/cwl/stg368108c8-c795-4b86-adc5-85f435283645/test.bed.gz;;/var/lib/cwl/stg9102f634-1906-40d0-8a81-7371745682df/test2.bed.gz' \
    config__algorithm__validate_regions=/var/lib/cwl/stg68f99767-e936-4415-bdb1-db26d043aaa1/variant_regions-bam.bed \
    genome_build=hg19 \
    config__algorithm__tools_off=gemini \
    genome_resources__variation__dbsnp=/var/lib/cwl/stg579ee780-01b8-46c7-8bb6-159380c7f418/dbsnp_132.vcf.gz \
    genome_resources__variation__cosmic=/var/lib/cwl/stgc7a64bcd-12e9-4e7a-adfc-0cb4760733e5/cosmic-v68-hg19.vcf.gz \
    analysis=variant2 \
    config__algorithm__tools_on=qualimap_full \
    config__algorithm__variant_regions=/var/lib/cwl/stg5c61a4b3-c1cc-4761-87f8-70e86f32afd5/variant_regions-bam.bed \
    align_bam=/var/lib/cwl/stg72ce9a6d-0566-4a97-b0a0-343b5d8fe50e/Test1-sort.bam \
    regions__callable=/var/lib/cwl/stg6265737c-b78a-4c92-8bb0-1a8b9949d749/Test1-coverage.callable-vrsubset.bed \
    config__algorithm__callable_regions=/var/lib/cwl/stg4ecdacb3-8634-4ea8-b54f-4b51c97d031e/Test1-analysis_blocks.bed \
    region=chrM:0-1000 \
    vrn_file_region=/var/lib/cwl/stg18ab04ed-ed4b-4ee3-afeb-d36e9fee0f0f/b1-chrM_0_1000.vcf.gz \
    sentinel_parallel=batch-merge \
    description=Test2 \
    config__algorithm__validate=/var/lib/cwl/stg0aa7a2c9-86bc-48f3-80c8-71aaff066d42/7_100326_FC6107FAAXX-grade.vcf \
    reference__fasta__base=/var/lib/cwl/stgde1396cd-ac96-47b0-8c17-14b662bcb6b2/hg19.fa \
    reference__rtg=/var/lib/cwl/stg441389be-7263-429a-aed0-634cdd8b0f50/mainIndex \
    config__algorithm__variantcaller=freebayes \
    config__algorithm__coverage_interval=regional \
    metadata__batch=b1 \
    metadata__phenotype=normal \
    'reference__genome_context=/var/lib/cwl/stg368108c8-c795-4b86-adc5-85f435283645/test.bed.gz;;/var/lib/cwl/stg9102f634-1906-40d0-8a81-7371745682df/test2.bed.gz' \
    config__algorithm__validate_regions=/var/lib/cwl/stg68f99767-e936-4415-bdb1-db26d043aaa1/variant_regions-bam.bed \
    genome_build=hg19 \
    'config__algorithm__tools_off=gemini;;vqsr' \
    genome_resources__variation__dbsnp=/var/lib/cwl/stg579ee780-01b8-46c7-8bb6-159380c7f418/dbsnp_132.vcf.gz \
    genome_resources__variation__cosmic=/var/lib/cwl/stgc7a64bcd-12e9-4e7a-adfc-0cb4760733e5/cosmic-v68-hg19.vcf.gz \
    analysis=variant2 \
    'config__algorithm__tools_on=gvcf;;qualimap_full' \
    config__algorithm__variant_regions=/var/lib/cwl/stge275be18-7913-436e-96e0-4387c780498c/variant_regions-bam.bed \
    align_bam=/var/lib/cwl/stg7df1dd46-7f47-4af8-91bb-3c469609f28f/Test2-sort.bam \
    regions__callable=/var/lib/cwl/stg2c624fa6-eb50-42ec-925f-e9f511fbbcc8/Test2-coverage.callable-vrsubset.bed \
    config__algorithm__callable_regions=/var/lib/cwl/stg4f2b9e52-2372-44e6-8d4d-a0ae21c3d018/Test2-analysis_blocks.bed \
    region=chrM:2000-5000 \
    vrn_file_region=/var/lib/cwl/stg7b286f5d-8ab5-4a50-b8b2-b2cd429446df/b1-chrM_2000_5000.vcf.gz \
    sentinel_outputs=vrn_file \
    region=chr22:0-14595 \
    vrn_file_region=/var/lib/cwl/stg6e752145-47a7-4137-998f-7003b3232a17/b1-chr22_0_14595.vcf.gz \
    region=chr22:15068-15500 \
    vrn_file_region=/var/lib/cwl/stg8ce73ab2-44e8-44cc-b2dd-10e1a6a016e3/b1-chr22_15068_15500.vcf.gz

Error log:

Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
    runfn.process(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 45, in process
    fnargs, parallel, out_keys = _world_from_cwl(fnargs[1:], work_dir)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 194, in _world_from_cwl
    out = _split_groups_finalize_cwl(dict(grouped_keys), data, work_dir, passed_keys, output_cwl_keys, runtime)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 237, in _split_groups_finalize_cwl
    val = _resolve_null_vals(key, vals, reci, num_recs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 266, in _resolve_null_vals
    raise ValueError("Unsure how to resolve uneven values for %s with %s records: %s" % (key, num_recs, vals))
ValueError: Unsure how to resolve uneven values for align_bam with 4 records: ['/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_1_s_merge_split_alignments/align/Test1/Test1-sort.bam', '/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_2_s_merge_split_alignments/align/Test2/Test2-sort.bam']

EDIT: In the latest Docker image, _resolve_null_vals(key, vals, reci, num_recs) in runfn.py sets allowed_uneven = set(["summary__qc"]), while the version on the bcbio-nextgen GitHub page has allowed_uneven = set(["concat_batch_variantcalls", "multiqc_summary"]).
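To make the "uneven values" error easier to see, here is a minimal sketch of what happens when the key=value arguments from the bunny command line are grouped by key. It is not bcbio's actual _world_from_cwl logic: the helper names group_args and find_uneven are hypothetical, file names are shortened, and taking the largest value count as the number of records is an assumption chosen to match the "4 records" in the error message.

```python
from collections import OrderedDict

# Keys that describe the run rather than the records (taken from the commands above).
SENTINELS = ("sentinel_runtime", "sentinel_parallel", "sentinel_outputs")


def group_args(args):
    """Group key=value arguments by key, keeping first-seen key order."""
    grouped = OrderedDict()
    for arg in args:
        key, _, val = arg.partition("=")
        grouped.setdefault(key, []).append(val)
    return grouped


def find_uneven(grouped):
    """Return (assumed record count, sorted keys with fewer values than that)."""
    counts = {k: len(v) for k, v in grouped.items() if k not in SENTINELS}
    num_recs = max(counts.values())  # assumption: the longest key defines the records
    return num_recs, sorted(k for k, n in counts.items() if n != num_recs)


# A shortened version of the alphabetically sorted bunny arguments.
args = [
    "align_bam=Test1-sort.bam", "align_bam=Test2-sort.bam",
    "description=Test1", "description=Test2",
    "region=chrM:0-1000", "region=chrM:2000-5000",
    "region=chr22:0-14595", "region=chr22:15068-15500",
    "sentinel_parallel=batch-merge",
]
num_recs, uneven = find_uneven(group_args(args))
print(num_recs, uneven)
```

With four region values but only two align_bam values, align_bam is reported as uneven, which mirrors the ValueError above. Once the arguments are sorted alphabetically, the position of each value no longer says which sample it belongs to, whereas the cwltool ordering keeps each sample's values together.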

Hope this helps :)

chapmanb commented 7 years ago

Sorry about the issue. With the latest bcbio test CWL files (https://github.com/bcbio/test_bcbio_cwl) we've moved to using smaller Docker containers instead of the one big one:

https://quay.io/organization/bcbio

and these have a fix for this problem. Are you running with the latest test CWL and Docker or in some other way? Happy to provide what you need to test and run and can rebuild the full container if that helps.

Longer term, I'm looking at switching over to Luka's InitialWorkDirRequirement idea to avoid this, but hopefully the fixed containers get you going in the short term.

bogdang989 commented 7 years ago

Thanks for the quick reply. Yeah, I tried running with the latest version of the test CWL, and also with the Docker images from the quay.io repo.

However, it seems that this fix is not included in the latest quay.io/bcbio/bcbio-vc image, as the files are different than those on https://github.com/chapmanb/bcbio-nextgen/blob/master.

I started to manually update some files in a local image (runfn.py, multiprocess.py, genotype.py) to match bcbio-nextgen master, and got a few steps further.

EDIT: and got it running :)

chapmanb commented 7 years ago

Brilliant, glad you managed to get it running cleanly. Apologies for the out-of-date Docker containers; I've bumped them now so they contain the compatibility fix. Please let me know if you run into anything else at all. Thanks again for looking at this.

bogdang989 commented 7 years ago

Great, thanks a lot! I'll put an update here if something comes up.