StarvingMarvin closed this 7 years ago.
Luka and Stefan; Thanks much for starting this separate thread. I pushed the full bcbio testing repository here:
https://github.com/bcbio/test_bcbio_cwl
If you clone that and run bash run_bunny.sh, you should be able to replicate the issue without needing bcbio installed. I believe the inputs are correctly nested in the input definition:
and here is the input sample:
Here is the scatter:
and this is the error I get:
org.rabix.engine.processor.handler.EventHandlerException: Port config__algorithm__align_split_size for root.alignment.1.prep_align_inputs and rootId abc9f1f9-e5f4-4efe-9fcc-70d6403ae4c6 is not a list and therefore cannot be scattered.
Happy to provide anything else that would help. Thanks again for the help working through this.
That's definitely a bug in bunny. I've opened a separate issue with a small test case: #97
Thanks much, let me know if I can help with anything else. I'll definitely keep testing once this is squashed, looking forward to getting bcbio working.
Thanks for the latest bunny release; I'm excited to keep moving bcbio CWL support forward. I tested it with my small, self-contained test case (https://github.com/bcbio/test_bcbio_cwl), which you can run with the run_bunny.sh shell script. I'm running into a different null pointer error:
[2017-02-15 05:59:33.874] [ERROR] EventProcessor failed to process event OutputUpdateEvent [jobId=root.prep_samples.2, contextId=6509ef2d-1198-499a-a31d-80137c12560c, portId=config__algorithm__variant_regions, value=FileValue [size=65, path=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/main-run_info-cwl-20170215055916194/root/prep_samples/2/bedprep/variant_regions-bam.bed, location=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/main-run_info-cwl-20170215055916194/root/prep_samples/2/bedprep/variant_regions-bam.bed, checksum=sha1$02cb0626caa3c5d7db51b0ff867553d7ff5422d6, secondaryFiles=[], properties={sbg:metadata=null}], fromScatter=false, numberOfScattered=null].
java.lang.NullPointerException: null
at org.rabix.engine.model.VariableRecord.expand(VariableRecord.java:101) ~[rabix-backend-local.jar:na]
at org.rabix.engine.model.VariableRecord.addValue(VariableRecord.java:80) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:73) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:30) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.dispatcher.impl.SyncEventDispatcher.send(SyncEventDispatcher.java:21) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:132) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:55) ~[rabix-backend-local.jar:na]
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-backend-local.jar:na]
at com.sun.proxy.$Proxy37.send(Unknown Source) ~[na:na]
at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:198) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:43) ~[rabix-backend-local.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl$1.run(EventProcessorImpl.java:79) ~[rabix-backend-local.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
From the logs, all of the launched tasks appear to complete, but then there is some issue merging the outputs. Happy to work more on this from the bcbio side with any details about what triggers this. Thank you again for all the work.
Sorry for the late reply. I've been playing around with bcbio for the last couple of days, but I'm having a hard time pinpointing the issues. First off, I had trouble running tools because of what seem to be command-line escaping problems, but I couldn't reproduce them in a minimal example. Here is the fork where I manually escaped things in order to even get to your exception: https://github.com/StarvingMarvin/test_bcbio_cwl
The null pointer issue itself was easy to fix, because we had already run into a similar problem on another branch, which will be merged into develop soon. But then I hit what seems to be another scattering issue. So... work is still in progress; I'll link to the relevant issues once we isolate and create them.
Luka; Thanks much for looking at this. Apologies about the quoting problems; I just updated the bcbio CWL generation, test data and Docker container to use a non-JSON way of passing these values. This avoids the differences in quoting between runs inside and outside of Docker. I hope the new version will make it easier to test and evaluate. Let me know if there is anything else I can do on our side to make things run smoother.
As I've explained in a comment on #165, shell quoting was a bit messier to fix than I expected, but it should hopefully work more reliably now. I'll start digging into the other issues next.
Hey @chapmanb, sorry for the delay; we got sidetracked with other issues. Anyway, I've tried to run the latest version of bcbio, but it seems to be broken:
File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
runfn.process(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 28, in process
raise AttributeError("Did not find exposed function in bcbio.distributed.multitasks named '%s'" % args.name)
AttributeError: Did not find exposed function in bcbio.distributed.multitasks named 'alignment_to_rec'
Luka; Thanks so much for continuing to look at this. I'm giving a talk as part of the CGC course in early May and was hoping to demonstrate bcbio working with bunny, so I'm definitely happy to help from my side to keep pushing this along.
The CWL has changed recently to support conversion into WDL, so the test workflow (https://github.com/bcbio/test_bcbio_cwl) will need to be synced with bcbio. Are you running from a Docker container or bcbio locally? On Docker Hub, the latest bcbio/bcbio image should have this. If you're running locally, you need the development version of bcbio (bcbio_nextgen.py upgrade -u development). Let me know if you're still stuck; I'm happy to look into it more. Thanks again.
You're right, I had a stale Docker image. Unfortunately, even with the latest image, I haven't progressed much further. The process_alignment step within the alignment subworkflow is failing with this error:
ValueError: No 'files' specified for input sample: None
Backtracking from there, I think there is an issue with the results of the first step, alignment_to_rec: out of 12 records, only two have the files field set:
{
"alignment_rec" : [ {
"config__algorithm__align_split_size" : 25000,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : 25000,
"config__algorithm__aligner" : "snap",
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : "bwa",
"config__algorithm__mark_duplicates" : "True",
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : "True",
"description" : "Test1",
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : "Test1"
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"description" : "Test2",
"files" : [ {
"checksum" : "sha1$629a1fcf47dfc625e4d6aed097aedc563c9efdbd",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam",
"secondaryFiles" : [ {
"checksum" : "sha1$1a29d9009bc278ca6fbca7fba0a61061a4a1c1b2",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai",
"size" : 232
} ],
"size" : 3363919
} ],
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : "Test2"
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"files" : [ {
"checksum" : "sha1$629a1fcf47dfc625e4d6aed097aedc563c9efdbd",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam",
"secondaryFiles" : [ {
"checksum" : "sha1$639d954149591413cdcfba6e887b86c447d34b87",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam.bai",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam.bai",
"size" : 184
} ],
"size" : 3363919
} ],
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : {
"checksum" : "sha1$22ec87b283d1a354733eb33eaa632ee04542bbad",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.amb",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.amb",
"secondaryFiles" : [ {
"checksum" : "sha1$6db9e9d10fd79f63c7f4be4f514244052c65b628",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.pac",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.pac",
"size" : 9145
}, {
"checksum" : "sha1$2b6fb1bb71c10ffda852f3e4988cb0bdd24ebf9c",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.ann",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.ann",
"size" : 64
}, {
"checksum" : "sha1$41af3bc76675641360d4817ffc77b99213126938",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.sa",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.sa",
"size" : 18336
}, {
"checksum" : "sha1$c4068635c6b5d33b2e64931ef056c26cca88f660",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.bwt",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.bwt",
"size" : 36664
} ],
"size" : 10
},
"reference__fasta__base" : {
"checksum" : "sha1$e2ca54abb52ba4013b16f3f31d4083b8bf6de054",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
"secondaryFiles" : [ {
"checksum" : "sha1$f2e30d7e4f304ffd45ddd3cc26441434df8bf5fe",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
"size" : 43
}, {
"checksum" : "sha1$d8584a6cb5bcdc476b4577bf89a25e215ca61449",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
"size" : 292
} ],
"size" : 37196
},
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__fasta__base" : {
"checksum" : "sha1$e2ca54abb52ba4013b16f3f31d4083b8bf6de054",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
"secondaryFiles" : [ {
"checksum" : "sha1$f2e30d7e4f304ffd45ddd3cc26441434df8bf5fe",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
"size" : 43
}, {
"checksum" : "sha1$d8584a6cb5bcdc476b4577bf89a25e215ca61449",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
"size" : 292
} ],
"size" : 37196
},
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : {
"checksum" : "sha1$a712a1af9110f9f81e5c9063622b799b5e1ca366",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/Genome",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/Genome",
"secondaryFiles" : [ {
"checksum" : "sha1$0644efe12369c8d54d80388ad14576f3da1f3ccd",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndex",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndex",
"size" : 31
}, {
"checksum" : "sha1$2502aab068aa30174da61b7c99728b41b590c47e",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/OverflowTable",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/OverflowTable",
"size" : 2016
}, {
"checksum" : "sha1$2ad35f253c653dca1f08eaf0b2f365e9a8f9f15f",
"class" : "File",
"location" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndexHash",
"path" : "/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/GenomeIndexHash",
"size" : 400544
} ],
"size" : 38101
},
"rgnames__lane" : "Test1",
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : "Test2",
"rgnames__lb" : null,
"rgnames__pl" : "illumina",
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : "illumina",
"rgnames__pu" : "Test1",
"rgnames__rg" : null,
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : "Test2",
"rgnames__rg" : "Test1",
"rgnames__sample" : null
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : "Test2",
"rgnames__sample" : "Test1"
}, {
"config__algorithm__align_split_size" : null,
"config__algorithm__aligner" : null,
"config__algorithm__mark_duplicates" : null,
"reference__bowtie2__indexes" : null,
"reference__bwa__indexes" : null,
"reference__novoalign__indexes" : null,
"reference__snap__indexes" : null,
"rgnames__lane" : null,
"rgnames__lb" : null,
"rgnames__pl" : null,
"rgnames__pu" : null,
"rgnames__rg" : null,
"rgnames__sample" : "Test2"
} ]
}
It seems to me that the scattered instances that got a file as input propagated it correctly, but I'm not sure why these values are the way they are in the first place.
Luka; Thanks for digging into this further. It looks like bcbio is going wrong there -- it should return two records with all of the attributes rather than this mess of individual records with a single attribute in each. I can't reproduce this with cwltool or Toil runs so there must be something problematic about how bcbio interacts with bunny here. Which branch/release of bunny are you testing this off of? I can try to reproduce to see if I can identify the underlying issue and fix. Thanks again.
I'm running off the develop branch.
I'm really confused: there is no scattering and there are no expressions, so I'm really not sure what could go wrong. These should be the inputs the tool is invoked with:
{
"config__algorithm__align_split_size" : [ 25000, 25000 ],
"config__algorithm__aligner" : [ "snap", "bwa" ],
"config__algorithm__mark_duplicates" : [ "True", "True" ],
"description" : [ "Test1", "Test2" ],
"files" : [ [ {
"path" : "../testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam",
"name" : "7_100326_FC6107FAAXX.bam",
"dirname" : "../testdata/100326_FC6107FAAXX",
"nameroot" : "7_100326_FC6107FAAXX",
"nameext" : "bam",
"secondaryFiles" : [ {
"class" : "File",
"path" : "../testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai",
"name" : "7_100326_FC6107FAAXX.bam.bai",
"dirname" : "../testdata/100326_FC6107FAAXX",
"nameroot" : "7_100326_FC6107FAAXX.bam",
"nameext" : "bai",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ], [ {
"path" : "../testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam",
"name" : "6_100326_FC6107FAAXX.bam",
"dirname" : "../testdata/100326_FC6107FAAXX",
"nameroot" : "6_100326_FC6107FAAXX",
"nameext" : "bam",
"secondaryFiles" : [ {
"class" : "File",
"path" : "../testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam.bai",
"name" : "6_100326_FC6107FAAXX.bam.bai",
"dirname" : "../testdata/100326_FC6107FAAXX",
"nameroot" : "6_100326_FC6107FAAXX.bam",
"nameext" : "bai",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ] ],
"reference__bwa__indexes" : [ null, {
"path" : "../testdata/genomes/hg19/bwa/hg19.fa.amb",
"name" : "hg19.fa.amb",
"dirname" : "../testdata/genomes/hg19/bwa",
"nameroot" : "hg19.fa",
"nameext" : "amb",
"secondaryFiles" : [ {
"class" : "File",
"path" : "../testdata/genomes/hg19/bwa/hg19.fa.ann",
"name" : "hg19.fa.ann",
"dirname" : "../testdata/genomes/hg19/bwa",
"nameroot" : "hg19.fa",
"nameext" : "ann",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/bwa/hg19.fa.bwt",
"name" : "hg19.fa.bwt",
"dirname" : "../testdata/genomes/hg19/bwa",
"nameroot" : "hg19.fa",
"nameext" : "bwt",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/bwa/hg19.fa.pac",
"name" : "hg19.fa.pac",
"dirname" : "../testdata/genomes/hg19/bwa",
"nameroot" : "hg19.fa",
"nameext" : "pac",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/bwa/hg19.fa.sa",
"name" : "hg19.fa.sa",
"dirname" : "../testdata/genomes/hg19/bwa",
"nameroot" : "hg19.fa",
"nameext" : "sa",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"reference__fasta__base" : [ {
"path" : "../testdata/genomes/hg19/seq/hg19.fa",
"name" : "hg19.fa",
"dirname" : "../testdata/genomes/hg19/seq",
"nameroot" : "hg19",
"nameext" : "fa",
"secondaryFiles" : [ {
"class" : "File",
"path" : "../testdata/genomes/hg19/seq/hg19.fa.fai",
"name" : "hg19.fa.fai",
"dirname" : "../testdata/genomes/hg19/seq",
"nameroot" : "hg19.fa",
"nameext" : "fai",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/seq/hg19.dict",
"name" : "hg19.dict",
"dirname" : "../testdata/genomes/hg19/seq",
"nameroot" : "hg19",
"nameext" : "dict",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"path" : "../testdata/genomes/hg19/seq/hg19.fa",
"name" : "hg19.fa",
"dirname" : "../testdata/genomes/hg19/seq",
"nameroot" : "hg19",
"nameext" : "fa",
"secondaryFiles" : [ {
"class" : "File",
"path" : "../testdata/genomes/hg19/seq/hg19.fa.fai",
"name" : "hg19.fa.fai",
"dirname" : "../testdata/genomes/hg19/seq",
"nameroot" : "hg19.fa",
"nameext" : "fai",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/seq/hg19.dict",
"name" : "hg19.dict",
"dirname" : "../testdata/genomes/hg19/seq",
"nameroot" : "hg19",
"nameext" : "dict",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"reference__snap__indexes" : [ {
"path" : "../testdata/genomes/hg19/snap/Genome",
"name" : "Genome",
"dirname" : "../testdata/genomes/hg19/snap",
"nameroot" : "Genome",
"secondaryFiles" : [ {
"class" : "File",
"path" : "../testdata/genomes/hg19/snap/GenomeIndex",
"name" : "GenomeIndex",
"dirname" : "../testdata/genomes/hg19/snap",
"nameroot" : "GenomeIndex",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/snap/GenomeIndexHash",
"name" : "GenomeIndexHash",
"dirname" : "../testdata/genomes/hg19/snap",
"nameroot" : "GenomeIndexHash",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, {
"class" : "File",
"path" : "../testdata/genomes/hg19/snap/OverflowTable",
"name" : "OverflowTable",
"dirname" : "../testdata/genomes/hg19/snap",
"nameroot" : "OverflowTable",
"secondaryFiles" : [ ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
} ],
"properties" : {
"sbg:metadata" : null
},
"class" : "File"
}, null ],
"rgnames__lane" : [ "Test1", "Test2" ],
"rgnames__lb" : [ null, null ],
"rgnames__pl" : [ "illumina", "illumina" ],
"rgnames__pu" : [ "Test1", "Test2" ],
"rgnames__rg" : [ "Test1", "Test2" ],
"rgnames__sample" : [ "Test1", "Test2" ],
"sentinel_outputs" : "alignment_rec",
"sentinel_parallel" : "multi-combined"
}
and this is the resulting command line:
bcbio_nextgen.py runfn alignment_to_rec cwl sentinel_runtime=cores,2,ram,4096 \
config__algorithm__align_split_size=25000 \
config__algorithm__align_split_size=25000 \
config__algorithm__aligner=snap \
config__algorithm__aligner=bwa \
config__algorithm__mark_duplicates=True \
config__algorithm__mark_duplicates=True \
description=Test1 description=Test2 \
files=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam \
files=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/100326_FC6107FAAXX/6_100326_FC6107FAAXX.bam \
reference__bwa__indexes=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/bwa/hg19.fa.amb \
reference__fasta__base=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa \
reference__fasta__base=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa \
reference__snap__indexes=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/snap/Genome \
rgnames__lane=Test1 rgnames__lane=Test2 \
rgnames__pl=illumina rgnames__pl=illumina \
rgnames__pu=Test1 rgnames__pu=Test2 \
rgnames__rg=Test1 rgnames__rg=Test2 \
rgnames__sample=Test1 rgnames__sample=Test2 \
sentinel_parallel=multi-combined sentinel_outputs=alignment_rec
Actually, looking at the command line, it seems that the additional files got lost because the nested inputBinding was ignored. I'll explore further tomorrow.
Here is the command line generated by cwltool:
bcbio_nextgen.py \
runfn \
alignment_to_rec \
cwl \
sentinel_runtime=cores,2,ram,4096 \
files=/var/lib/cwl/stg93d6aba6-4bef-4488-ad46-4a70cd2baf60/7_100326_FC6107FAAXX.bam \
config__algorithm__align_split_size=25000 \
reference__fasta__base=/var/lib/cwl/stgba4d1d78-f65b-4b98-96a2-84e57bba5b7e/hg19.fa \
rgnames__pl=illumina \
rgnames__sample=Test1 \
rgnames__pu=Test1 \
rgnames__lane=Test1 \
rgnames__rg=Test1 \
reference__snap__indexes=/var/lib/cwl/stg30f423b1-d0df-4707-bcc6-16ae120bbc96/Genome \
config__algorithm__aligner=snap \
config__algorithm__mark_duplicates=True \
description=Test1 \
sentinel_parallel=multi-combined \
files=/var/lib/cwl/stgd8f8c10c-3a42-433d-8a7f-9885b1ce1417/6_100326_FC6107FAAXX.bam \
config__algorithm__align_split_size=25000 \
reference__fasta__base=/var/lib/cwl/stgba4d1d78-f65b-4b98-96a2-84e57bba5b7e/hg19.fa \
rgnames__pl=illumina \
rgnames__sample=Test2 \
rgnames__pu=Test2 \
rgnames__lane=Test2 \
rgnames__rg=Test2 \
reference__bwa__indexes=/var/lib/cwl/stgd90e6c3c-5c4f-4059-9042-ce955fb07023/hg19.fa.amb \
config__algorithm__aligner=bwa \
config__algorithm__mark_duplicates=True \
description=Test2 \
sentinel_outputs=alignment_rec
The only difference I can tell is the ordering...
That's probably actually it. cwltool has a "remove implicit 0 position prior to sorting" algorithm. Bunny treats all outer bindings (at the list level, not the item level) as position: 0 and sorts them alphabetically.
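To make the difference concrete, here is a simplified Python model of the two orderings visible in the command lines above. It is not either engine's real sorting code. The function names and the two-record example are hypothetical, and the field order inside each dict stands in for the explicit per-item position. The per-record order mirrors the cwltool output (all of one sample's arguments, then the next sample's). The grouped order mirrors bunny treating every list-level binding as position 0 and breaking ties on the input name.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def per_record_args(records: List[Record]) -> List[str]:
    """cwltool-like model: emit every argument of one record, then the next."""
    args: List[str] = []
    for record in records:
        # Dict order stands in for each item's explicit position.
        for key, value in record.items():
            if value is not None:  # null inputs produce no argument
                args.append(f"{key}={value}")
    return args


def grouped_args(records: List[Record]) -> List[str]:
    """bunny-like model: all bindings tie at position 0, so sort by input name."""
    args: List[str] = []
    for key in sorted({key for record in records for key in record}):
        for record in records:
            value = record.get(key)
            if value is not None:
                args.append(f"{key}={value}")
    return args


records: List[Record] = [
    {"description": "Test1", "config__algorithm__aligner": "snap"},
    {"description": "Test2", "config__algorithm__aligner": "bwa"},
]
print(per_record_args(records))
# ['description=Test1', 'config__algorithm__aligner=snap',
#  'description=Test2', 'config__algorithm__aligner=bwa']
print(grouped_args(records))
# ['config__algorithm__aligner=snap', 'config__algorithm__aligner=bwa',
#  'description=Test1', 'description=Test2']
```

Both orderings carry the same arguments; only the interleaving differs. That difference is all a downstream parser sees.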
Luka;
Thanks for digging. I got the development branch built and can replicate the problem. You've got it: the issue is with bcbio. For multi-record inputs like this, we implicitly assume a particular order for arguments that appear multiple times. cwltool/Toil generate all of the arguments for one record, then all of the arguments for the next. In contrast, bunny groups them by input (all of the config__algorithm__align_split_size values, then all of the config__algorithm__aligner values), so we're not splitting them correctly into two records on the back end.
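bcbio's actual argument parser isn't shown in this thread. The following is a hypothetical Python model of the assumption described above: walk the key=value arguments in order and start a new record whenever a key repeats within the current one. Fed the per-record (cwltool-style) order, it recovers two records. Fed the grouped (bunny-style) order, it produces fragmented records, much like the output of alignment_to_rec earlier in the thread.

```python
from typing import Dict, List


def split_records(args: List[str]) -> List[Dict[str, str]]:
    """Rebuild records from key=value args (hypothetical model).

    A key that is already present in the current record is taken to mean
    that the next record has started.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for arg in args:
        key, value = arg.split("=", 1)
        if key in current:  # repeated key: assume a new record begins
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


per_record = ["description=Test1", "rgnames__lane=Test1",
              "description=Test2", "rgnames__lane=Test2"]
grouped = ["description=Test1", "description=Test2",
           "rgnames__lane=Test1", "rgnames__lane=Test2"]

print(split_records(per_record))
# [{'description': 'Test1', 'rgnames__lane': 'Test1'},
#  {'description': 'Test2', 'rgnames__lane': 'Test2'}]
print(split_records(grouped))
# [{'description': 'Test1'},
#  {'description': 'Test2', 'rgnames__lane': 'Test1'},
#  {'rgnames__lane': 'Test2'}]
```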
Both methods seem like valid approaches, although bunny is being a bit more lax here, since the alphabetical ordering ignores an explicit position specification in the tool description.
I could work on having bcbio support the alphabetical approach, although it gets complicated for things like reference__bwa__indexes. These are present for only one sample and null for the other, and the null value is not included on the command line.
But that's the thing: position isn't a property of the input, it's a property of each element of the list, because the input binding is nested inside the type field.
A couple of questions. If you need an array of records, why not provide it as an input directly instead of reconstructing it from lists? If you really want to provide it as lists, you could transform it with a relatively simple JavaScript expression and sidestep any command-line parsing issues. Finally, if you want to do it from Python, you can just dump the input job as JSON and consume it from your script:
requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: inputs.json
        entry: $(JSON.stringify(inputs))
      - entryname: runtime.json
        entry: $(JSON.stringify(runtime))
instead of, again, messing around with argument parsing.
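On the Python side, consuming that file could look roughly like the sketch below. It assumes inputs.json has the shape shown earlier in the thread: one list per input with one entry per sample, plus scalar values such as sentinel_outputs that apply to every sample. The function name load_records is hypothetical and is not part of bcbio.

```python
import json
from typing import Any, Dict, List


def load_records(path: str = "inputs.json") -> List[Dict[str, Any]]:
    """Turn the dumped job inputs into one dict per sample (hypothetical helper).

    Assumes every list-valued input has exactly one entry per sample;
    null entries stay None, so samples keep their positions.
    """
    with open(path) as handle:
        inputs = json.load(handle)
    list_keys = [key for key, value in inputs.items() if isinstance(value, list)]
    # Scalar inputs (e.g. sentinel_outputs) are shared by every record.
    shared = {key: value for key, value in inputs.items() if not isinstance(value, list)}
    n_samples = max((len(inputs[key]) for key in list_keys), default=0)
    records = []
    for i in range(n_samples):
        record = dict(shared)
        for key in list_keys:
            record[key] = inputs[key][i]
        records.append(record)
    return records
```

Calling load_records() inside the tool's working directory would return one dict per sample. Because nulls keep their slot in each list, reference__bwa__indexes stays attached to the right sample, and File objects come through as dicts with their "path" keys.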
Luka;
Thanks for all the thoughts on this. On the practical side, I pushed a fix to bcbio and have a new bcbio/bcbio Docker container that avoids the problem. If you update, it gets past that step with the devel branch, but then fails at a subsequent scatter step, and I'm not able to diagnose what goes wrong.
More generally, I'm open to new ways of representing this. The current setup is problematic when some records have null inputs and others don't, because we then have no good way to know which sample the non-null values belong to. I'm thinking of converting nulls into empty strings to work around that.
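Here is a minimal sketch of why the empty-string idea helps. It assumes values are matched to samples by position (the i-th value belongs to the i-th sample). It also assumes the workflow replaces null with "" before the command line is built; neither engine does that on its own. The helper name render_values is hypothetical. With nulls dropped, the single BWA index slides into the first sample's slot. With nulls kept as empty strings, it stays with the second sample.

```python
from typing import List, Optional


def render_values(values: List[Optional[str]], keep_nulls: bool) -> List[str]:
    """Values as they would reach the command line for one list input.

    With keep_nulls, None becomes "" so every sample still contributes one
    value; without it, nulls vanish and later values shift position.
    """
    rendered = []
    for value in values:
        if value is None:
            if not keep_nulls:
                continue  # dropped: positional matching is now off by one
            value = ""
        rendered.append(value)
    return rendered


# Only the second sample has a BWA index, as in the inputs above.
bwa_indexes = [None, "hg19.fa.amb"]
print(render_values(bwa_indexes, keep_nulls=False))
# ['hg19.fa.amb']       -> index 0, so it looks like it belongs to sample 1
print(render_values(bwa_indexes, keep_nulls=True))
# ['', 'hg19.fa.amb']   -> index 1, still aligned with sample 2
```

The downside is that bcbio would then need to treat "" as "not set" on the receiving end.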
The JSON dumping idea is really cool, but I've been trying to build up a workflow representation that we can convert to WDL for portability. As soon as I venture into JSON manipulation, that becomes hard to do. Is there a way to replicate the JSON dumping in WDL?
Thanks again for all this help and discussion.
@chapmanb it doesn't even fail for me; it just gets stuck... I'll investigate further. Unfortunately, I'm not familiar enough with WDL to give you any suggestions in that regard.
We are getting close, but now the pipeline_summary step is failing:
bcbio_nextgen.py runfn pipeline_summary cwl \
sentinel_runtime=cores,2,ram,4096 \
sentinel_parallel=multi-parallel \
sentinel_outputs=summary__qc,summary__metrics \
description=Test2 \
reference__fasta__base=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa \
config__algorithm__coverage_interval=regional genome_build=hg19 \
config__algorithm__coverage=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/cov-coverage_transcripts-bam.bed \
'config__algorithm__tools_off=gemini;;vqsr' \
config__algorithm__qc=variants \
analysis=variant2 \
'config__algorithm__tools_on=gvcf;;qualimap_full' \
config__algorithm__variant_regions=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/variant_regions-bam.bed \
align_bam=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam \
config__algorithm__variant_regions_merged=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/variant_regions-bam-merged.bed \
config__algorithm__coverage_merged=/home/luka/devel/bunny/test-cases/test_bcbio_cwl/run_info-cwl-workflow/2f5b7a00-9a6a-44e3-8cda-6f2698207e2e/root/prep_samples/2/bedprep/cov-coverage_transcripts-bam-merged.bed
[2017-04-10T13:46Z] b0031376c44b: QC: Test2 v, a, r, i, a, n, t, s
[2017-04-10T13:46Z] b0031376c44b: Uncaught exception occurred
Traceback (most recent call last):
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 53, in process
out = fn(fnargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
return apply(f, *args, **kwargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 181, in pipeline_summary
return qcsummary.pipeline_summary(*args)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 66, in pipeline_summary
data["summary"] = _run_qc_tools(work_bam, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 134, in _run_qc_tools
qc_fn = tools[program_name]
KeyError: 'v'
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
runfn.process(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 53, in process
out = fn(fnargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
return apply(f, *args, **kwargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 181, in pipeline_summary
return qcsummary.pipeline_summary(*args)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 66, in pipeline_summary
data["summary"] = _run_qc_tools(work_bam, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 134, in _run_qc_tools
qc_fn = tools[program_name]
KeyError: 'v'
For reference, here is the command line from cwltool:
bcbio_nextgen.py \
runfn \
pipeline_summary \
cwl \
sentinel_runtime=cores,2,ram,4096 \
sentinel_parallel=multi-parallel \
sentinel_outputs=summary__qc,summary__metrics \
description=Test2 \
reference__fasta__base=/var/lib/cwl/stg96269ba9-0025-4296-b9fb-ebe202cfc45d/hg19.fa \
config__algorithm__coverage_interval=regional \
genome_build=hg19 \
config__algorithm__coverage=None \
'config__algorithm__tools_off=[u'"'"'gemini'"'"', u'"'"'vqsr'"'"']' \
'config__algorithm__qc=[u'"'"'fastqc'"'"']' \
analysis=variant2 \
'config__algorithm__tools_on=[u'"'"'gvcf'"'"', u'"'"'qualimap_full'"'"']' \
config__algorithm__variant_regions=/var/lib/cwl/stgd07f7077-2537-47b5-ab08-525c41f9bcf2/variant_regions-bam.bed \
align_bam=/var/lib/cwl/stg3d328270-d728-4d4e-bd4b-9dc262c59227/Test2-sort.bam \
config__algorithm__variant_regions_merged=/var/lib/cwl/stgc7e6a862-99d0-4cfe-93d5-d20fe86550db/variant_regions-bam-merged.bed \
config__algorithm__coverage_merged=None
(This tool is scattered into 15 sub-jobs, so these parameters might not come from the exact same invocation, but they look like they do.)
What bothers me is that cwltool seems to dump serialized Python lists on the command line and that works, whereas rabix actually joins strings as requested in the bindings, yet it's rabix that fails. It seems to me that there are complementary bugs in the workflow, in cwltool, or in both, that make things pass.
The bunny code that gets this far is on the bug/cwl-links-processing branch.
Luka;
Thanks for the update, that's brilliant progress. Apologies, this was a bug in bcbio: it should check whether the config__algorithm__qc parameter is a list or a single item and handle the single item correctly. I pushed a fix, so if you update to the latest Docker image (or the development version), it should now work correctly.
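The "QC: Test2 v, a, r, i, a, n, t, s" log line above is what iterating a bare string looks like. A minimal Python sketch of normalizing a parameter that may be a single item or a list (the helper name as_list is hypothetical, not the actual bcbio fix):

```python
from typing import List, Optional, Sequence, Union


def as_list(value: Union[str, Sequence[str], None]) -> List[str]:
    """Normalize a parameter that may arrive as a single item or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is one item; iterating it directly would yield characters.
        return [value]
    return list(value)


# Iterating the raw string reproduces the "v, a, r, i, a, n, t, s" symptom.
print(", ".join("variants"))
print(as_list("variants"))
print(as_list(["fastqc"]))
```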
Regarding the larger question of how to represent these: they are a bit confusing since they're nested lists of lists. I think the bunny behavior of separating with ;; is right, rather than dumping JSON, but I have shims in bcbio to handle both cases:
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/distributed/runfn.py#L289
The cwltool version worked because it was a list with a single item, but I should have been treating your single item the same way.
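The linked runfn.py holds the real shims. As an independent illustration only, here is a Python sketch of accepting both encodings seen in this thread: bunny's ;;-joined string and cwltool's Python list repr such as [u'gemini', u'vqsr'] (the function name parse_list_arg is hypothetical):

```python
import ast
from typing import List


def parse_list_arg(raw: str) -> List[str]:
    """Parse a list passed either as 'a;;b' or as a Python list repr."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        # cwltool-style repr; ast.literal_eval accepts the u'' prefix in Python 3.3+.
        return [str(x) for x in ast.literal_eval(raw)]
    # bunny-style join; a single item simply has no separator.
    return [x for x in raw.split(";;") if x]


print(parse_list_arg("gemini;;vqsr"))
print(parse_list_arg("[u'gemini', u'vqsr']"))
print(parse_list_arg("variants"))
```

Using ast.literal_eval rather than eval keeps the parsing limited to literals.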
Hope this gets it running cleanly now. Let me know if you run into anything else at all.
@chapmanb
Now it's fixed, thanks! The multiqc_summary app is failing now. This is the stack trace:
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
runfn.process(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 45, in process
fnargs, parallel, out_keys = _world_from_cwl(fnargs[1:], work_dir)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 194, in _world_from_cwl
out = _split_groups_finalize_cwl(dict(grouped_keys), data, work_dir, passed_keys, output_cwl_keys, runtime)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 237, in _split_groups_finalize_cwl
val = _resolve_null_vals(key, vals, reci, num_recs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 258, in _resolve_null_vals
raise ValueError("Unsure how to resolve uneven values for %s: %s" % (key, vals))
ValueError: Unsure how to resolve uneven values for summary__qc: ['/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/1/qc/Test1/fastqc/fastqc_report.html', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/2/qc/Test1/qualimap/genome_results.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/3/qc/Test1/samtools/Test1.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/5/qc/Test1/coverage/Test1_bcbio_coverage.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/9/qc/Test2/fastqc/fastqc_report.html', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/10/qc/Test2/qualimap/genome_results.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/11/qc/Test2/samtools/Test2.txt', '/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a0b191b4-f858-4f7c-a0d8-eecd2cf3bb77/root/pipeline_summary/13/qc/Test2/coverage/Test2_bcbio_coverage.txt']
I'll investigate further.
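The uneven summary__qc values above carry the sample name in each path (qc/Test1/..., qc/Test2/...). A Python sketch of regrouping such a flat list per sample, assuming that qc/<sample>/ layout holds; group_by_sample is a hypothetical helper, not how bcbio resolved this:

```python
import re
from typing import Dict, List


def group_by_sample(paths: List[str]) -> Dict[str, List[str]]:
    """Group QC output paths by the directory that follows '/qc/' in each path."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        match = re.search(r"/qc/([^/]+)/", path)
        sample = match.group(1) if match else "unknown"
        groups.setdefault(sample, []).append(path)
    return groups


# Shortened versions of paths from the error message above.
qc_paths = [
    "root/pipeline_summary/1/qc/Test1/fastqc/fastqc_report.html",
    "root/pipeline_summary/2/qc/Test1/qualimap/genome_results.txt",
    "root/pipeline_summary/9/qc/Test2/fastqc/fastqc_report.html",
    "root/pipeline_summary/13/qc/Test2/coverage/Test2_bcbio_coverage.txt",
]
for sample, files in group_by_sample(qc_paths).items():
    print(sample, len(files))
```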
@chapmanb It's possible that there is also a bug in bunny and that the ;; join is on the wrong level, given that the list is nested. But then the tools should probably be changed to specify how both the inner and outer lists are joined (or not joined, for that matter). It's also possible that because the inner list doesn't have an input binding, cwltool does something like str(value) and gets the serialization of the Python value.
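A Python sketch of what joining at each level of a nested list produces, using tools_off values from this thread; it illustrates the two levels only and does not claim either engine works this way internally:

```python
from typing import List

# Two records, each carrying an inner list.
tools_off: List[List[str]] = [["gemini"], ["gemini", "vqsr"]]

# Outer level -> one argument per record; inner level -> joined with ";;".
per_record = ["config__algorithm__tools_off=" + ";;".join(inner) for inner in tools_off]
print(per_record)

# Joining at the wrong level flattens the records together and loses the boundary.
wrong_level = ";;".join(item for inner in tools_off for item in inner)
print(wrong_level)

# Falling back to str() on an inner list yields a Python repr instead of a join.
print(str(tools_off[1]))
```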
The difference between the command lines that cwltool and rabix produce (other than the previous two issues with ordering and lists) is that the 8th summary__qc parameter is "variant_regions-bam-merged-padded.bed" with cwltool but "Test2_bcbio_coverage.txt" with bunny. The command line is 225 parameters long, so I'm not sure whether there are other differences.
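With 225 parameters, comparing by eye is error-prone. A Python sketch of diffing two key=value argument lists, keeping the order of repeated keys and comparing file values by basename since the two engines stage inputs into different directories (helper names and the shortened paths are hypothetical):

```python
import posixpath
from collections import defaultdict
from typing import Dict, List, Tuple


def group_args(args: List[str]) -> Dict[str, List[str]]:
    """Map each key to its values in order, reducing paths to basenames."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for arg in args:
        key, _, value = arg.partition("=")
        grouped[key].append(posixpath.basename(value))
    return dict(grouped)


def diff_args(a: List[str], b: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (key, values_in_a, values_in_b) for every key whose values differ."""
    ga, gb = group_args(a), group_args(b)
    return [(k, ga.get(k, []), gb.get(k, []))
            for k in sorted(set(ga) | set(gb))
            if ga.get(k, []) != gb.get(k, [])]


cwltool_args = [
    "description=Test2",
    "summary__qc=/var/lib/cwl/stgA/fastqc_report.html",
    "summary__qc=/var/lib/cwl/stgB/variant_regions-bam-merged-padded.bed",
]
bunny_args = [
    "description=Test2",
    "summary__qc=/work/root/pipeline_summary/9/qc/Test2/fastqc/fastqc_report.html",
    "summary__qc=/work/root/pipeline_summary/13/qc/Test2/coverage/Test2_bcbio_coverage.txt",
]
for key, left, right in diff_args(cwltool_args, bunny_args):
    print(key, left, right)
```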
Luka and Janko; Thanks for digging into this. This is another tricky case where we have multiple records to collapse together. I don't think bcbio is handling this in the best way, but I pushed a fix to avoid the issue and keep us moving, and I will think more about how best to handle this longer term. With this in place I now get stuck with a scatter error:
[2017-04-11 06:42:10.764] [ERROR] EventProcessor failed to process event JobStatusEvent [jobId=root.variantcall.2.variantcall_batch_region.4, state=COMPLETED, contextId=f478b821-02fe-49d0-8bda-5e5d64e38fb7, result={region=chr22:15068-15500, vrn_file_region=FileValue [size=1770, path=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz, location=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz, checksum=sha1$20f84394ceed675bee3b8910c32a03aeacba006e, secondaryFiles=[FileValue [size=72, path=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz.tbi, location=/home/chapmanb/drive/work/cwl/test_bcbio_cwl/bunny_work/f478b821-02fe-49d0-8bda-5e5d64e38fb7/root/variantcall/2/variantcall_batch_region/4/vardict/chr22/b1-chr22_15068_15500.vcf.gz.tbi, checksum=sha1$27a0f22e02be99c928cf13e818eb7523636c5929, secondaryFiles=[], properties={sbg:metadata=null}]], properties={sbg:metadata=null}]}].
org.rabix.engine.repository.TransactionHelper$TransactionException: org.rabix.engine.processor.handler.EventHandlerException: Port region for root.variantcall.2.variantcall_batch_region and rootId f478b821-02fe-49d0-8bda-5e5d64e38fb7 is not a list and therefore cannot be scattered.
at org.rabix.engine.processor.impl.EventProcessorImpl.handle(EventProcessorImpl.java:144) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl.access$300(EventProcessorImpl.java:38) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl$1$1.call(EventProcessorImpl.java:93) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl$1$1.call(EventProcessorImpl.java:90) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.memory.InMemoryRepositoryRegistry.doInTransaction(InMemoryRepositoryRegistry.java:82) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl$1.run(EventProcessorImpl.java:90) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: org.rabix.engine.processor.handler.EventHandlerException: Port region for root.variantcall.2.variantcall_batch_region and rootId f478b821-02fe-49d0-8bda-5e5d64e38fb7 is not a list and therefore cannot be scattered.
at org.rabix.engine.processor.handler.impl.ScatterHandler.scatterPort(ScatterHandler.java:98) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:83) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:30) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:175) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:54) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at com.sun.proxy.$Proxy37.send(Unknown Source) ~[na:na]
at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:198) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:41) ~[rabix-backend-local-1.0.0-rc3.jar:na]
at org.rabix.engine.processor.impl.EventProcessorImpl.handle(EventProcessorImpl.java:142) ~[rabix-backend-local-1.0.0-rc3.jar:na]
... 8 common frames omitted
Thanks for all the help here, excited to be making so much progress.
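The error above says the engine expected a list on the region port. As a rough Python model of the CWL dotproduct scatter rule (not rabix's actual code; the function name is hypothetical), every scattered port must hold a list, and job i receives element i of each:

```python
from typing import Any, Dict, List


def dotproduct_scatter(inputs: Dict[str, Any], scatter: List[str]) -> List[Dict[str, Any]]:
    """Split one step's inputs into per-job inputs, pairing scattered ports by index."""
    if not scatter:
        return [dict(inputs)]
    for port in scatter:
        # Every scattered port must hold a list; otherwise there is nothing to split.
        if not isinstance(inputs[port], list):
            raise ValueError(f"Port {port} is not a list and therefore cannot be scattered.")
    lengths = {len(inputs[port]) for port in scatter}
    if len(lengths) != 1:
        raise ValueError("dotproduct scatter requires lists of equal length")
    (n,) = lengths
    # Job i gets element i of each scattered port plus all other inputs unchanged.
    return [{**inputs, **{port: inputs[port][i] for port in scatter}} for i in range(n)]


jobs = dotproduct_scatter(
    {"region": ["chrM:0-1000", "chr22:15068-15500"], "description": "Test1"},
    scatter=["region"],
)
print(jobs)
```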
Sorry, I meant to mention that the fix is pushed to the latest bcbio/bcbio Docker container, so updating should get you past the multiqc problem. Thanks again.
Thanks for the feedback! We've fixed that and now the whole workflow is passing. These are the results:
{
"align_bam" : [ {
"basename" : "Test1-sort.bam",
"checksum" : "sha1$a3c2c1daca6b598a406a1a07a2d5377eae4695e6",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam",
"nameext" : "bam",
"nameroot" : "Test1-sort",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam",
"secondaryFiles" : [ {
"basename" : "Test1-sort.bam.bai",
"checksum" : "sha1$f2c5b96bef1cfe90c45d666567e0ef4f0adc1388",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam.bai",
"nameext" : "bai",
"nameroot" : "Test1-sort.bam",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/1/merge_split_alignments/align/Test1/Test1-sort.bam.bai",
"secondaryFiles" : [ ],
"size" : 232
} ],
"size" : 4056894
}, {
"basename" : "Test2-sort.bam",
"checksum" : "sha1$093662882cd986a4f54ced68d5ef61a84e64c8bc",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam",
"nameext" : "bam",
"nameroot" : "Test2-sort",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam",
"secondaryFiles" : [ {
"basename" : "Test2-sort.bam.bai",
"checksum" : "sha1$fb55b6066798c62e45a04f04991eb96caf1adf6a",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam.bai",
"nameext" : "bai",
"nameroot" : "Test2-sort.bam",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/alignment/2/merge_split_alignments/align/Test2/Test2-sort.bam.bai",
"secondaryFiles" : [ ],
"size" : 232
} ],
"size" : 3882031
} ],
"summary__multiqc" : [ {
"basename" : "multiqc_report.html",
"checksum" : "sha1$e66e6dfaa2e84b05fd66578c4a944457f803120f",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_report.html",
"nameext" : "html",
"nameroot" : "multiqc_report",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_report.html",
"secondaryFiles" : [ {
"basename" : "multiqc_bcbio_metrics.txt",
"checksum" : "sha1$6061af0d6bdab4dc25c495796b1919a8dd0b9de4",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_bcbio_metrics.txt",
"nameext" : "txt",
"nameroot" : "multiqc_bcbio_metrics",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_bcbio_metrics.txt",
"secondaryFiles" : [ ],
"size" : 582
}, {
"basename" : "multiqc_fastqc.txt",
"checksum" : "sha1$6df234655ec306e4145e15efcb7f23714f2c7226",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_fastqc.txt",
"nameext" : "txt",
"nameroot" : "multiqc_fastqc",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_fastqc.txt",
"secondaryFiles" : [ ],
"size" : 736
}, {
"basename" : "multiqc_general_stats.txt",
"checksum" : "sha1$9421cf4a00698c5021204d9c78394e11bc647f6a",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_general_stats.txt",
"nameext" : "txt",
"nameroot" : "multiqc_general_stats",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_general_stats.txt",
"secondaryFiles" : [ ],
"size" : 1234
}, {
"basename" : "multiqc_samtools_stats.txt",
"checksum" : "sha1$9e44459b7d4d77019724881fab09225067070ce4",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_samtools_stats.txt",
"nameext" : "txt",
"nameroot" : "multiqc_samtools_stats",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_samtools_stats.txt",
"secondaryFiles" : [ ],
"size" : 1352
}, {
"basename" : "multiqc_sources.txt",
"checksum" : "sha1$9d4be4722cc6f1b680403139b96975c6ace52561",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_sources.txt",
"nameext" : "txt",
"nameroot" : "multiqc_sources",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/multiqc_sources.txt",
"secondaryFiles" : [ ],
"size" : 2884
}, {
"basename" : "seqbuster_isomirs.txt",
"checksum" : "sha1$ce200716f65cb44584bc8423b82cc3ad2fb13e3f",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_isomirs.txt",
"nameext" : "txt",
"nameroot" : "seqbuster_isomirs",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_isomirs.txt",
"secondaryFiles" : [ ],
"size" : 7
}, {
"basename" : "seqbuster_mirs.txt",
"checksum" : "sha1$ce200716f65cb44584bc8423b82cc3ad2fb13e3f",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_mirs.txt",
"nameext" : "txt",
"nameroot" : "seqbuster_mirs",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/multiqc_data/seqbuster_mirs.txt",
"secondaryFiles" : [ ],
"size" : 7
}, {
"basename" : "Test1_bcbio.txt",
"checksum" : "sha1$1db1ae7ddb10ed15e74df868fafafdda5e78003e",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test1_bcbio.txt",
"nameext" : "txt",
"nameroot" : "Test1_bcbio",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test1_bcbio.txt",
"secondaryFiles" : [ ],
"size" : 505
}, {
"basename" : "Test2_bcbio.txt",
"checksum" : "sha1$ab6d1bf16a8ea02343d6178ad0ae2f521fff5785",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test2_bcbio.txt",
"nameext" : "txt",
"nameroot" : "Test2_bcbio",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/Test2_bcbio.txt",
"secondaryFiles" : [ ],
"size" : 507
}, {
"basename" : "target_info.yaml",
"checksum" : "sha1$040f44a4b8e907955769d261cb8c33bc3d2373c8",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/target_info.yaml",
"nameext" : "yaml",
"nameroot" : "target_info",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/metrics/target_info.yaml",
"secondaryFiles" : [ ],
"size" : 114
}, {
"basename" : "qc-coverage-report-run.R",
"checksum" : "sha1$29a43161623d09d080ab1216934be40969c4373d",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/qc-coverage-report-run.R",
"nameext" : "R",
"nameroot" : "qc-coverage-report-run",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/qc-coverage-report-run.R",
"secondaryFiles" : [ ],
"size" : 180
}, {
"basename" : "report-ready.Rmd",
"checksum" : "sha1$591462c9e329f1223ed339bc815f60d0f7a5e019",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/report-ready.Rmd",
"nameext" : "Rmd",
"nameroot" : "report-ready",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/report/report-ready.Rmd",
"secondaryFiles" : [ ],
"size" : 13768
}, {
"basename" : "list_files.txt",
"checksum" : "sha1$e6dd1325bbe3b9501d5adccf150ad8c358d6ca45",
"class" : "File",
"dirname" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc",
"location" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/list_files.txt",
"nameext" : "txt",
"nameroot" : "list_files",
"path" : "/Users/janko/Desktop/test_bcbio_cwl/run_info-cwl-workflow/a71a0b95-6d01-4000-a182-514859096a97/root/multiqc_summary/qc/multiqc/list_files.txt",
"secondaryFiles" : [ ],
"size" : 7271
} ],
"size" : 1309847
}, null ]
}
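To check what a run produced without reading the full JSON, a Python sketch that walks CWL File objects (including secondaryFiles) and skips null entries such as the trailing null in summary__multiqc; the input below is a trimmed excerpt of the output above, and iter_basenames is a hypothetical helper:

```python
import json
from typing import Any, Iterator


def iter_basenames(value: Any) -> Iterator[str]:
    """Yield basenames of every CWL File object, including secondaryFiles."""
    if value is None:
        return
    if isinstance(value, list):
        for item in value:
            yield from iter_basenames(item)
    elif isinstance(value, dict) and value.get("class") == "File":
        yield value["basename"]
        for secondary in value.get("secondaryFiles", []):
            yield from iter_basenames(secondary)


outputs = json.loads("""{
  "align_bam": [{"class": "File", "basename": "Test1-sort.bam",
                 "secondaryFiles": [{"class": "File", "basename": "Test1-sort.bam.bai",
                                     "secondaryFiles": []}]}],
  "summary__multiqc": [{"class": "File", "basename": "multiqc_report.html",
                        "secondaryFiles": []}, null]
}""")
for key, value in outputs.items():
    print(key, list(iter_basenames(value)))
```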
Brilliant. That's awesome news, thank you so much for all the work on this. I'm excited to test this out more on real workflows now that it's working. Once you have a release ready with all these updates I'd like to ensure it works cleanly on the validation workflows we're putting together for the GA4GH workflow challenge:
https://github.com/bcbio/bcbio_validation_workflows
Is there a hope of running this directly on CGC in the near term? If not, I can test local runs until that's all in place. I'm excited to have this coming together, thank you again.
The bunny release will hopefully be out in a day or two. CGC availability is a bit harder to predict; it might be as early as next week, but more likely in two weeks.
Thanks so much. I created a conda package of the latest 1.0.0-rc4 release and it works cleanly with the bcbio CWL test data. Brilliant. My next step is to run the GA4GH workflow challenge CWL:
https://github.com/bcbio/bcbio_validation_workflows
This is more of a real example, so ideally I could run multicore on a single machine. It sounds like I should wait for #231 to do that.
We can close this issue, and I'm happy to reopen discussion in a separate one if we run into any problems. Thank you again for all the help.
You're welcome! That's great news. We've started working on #231 and it will be done soon.
Hi @chapmanb, I ran the latest test_bcbio_cwl with bunny and got stuck on the "concat_batch_variantcalls" step, most likely due to the same issue with the alphabetical ordering of command-line inputs. Command line with bunny:
bcbio_nextgen.py \
runfn \
concat_batch_variantcalls \
cwl \
sentinel_runtime=cores,1,ram,2048 \
align_bam=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_1_s_merge_split_alignments/align/Test1/Test1-sort.bam \
align_bam=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_2_s_merge_split_alignments/align/Test2/Test2-sort.bam \
analysis=variant2 \
analysis=variant2 \
config__algorithm__callable_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_combine_sample_regions/regions/Test1-analysis_blocks.bed \
config__algorithm__callable_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_combine_sample_regions/regions/Test2-analysis_blocks.bed \
config__algorithm__coverage_interval=regional \
config__algorithm__coverage_interval=regional \
config__algorithm__tools_off=gemini \
'config__algorithm__tools_off=gemini;;vqsr' \
config__algorithm__tools_on=qualimap_full \
'config__algorithm__tools_on=gvcf;;qualimap_full' \
config__algorithm__validate=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/7_100326_FC6107FAAXX-grade.vcf \
config__algorithm__validate=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/7_100326_FC6107FAAXX-grade.vcf \
config__algorithm__validate_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/variant_regions-bam.bed \
config__algorithm__validate_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/variant_regions-bam.bed \
config__algorithm__variant_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_prep_samples_1_s/bedprep/variant_regions-bam.bed \
config__algorithm__variant_regions=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_prep_samples_2_s/bedprep/variant_regions-bam.bed \
config__algorithm__variantcaller=freebayes \
config__algorithm__variantcaller=freebayes \
description=Test1 \
description=Test2 \
genome_build=hg19 \
genome_build=hg19 \
genome_resources__variation__cosmic=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/cosmic-v68-hg19.vcf.gz \
genome_resources__variation__cosmic=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/cosmic-v68-hg19.vcf.gz \
genome_resources__variation__dbsnp=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/dbsnp_132.vcf.gz \
genome_resources__variation__dbsnp=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/dbsnp_132.vcf.gz \
metadata__batch=b1 \
metadata__batch=b1 \
metadata__phenotype=tumor \
metadata__phenotype=normal \
reference__fasta__base=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/hg19.fa \
reference__fasta__base=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/hg19.fa \
reference__genome_context=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/test.bed.gz \
reference__genome_context=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/test2.bed.gz \
reference__rtg=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/mainIndex \
reference__rtg=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/mainIndex \
region=chrM:0-1000 \
region=chrM:2000-5000 \
region=chr22:0-14595 \
region=chr22:15068-15500 \
regions__callable=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_postprocess_alignment_1_s/align/Test1/Test1-coverage.callable-vrsubset.bed \
regions__callable=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_postprocess_alignment_2_s/align/Test2/Test2-coverage.callable-vrsubset.bed \
sentinel_parallel=batch-merge \
vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_1_s/freebayes/chrM/b1-chrM_0_1000.vcf.gz \
vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_2_s/freebayes/chrM/b1-chrM_2000_5000.vcf.gz \
vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_3_s/freebayes/chr22/b1-chr22_0_14595.vcf.gz \
vrn_file_region=/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_variantcall_1_s_variantcall_batch_region_4_s/freebayes/chr22/b1-chr22_15068_15500.vcf.gz \
sentinel_outputs=vrn_file
With cwltool:
bcbio_nextgen.py runfn concat_batch_variantcalls cwl \
  sentinel_runtime=cores,1,ram,2048 \
  description=Test1 \
  config__algorithm__validate=/var/lib/cwl/stg0aa7a2c9-86bc-48f3-80c8-71aaff066d42/7_100326_FC6107FAAXX-grade.vcf \
  reference__fasta__base=/var/lib/cwl/stgde1396cd-ac96-47b0-8c17-14b662bcb6b2/hg19.fa \
  reference__rtg=/var/lib/cwl/stg441389be-7263-429a-aed0-634cdd8b0f50/mainIndex \
  config__algorithm__variantcaller=freebayes \
  config__algorithm__coverage_interval=regional \
  metadata__batch=b1 \
  metadata__phenotype=tumor \
  'reference__genome_context=/var/lib/cwl/stg368108c8-c795-4b86-adc5-85f435283645/test.bed.gz;;/var/lib/cwl/stg9102f634-1906-40d0-8a81-7371745682df/test2.bed.gz' \
  config__algorithm__validate_regions=/var/lib/cwl/stg68f99767-e936-4415-bdb1-db26d043aaa1/variant_regions-bam.bed \
  genome_build=hg19 \
  config__algorithm__tools_off=gemini \
  genome_resources__variation__dbsnp=/var/lib/cwl/stg579ee780-01b8-46c7-8bb6-159380c7f418/dbsnp_132.vcf.gz \
  genome_resources__variation__cosmic=/var/lib/cwl/stgc7a64bcd-12e9-4e7a-adfc-0cb4760733e5/cosmic-v68-hg19.vcf.gz \
  analysis=variant2 \
  config__algorithm__tools_on=qualimap_full \
  config__algorithm__variant_regions=/var/lib/cwl/stg5c61a4b3-c1cc-4761-87f8-70e86f32afd5/variant_regions-bam.bed \
  align_bam=/var/lib/cwl/stg72ce9a6d-0566-4a97-b0a0-343b5d8fe50e/Test1-sort.bam \
  regions__callable=/var/lib/cwl/stg6265737c-b78a-4c92-8bb0-1a8b9949d749/Test1-coverage.callable-vrsubset.bed \
  config__algorithm__callable_regions=/var/lib/cwl/stg4ecdacb3-8634-4ea8-b54f-4b51c97d031e/Test1-analysis_blocks.bed \
  region=chrM:0-1000 \
  vrn_file_region=/var/lib/cwl/stg18ab04ed-ed4b-4ee3-afeb-d36e9fee0f0f/b1-chrM_0_1000.vcf.gz \
  sentinel_parallel=batch-merge \
  description=Test2 \
  config__algorithm__validate=/var/lib/cwl/stg0aa7a2c9-86bc-48f3-80c8-71aaff066d42/7_100326_FC6107FAAXX-grade.vcf \
  reference__fasta__base=/var/lib/cwl/stgde1396cd-ac96-47b0-8c17-14b662bcb6b2/hg19.fa \
  reference__rtg=/var/lib/cwl/stg441389be-7263-429a-aed0-634cdd8b0f50/mainIndex \
  config__algorithm__variantcaller=freebayes \
  config__algorithm__coverage_interval=regional \
  metadata__batch=b1 \
  metadata__phenotype=normal \
  'reference__genome_context=/var/lib/cwl/stg368108c8-c795-4b86-adc5-85f435283645/test.bed.gz;;/var/lib/cwl/stg9102f634-1906-40d0-8a81-7371745682df/test2.bed.gz' \
  config__algorithm__validate_regions=/var/lib/cwl/stg68f99767-e936-4415-bdb1-db26d043aaa1/variant_regions-bam.bed \
  genome_build=hg19 \
  'config__algorithm__tools_off=gemini;;vqsr' \
  genome_resources__variation__dbsnp=/var/lib/cwl/stg579ee780-01b8-46c7-8bb6-159380c7f418/dbsnp_132.vcf.gz \
  genome_resources__variation__cosmic=/var/lib/cwl/stgc7a64bcd-12e9-4e7a-adfc-0cb4760733e5/cosmic-v68-hg19.vcf.gz \
  analysis=variant2 \
  'config__algorithm__tools_on=gvcf;;qualimap_full' \
  config__algorithm__variant_regions=/var/lib/cwl/stge275be18-7913-436e-96e0-4387c780498c/variant_regions-bam.bed \
  align_bam=/var/lib/cwl/stg7df1dd46-7f47-4af8-91bb-3c469609f28f/Test2-sort.bam \
  regions__callable=/var/lib/cwl/stg2c624fa6-eb50-42ec-925f-e9f511fbbcc8/Test2-coverage.callable-vrsubset.bed \
  config__algorithm__callable_regions=/var/lib/cwl/stg4f2b9e52-2372-44e6-8d4d-a0ae21c3d018/Test2-analysis_blocks.bed \
  region=chrM:2000-5000 \
  vrn_file_region=/var/lib/cwl/stg7b286f5d-8ab5-4a50-b8b2-b2cd429446df/b1-chrM_2000_5000.vcf.gz \
  sentinel_outputs=vrn_file \
  region=chr22:0-14595 \
  vrn_file_region=/var/lib/cwl/stg6e752145-47a7-4137-998f-7003b3232a17/b1-chr22_0_14595.vcf.gz \
  region=chr22:15068-15500 \
  vrn_file_region=/var/lib/cwl/stg8ce73ab2-44e8-44cc-b2dd-10e1a6a016e3/b1-chr22_15068_15500.vcf.gz
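When comparing the two invocations, it can help to count how many values each key receives. In the cwltool call, for example, region appears four times while align_bam appears only twice. The following is a minimal sketch, not bcbio's actual argument parser: parse_runfn_args and count_values are hypothetical helpers, and the conventions they assume ('__' separating nested key levels, ';;' joining list items inside one value, repeated keys accumulating in order) are inferred from the command lines above.

```python
def parse_runfn_args(args):
    """Group key=value arguments by key, in the order they appear.

    Hypothetical helper, not bcbio's parser. Assumes '__' separates
    nested key levels and ';;' joins list items within one value.
    """
    grouped = {}
    for arg in args:
        key, _, value = arg.partition("=")  # split on the first '=' only
        path = tuple(key.split("__"))  # e.g. ("config", "algorithm", "tools_off")
        item = value.split(";;") if ";;" in value else value
        grouped.setdefault(path, []).append(item)
    return grouped


def count_values(grouped):
    """Map each flattened key back to the number of values it received."""
    return {"__".join(path): len(vals) for path, vals in grouped.items()}


if __name__ == "__main__":
    example = [
        "description=Test1", "align_bam=Test1-sort.bam", "region=chrM:0-1000",
        "description=Test2", "align_bam=Test2-sort.bam", "region=chrM:2000-5000",
        "region=chr22:0-14595", "region=chr22:15068-15500",
        "config__algorithm__tools_off=gemini;;vqsr",
    ]
    print(count_values(parse_runfn_args(example)))
```

With this example input the counts show two values for align_bam against four for region, which is the kind of mismatch the traceback reports.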
Error log:
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 219, in <module>
    runfn.process(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 45, in process
    fnargs, parallel, out_keys = _world_from_cwl(fnargs[1:], work_dir)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 194, in _world_from_cwl
    out = _split_groups_finalize_cwl(dict(grouped_keys), data, work_dir, passed_keys, output_cwl_keys, runtime)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 237, in _split_groups_finalize_cwl
    val = _resolve_null_vals(key, vals, reci, num_recs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 266, in _resolve_null_vals
    raise ValueError("Unsure how to resolve uneven values for %s with %s records: %s" % (key, num_recs, vals))
ValueError: Unsure how to resolve uneven values for align_bam with 4 records: ['/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_1_s_merge_split_alignments/align/Test1/Test1-sort.bam', '/sbgenomics/Projects/77c39d4b-61a9-48f4-a5b4-0657ff6977e4/workspace/4f6b2df1-401a-464b-9ae8-13292ba51cf3/root_alignment_2_s_merge_split_alignments/align/Test2/Test2-sort.bam']
EDIT:
In the latest Docker image, runfn.py's _resolve_null_vals(key, vals, reci, num_recs) has:
allowed_uneven = set(["summary__qc"])
The version on the bcbio-nextgen GitHub page has:
allowed_uneven = set(["concat_batch_variantcalls", "multiqc_summary"])
Hope this helps :)
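The difference between the two sets matters because the step being run in the cwltool command above, concat_batch_variantcalls, appears only in the GitHub version's set. A minimal sketch of such a whitelist gate follows. It is not bcbio's implementation: resolve_uneven is a hypothetical name, its signature differs from the real _resolve_null_vals(key, vals, reci, num_recs), and two behaviors are assumptions, namely that the allowed_uneven entries are step names (as the GitHub version's entries suggest) and that missing records are filled with None.

```python
def resolve_uneven(step, key, vals, num_recs, allowed_uneven):
    """Line up the values for one key with num_recs records.

    Hypothetical sketch, not bcbio's code. Assumes allowed_uneven holds
    step names and that missing records may be filled with None.
    """
    if len(vals) == num_recs:
        return list(vals)  # one value per record: nothing to resolve
    if step in allowed_uneven and len(vals) < num_recs:
        # Whitelisted step: pad the short list so every record has a slot.
        return list(vals) + [None] * (num_recs - len(vals))
    raise ValueError("Unsure how to resolve uneven values for %s with %s records: %s"
                     % (key, num_recs, vals))


if __name__ == "__main__":
    bams = ["Test1-sort.bam", "Test2-sort.bam"]
    image_set = {"summary__qc"}
    github_set = {"concat_batch_variantcalls", "multiqc_summary"}
    print(resolve_uneven("concat_batch_variantcalls", "align_bam", bams, 4, github_set))
    try:
        resolve_uneven("concat_batch_variantcalls", "align_bam", bams, 4, image_set)
    except ValueError as err:
        print(err)  # same message shape as the traceback
```

Under these assumptions, the older set rejects two align_bam values for four records, while the newer set lets the step through.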
Sorry about the issue. With the latest bcbio test CWL files (https://github.com/bcbio/test_bcbio_cwl), we've moved to smaller Docker containers instead of the single large one:
https://quay.io/organization/bcbio
These include a fix for this problem. Are you running with the latest test CWL and Docker images, or in some other way? I'm happy to provide whatever you need to test and run, and can rebuild the full container if that helps.
Longer term, I'm looking at switching to Luka's InitialWorkDirRequirement idea to avoid this, but hopefully the fixed containers get you going in the short term.
Thanks for the quick reply. Yes, I tried running with the latest version of the test CWL and with the Docker images from the quay.io repository.
However, this fix does not seem to be included in the latest quay.io/bcbio/bcbio-vc image, since its files differ from those at https://github.com/chapmanb/bcbio-nextgen/blob/master.
I started manually updating some files in a local image (runfn.py, multiprocess.py, genotype.py) to match bcbio-nextgen master and got a few steps further. EDIT: and got it running :)
Brilliant, glad you managed to get it running cleanly. Apologies for the out-of-date Docker containers; I've now bumped them so they include the compatibility fix. Please let me know if you run into anything else at all. Thanks again for looking at this.
Great, thanks a lot! I'll put an update here if something comes up.
To continue discussion from #92