googlegenomics / pipelines-api-examples

Examples for the Google Genomics Pipelines API.
BSD 3-Clause "New" or "Revised" License

Enable WDL imports #51

Closed vdauwera closed 6 years ago

vdauwera commented 7 years ago

Cromwell now lets users import WDL code into a WDL workflow, either individual tasks or complete workflows (i.e. subworkflows). This is very useful, so it would be great to update the wdl_runner to take advantage of it. The main change needed is to add a parameter that lets the user provide a zip file of the imports, and to pass that file to Cromwell.

Here's an overview of how WDL imports work.

At the command line: you specify your "master" WDL as input and you make a zip file with any dependent WDLs that contain either WDL tasks or entire workflows. The relevant docs are here: https://github.com/broadinstitute/cromwell/blob/23/README.md#imports

The main two use cases are:

1) You have a library of single-task WDLs that you want to import rather than copy into your workflows. This would be a perfect application of the GATK wrappers we wrote to enable people to call tools without rewriting everything themselves. Here's a worked-out example from the doc:

For example, consider you have a directory of WDL files:

my_WDLs
├── cgrep.wdl
├── ps.wdl
└── wc.wdl

If you zip that directory to my_WDLs.zip, you have the option to pass it in as the last parameter in your run command and be able to reference these WDLs as imports in your primary WDL. For example, your primary WDL can look like this:

import "ps.wdl" as ps
import "cgrep.wdl"
import "wc.wdl" as wordCount

workflow threestep {

call ps.ps as getStatus
call cgrep.cgrep { input: str = getStatus.x }
call wordCount { input: str = ... }

}

The command to run this WDL, without any inputs, workflow options, or metadata files, would look like:

$ java -jar cromwell.jar run threestep.wdl - - - /path/to/my_WDLs.zip
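
The packaging and launch steps described above can be sketched in Python. This is a minimal sketch, not wdl_runner code: the helper names `zip_wdl_imports` and `build_run_command` are hypothetical, the WDL files are assumed to sit at the top level of the zip so that `import "ps.wdl"` resolves, and the argument order simply mirrors the positional command quoted above, which may differ in other Cromwell versions.

```python
import zipfile
from pathlib import Path


def zip_wdl_imports(wdl_dir, zip_path):
    """Pack every .wdl file in wdl_dir into zip_path, flat at the zip root.

    Assumption: an import such as `import "ps.wdl"` is resolved relative to
    the root of the zip, so each file is stored under its bare file name.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wdl in sorted(Path(wdl_dir).glob("*.wdl")):
            archive.write(wdl, arcname=wdl.name)
            names.append(wdl.name)
    return names


def build_run_command(main_wdl, imports_zip, jar="cromwell.jar"):
    """Build the argv for the positional `run` form quoted above.

    The three "-" placeholders stand for the unused inputs, workflow
    options, and metadata arguments.
    """
    return ["java", "-jar", jar, "run", str(main_wdl),
            "-", "-", "-", str(imports_zip)]
```

With the directory from the example, one would call `zip_wdl_imports("my_WDLs", "my_WDLs.zip")` and then hand `build_run_command("threestep.wdl", "my_WDLs.zip")` to `subprocess.run`.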

2) Or you want to tie together multiple workflows. For example, you might have one that reverts BAMs to unmapped BAMs, then our single-sample pipeline that takes uBAMs and makes per-sample GVCFs, then a third that runs joint genotyping on all the GVCFs. Sometimes you want to run them separately, sometimes all in a row, but you don't want one massive WDL that duplicates the code from each, because then every update to an individual segment has to be made in two places (code drift alert!). So you use subworkflows: you write one "master" WDL that is basically a container tying the three separate workflows together into a single runnable WDL, using import statements to load in entire workflows. There's a worked-out example in the doc at https://github.com/broadinstitute/cromwell/blob/23/README.md#sub-workflows.
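
Whether the imports are single tasks or whole workflows, every file named in an import statement of the "master" WDL has to be present in the zip. A rough pre-flight check might look like the sketch below. The function names are hypothetical, and the regular expression only handles the simple `import "file.wdl" as alias` form shown in this thread; it is not a real WDL parser.

```python
import re
import zipfile

# Matches `import "file.wdl"` with an optional `as alias` suffix.
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"(?:\s+as\s+(\w+))?', re.MULTILINE)


def imported_files(wdl_text):
    """Return the file names listed in the import statements of a WDL source."""
    return [match.group(1) for match in IMPORT_RE.finditer(wdl_text)]


def missing_imports(wdl_text, imports_zip):
    """List the imports of the master WDL that are absent from the zip."""
    with zipfile.ZipFile(imports_zip) as archive:
        available = set(archive.namelist())
    return [name for name in imported_files(wdl_text) if name not in available]
```

An empty list from `missing_imports` means every imported file was found at the top level of the zip; it says nothing about whether the imported WDLs themselves are valid.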

CarlosBorroto commented 7 years ago

Any updates on this issue?

I'm not sure @vdauwera's second case made it clear that, if you are using subworkflows, you do not have the option of putting both workflows in the same WDL: Cromwell allows only one workflow statement in the main WDL file. WDL also doesn't support nested scatter and gather, so subworkflows are the only option if you need that. Without support for WDL imports, neither subworkflows nor the "nested scatter/gather" workaround is possible.

I'm willing to give this a try and submit a pull request if nobody is working on it at the moment.

mbookman commented 7 years ago

Hi Carlos,

I am not currently working on this. It seems fairly straightforward if implemented where the ZIP file is already in Google Cloud Storage.
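
To illustrate the "ZIP file already in Google Cloud Storage" approach, here is a minimal sketch of the staging step: validate the URI and build a `gsutil cp` command that copies the zip into the working directory before Cromwell is started. The function name, the bucket path used in the usage note, and the working directory are all hypothetical; this is not how wdl_runner is known to do it.

```python
import posixpath


def stage_imports_command(imports_uri, workdir):
    """Return (argv, local_path) for copying an imports zip out of Cloud Storage.

    Assumption: the zip is copied next to the main WDL in `workdir`, and the
    resulting local path is what later gets passed to Cromwell.
    """
    if not imports_uri.startswith("gs://") or not imports_uri.endswith(".zip"):
        raise ValueError("expected a gs:// URI ending in .zip, got %r" % imports_uri)
    local_path = posixpath.join(workdir, posixpath.basename(imports_uri))
    return ["gsutil", "cp", imports_uri, local_path], local_path
```

For example, `stage_imports_command("gs://my-bucket/wdl/my_WDLs.zip", "/wdl_runner")` yields a copy command targeting `/wdl_runner/my_WDLs.zip`, which can be run with `subprocess.run(argv, check=True)`.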

I think this will need to be committed to the wdl repository after the PR that @vdauwera has out.

-Matt

CarlosBorroto commented 7 years ago

Hi Matt,

Thanks for the quick response. I'll keep an eye on the PR at the WDL repo. In the meantime I will work on an implementation against the PR branch, build my own wdl_runner Docker image and start testing. Having the ZIP file already in Google Cloud Storage is great advice.

-Carlos

IsanEmory commented 7 years ago

Hi guys,

This is working in cromwell-29. Great job. You can close this issue.

Regards.

mbookman commented 6 years ago

Closing this issue here, since the wdl_runner is now part of the openwdl repo.