Generates CWL files from the GATK documentation
First, install the module
git clone https://github.com/wtsi-hgi/gatk-cwl-generator
cd gatk-cwl-generator
python setup.py install
You may also want to install cwltool to run the generated CWL files
usage: gatk_cwl_generator [-h] [--version VERSION] [--verbose] [--out OUTPUT_DIR]
[--include INCLUDE] [--dev] [--use_cache [CACHE_LOCATION]]
[--no_docker] [--docker_image_name DOCKER_IMAGE_NAME]
[--gatk_command GATK_COMMAND]
Generates CWL files from the GATK documentation
optional arguments:
-h, --help show this help message and exit
--version VERSION, -v VERSION
Sets the version of GATK to parse documentation for.
Default is 3.5-0
--verbose Set the logging to be verbose. Default is False.
--out OUTPUT_DIR, -o OUTPUT_DIR
Sets the output directory for generated files. Default
is ./gatk_cmdline_tools/<VERSION>/
--include INCLUDE Only generate this file (note, CommandLinkGATK has to
be generated for v3.x)
--dev Enable --use_cache and overwriting of the generated
files (for development purposes). Requires
requests_cache to be installed
--use_cache [CACHE_LOCATION]
Use requests_cache, using the cache at CACHE_LOCATION,
or 'cache' if not specified. Default is False.
--no_docker Make the generated CWL files not use docker
containers. Default is False.
--docker_image_name DOCKER_IMAGE_NAME, -c DOCKER_IMAGE_NAME
Docker image name for generated cwl files. Default is
'broadinstitute/gatk3:<VERSION>' for version 3.x and
'broadinstitute/gatk:<VERSION>' for 4.x
--gatk_command GATK_COMMAND, -l GATK_COMMAND
Command to launch GATK. Default is 'java -jar
/usr/GenomeAnalysisTK.jar' for gatk 3.x and 'java -jar
/gatk/gatk.jar' for gatk 4.x
This has been tested on versions 3.5-0 to 3.8-0 and 4.beta.6.
The parameters generated are the same as you would need to specify on the command line, with "--" stripped from the beginning.
To add tags to arguments that have a file type, add to the parameter <NAME>_tags
. e.g. to output the parameter --variant:vcf path\to\file
, use the input:
variant:
class: File
path: path\to\file
variant_tags: [vcf]
For convenience, you can also specify any array input argument as a single element.
The cwl files will be outputted to gatk_cmdline_tools/<VERSION>/cwl
and the JSON files given by the documentation to gatk_cmdline_tools/<VERSION>/json
.
<NAME>Output
To test the generated CWL files, provided are inputs to the HaplotypeCaller tool. To test assuming you have used the default options and have installed everything as above, run:
cwl-runner gatk_cmdline_tools/3.5/HaplotypeCaller.cwl examples/HaplotypeCaller_inputs.yml
The generated CWL files can also be found in the releases
Install the tests requirements, then run the tests. Note: docker must be installed in order to run the tests (the cwl files are tested during the tests):
pip install -r test_requirements.txt
pytest gatkcwlgenerator
You can also run the tests in parallel with -n
to improve performance
To create a gatk_cmdline_tools.zip
zip file containing all the generated cwl files for gatk versions 3.5, 3.6, 3.7, 3.8 and 4.0.0.0, run bash build.sh
. This file is uploaded as a release on GitHub for every new release of this package.