wtsi-hgi / gatk-cwl-generator

Generates CWL files from the GATK documentation
MIT License




First, install the module:

git clone https://github.com/wtsi-hgi/gatk-cwl-generator
cd gatk-cwl-generator
python setup.py install

You may also want to install cwltool to run the generated CWL files.



usage: gatk_cwl_generator [-h] [--version VERSION] [--verbose] [--out OUTPUT_DIR]
                          [--include INCLUDE] [--dev] [--use_cache [CACHE_LOCATION]]
                          [--no_docker] [--docker_image_name DOCKER_IMAGE_NAME]
                          [--gatk_command GATK_COMMAND]

Generates CWL files from the GATK documentation

optional arguments:
  -h, --help            show this help message and exit
  --version VERSION, -v VERSION
                        Sets the version of GATK to parse documentation for.
                        Default is 3.5-0
  --verbose             Set the logging to be verbose. Default is False.
  --out OUTPUT_DIR
                        Sets the output directory for generated files. Default
                        is ./gatk_cmdline_tools/<VERSION>/
  --include INCLUDE     Only generate this file (note, CommandLineGATK has to
                        be generated for v3.x)
  --dev                 Enable --use_cache and overwriting of the generated
                        files (for development purposes). Requires
                        requests_cache to be installed
  --use_cache [CACHE_LOCATION]
                        Use requests_cache, using the cache at CACHE_LOCATION,
                        or 'cache' if not specified. Default is False.
  --no_docker           Make the generated CWL files not use docker
                        containers. Default is False.
  --docker_image_name DOCKER_IMAGE_NAME, -c DOCKER_IMAGE_NAME
                        Docker image name for generated cwl files. Default is
                        'broadinstitute/gatk3:<VERSION>' for version 3.x and
                        'broadinstitute/gatk:<VERSION>' for 4.x
  --gatk_command GATK_COMMAND, -l GATK_COMMAND
                        Command to launch GATK. Default is 'java -jar
                        /usr/GenomeAnalysisTK.jar' for gatk 3.x and 'java -jar
                        /gatk/gatk.jar' for gatk 4.x

This has been tested on versions 3.5-0 to 3.8-0 and 4.beta.6.
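The version-dependent defaults listed in the help text can be pictured as a small lookup. The sketch below only restates the documented defaults; it is not the generator's own code. The function name `default_settings` is hypothetical, and treating the text before the first dot as the major version is an assumption.

```python
def default_settings(version="3.5-0"):
    """Return the documented default Docker image and launch command.

    Hypothetical helper: it mirrors the defaults described in the help
    text and is not part of gatk_cwl_generator.
    """
    # Assumption: the major version is everything before the first dot.
    major = version.split(".")[0]
    if major == "3":
        return {
            "docker_image_name": f"broadinstitute/gatk3:{version}",
            "gatk_command": "java -jar /usr/GenomeAnalysisTK.jar",
        }
    if major == "4":
        return {
            "docker_image_name": f"broadinstitute/gatk:{version}",
            "gatk_command": "java -jar /gatk/gatk.jar",
        }
    raise ValueError(f"unsupported GATK version: {version}")
```

Passing --docker_image_name or --gatk_command on the command line overrides the corresponding default.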

The parameters generated are the same as you would need to specify on the command line, with "--" stripped from the beginning.

To attach tags to an argument that takes a file, set the additional parameter <NAME>_tags. For example, to produce the argument --variant:vcf path\to\file, use the inputs:

variant:
   class: File
   path: path\to\file
variant_tags: [vcf]
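To make the mapping from inputs to the command line concrete, here is a minimal sketch of how a tagged file input could be rendered. The helper `render_file_argument` is hypothetical and does not come from the generated CWL; joining several tags with commas is an assumption based on GATK 3's tag syntax.

```python
def render_file_argument(name, inputs):
    """Build the command-line tokens for a File input.

    Hypothetical helper for illustration: if <NAME>_tags is present,
    the tags are appended to the flag after a colon.
    """
    path = inputs[name]["path"]
    tags = inputs.get(f"{name}_tags", [])
    flag = "--" + name
    if tags:
        # Assumption: multiple tags are comma-separated.
        flag += ":" + ",".join(tags)
    return [flag, path]


inputs = {
    "variant": {"class": "File", "path": r"path\to\file"},
    "variant_tags": ["vcf"],
}
tokens = render_file_argument("variant", inputs)  # flag is "--variant:vcf"
```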

For convenience, you can also specify any array input argument as a single element.
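The effect of this convenience can be pictured as a normalisation step: a lone value behaves like a one-element array. The helper `as_array` below is hypothetical and only illustrates the resulting behaviour; how the generated CWL files implement it is not shown here.

```python
def as_array(value):
    """Wrap a lone value in a list; leave lists unchanged.

    Hypothetical helper illustrating that a single element is accepted
    wherever an array input is expected.
    """
    return value if isinstance(value, list) else [value]


single = as_array("sample.bam")               # a lone value becomes a one-element list
several = as_array(["a.bam", "b.bam"])        # an existing list is returned as-is
```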

The CWL files are written to gatk_cmdline_tools/<VERSION>/cwl, and the JSON files provided by the GATK documentation are written to gatk_cmdline_tools/<VERSION>/json.
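As a rough sketch of that layout, the two output directories can be computed as follows. The function `output_dirs` is hypothetical, and the assumption that --out replaces the gatk_cmdline_tools/<VERSION> base while keeping the cwl and json subdirectories is not confirmed by the help text.

```python
from pathlib import Path


def output_dirs(version="3.5-0", out=None):
    """Return the directories for generated CWL and downloaded JSON files.

    Hypothetical helper: assumes a value given for --out replaces the
    default base directory gatk_cmdline_tools/<VERSION>.
    """
    base = Path(out) if out is not None else Path("gatk_cmdline_tools") / version
    return {"cwl": base / "cwl", "json": base / "json"}
```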

Generated CWL files


Example inputs for the HaplotypeCaller tool are provided for testing the generated CWL files. Assuming you used the default options and installed everything as described above, run:

cwl-runner gatk_cmdline_tools/3.5-0/cwl/HaplotypeCaller.cwl examples/HaplotypeCaller_inputs.yml

The generated CWL files can also be found in the GitHub releases.


Install the test requirements, then run the tests. Note: Docker must be installed to run the tests, because they execute the generated CWL files:

pip install -r test_requirements.txt
pytest gatkcwlgenerator

You can also run the tests in parallel with the -n option (provided by the pytest-xdist plugin) to improve performance.


Creating a new version

To create gatk_cmdline_tools.zip, a zip file containing all the generated CWL files for GATK versions 3.5, 3.6, 3.7, 3.8 and, run bash build.sh. This file is uploaded to GitHub with every new release of this package.