wtsi-hgi / gatk-cwl-generator

Generates CWL files from the GATK documentation
MIT License

Add variant-index, variant-md5, bam-index, bam-md5 to the main output's secondaryFiles #17

ThomasHickman closed this issue 6 years ago

ThomasHickman commented 6 years ago

It would be better if the index and md5 files for variants and BAMs were included as secondaryFiles of the main output file.

ThomasHickman commented 6 years ago

Note: moving things into a file's secondaryFiles can only be done in JavaScript. I've submitted an issue asking for this to happen by default when outputBindings are specified: https://github.com/common-workflow-language/common-workflow-language/issues/622. If this is added to the specification, we can use it here.
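As a rough illustration of the JavaScript route, here is a sketch of what the body of an outputEval expression might do. It assumes the output's glob matches both the primary file and its index, so that `self` is an array of File objects, and that the primary file is identified by `inputs['output-filename']`. The function name `attachSecondaryFiles` is hypothetical and is written as a plain function only so it can run outside a CWL runner.

```javascript
// Hypothetical sketch of an outputEval body (ES5.1 only, as CWL v1.0 requires).
// `self` stands for the array of File objects matched by the output's glob;
// `primaryName` stands for inputs['output-filename'] inside a real expression.
function attachSecondaryFiles(self, primaryName) {
  var primary = null;
  var secondary = [];
  // Split the glob matches into the primary file and everything else.
  for (var i = 0; i < self.length; i++) {
    if (self[i].basename === primaryName) {
      primary = self[i];
    } else {
      secondary.push(self[i]);
    }
  }
  // No primary file found: return null so an optional (File?) output is empty.
  if (primary === null) {
    return null;
  }
  // Attach the remaining matches (e.g. .idx/.tbi/.md5) as secondaryFiles.
  primary.secondaryFiles = secondary;
  return primary;
}

// Usage with File-like objects, in the order a glob might return them:
var out = attachSecondaryFiles(
  [
    { class: "File", basename: "calls.vcf.gz.tbi" },
    { class: "File", basename: "calls.vcf.gz" }
  ],
  "calls.vcf.gz"
);
console.log(out.basename, out.secondaryFiles.length); // prints "calls.vcf.gz 1"
```

Selecting the primary by name rather than by position avoids depending on the order in which glob results are returned.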

sersorrel commented 6 years ago

Currently the generated CWL looks like this:

- id: variant-index
  doc: index file generated if create-output-variant-index is true
  type: File?
  outputBinding:
    glob:
    - $(inputs['create-output-variant-index'] + '.idx')
    - $(inputs['create-output-variant-index'] + '.tbi')
- id: output
  doc: Output file from corresponding to the input argument output-filename
  type: File
  outputBinding:
    glob: $(inputs['output-filename'])

Presumably the goal is to have it look more like this:

- id: output
  doc: Output file from corresponding to the input argument output-filename
  type: File
  outputBinding:
    glob: $(inputs['output-filename'])
  secondaryFiles:
  - $(inputs['create-output-variant-index']?self.basename+'.idx':[])
  - $(inputs['create-output-variant-index']?self.basename+'.tbi':[])

However, looking at this WDL file, it seems the variant index will be either a .idx file or a .tbi file, not both. That isn't expressible with secondaryFiles at the moment, because in CWL v1.0 every secondaryFiles entry must exist. Maybe it will be possible in CWL v1.1, though: common-workflow-language/common-workflow-language#717

mr-c commented 6 years ago

@anowlcalledjosh Can you tell ahead of time whether you'll get an .idx or a .tbi? If so, you can use an expression inside secondaryFiles.

sersorrel commented 6 years ago

Hmm, it looks like if the output file's extension is .vcf we get a .idx index, but if it is .vcf.gz the index is .tbi. Presumably, then, something like this would work (with a polyfill for endsWith, since that's ES6 and CWL expressions are ES5.1):

secondaryFiles:
- "$(inputs['create-output-variant-index']? self.basename+(inputs['output-filename'].endsWith('.gz')? '.tbi' : '.idx') : [])"

or, more legibly:

secondaryFiles:
- ${
    if (inputs['create-output-variant-index']) {
      if (inputs['output-filename'].endsWith('.gz')) {
        return self.basename + '.tbi'
      } else {
        return self.basename + '.idx'
      }
    } else {
      return []
    }
  }
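Since CWL v1.0 expressions are limited to ES5.1, `endsWith` has to come from somewhere else. The sketch below shows an ES5-compatible suffix check plus the same .tbi/.idx decision as the expressions above, written as plain functions so the logic can be checked outside a runner. The helper names `endsWith` and `variantIndexName` are hypothetical; in a real tool, definitions like these could be placed in `InlineJavascriptRequirement`'s `expressionLib`.

```javascript
// ES5.1-compatible replacement for String.prototype.endsWith (an ES6 method).
function endsWith(str, suffix) {
  return str.length >= suffix.length &&
    str.indexOf(suffix, str.length - suffix.length) !== -1;
}

// Mirrors the secondaryFiles expression: .gz outputs get a .tbi index,
// anything else gets .idx, and nothing is expected when indexing is off.
function variantIndexName(createIndex, outputFilename, basename) {
  if (!createIndex) {
    return [];
  }
  return basename + (endsWith(outputFilename, ".gz") ? ".tbi" : ".idx");
}

console.log(variantIndexName(true, "calls.vcf.gz", "calls.vcf.gz")); // prints "calls.vcf.gz.tbi"
console.log(variantIndexName(true, "calls.vcf", "calls.vcf"));       // prints "calls.vcf.idx"
console.log(variantIndexName(false, "calls.vcf", "calls.vcf"));      // prints []
```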

On another note: I'm not sure the current output is actually correct for BAM files; it looks like the correct index extension there is .bai. (Here, support for optional secondaryFiles would be really nice, because then we could write ^.bai? rather than having to change .bam to .bai in JS.) Edit: I was mistaken. If you have something.bam, the corresponding index file is something.bam.bai, not something.bai.
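To make the difference between `.bai` and `^.bai?` concrete, here is a sketch of how a secondaryFiles pattern string resolves against a primary file's basename, following the CWL spec: each leading `^` strips one extension (the last `.` and everything after it), the rest of the pattern is appended, and, in CWL v1.1 only, a trailing `?` marks the file as optional. The function `resolveSecondaryPattern` is hypothetical and only models the naming rule, not file existence checks.

```javascript
// Hypothetical model of CWL secondaryFiles pattern resolution.
function resolveSecondaryPattern(basename, pattern) {
  var optional = false;
  // CWL v1.1: a trailing "?" means the secondary file is not required.
  if (pattern.charAt(pattern.length - 1) === "?") {
    optional = true;
    pattern = pattern.slice(0, -1);
  }
  var path = basename;
  // Each leading "^" removes one extension; with no extension, path is unchanged.
  while (pattern.charAt(0) === "^") {
    var dot = path.lastIndexOf(".");
    if (dot !== -1) {
      path = path.slice(0, dot);
    }
    pattern = pattern.slice(1);
  }
  return { name: path + pattern, required: !optional };
}

console.log(resolveSecondaryPattern("something.bam", ".bai").name);   // prints "something.bam.bai"
console.log(resolveSecondaryPattern("something.bam", "^.bai?").name); // prints "something.bai"
```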