wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

expose variant filtering params up to immuno #97

Closed: malachig closed this issue 8 months ago

malachig commented 1 year ago

It seems that currently I can't configure these values in my YAML?

Tumor VAF cutoff applied to individual variant caller results (mutect and strelka) prior to creating a merged VCF https://github.com/wustl-oncology/analysis-wdls/blob/7d0c18d0ab961e1f18df0b03422157c9f1cd5ad8/definitions/tools/fp_filter.wdl#L14

LLR threshold applied to the merged VCF https://github.com/wustl-oncology/analysis-wdls/blob/7d0c18d0ab961e1f18df0b03422157c9f1cd5ad8/definitions/detect_variants.wdl#L67

malachig commented 1 year ago

I would like to be able to do this in my immuno YAML:

#Reduce tumor VAF cutoff for FP filter applied to mutect and strelka calls (default is 0.05)
immuno.min_var_freq: 0.03

#Reduce tumor VAF cutoff used by varscan (default is 0.05)
immuno.varscan_min_var_freq: 0.03

#Reduce LLR threshold for filtering of the multi caller merged VCF (default is 5)
immuno.filter_somatic_llr_threshold: 2

But currently I think I can only do the varscan one?

malachig commented 1 year ago

Some notes on this issue:

The only real application I can see for having separate variables for varscan and fp_filter is where you set the varscan_filter at a more stringent threshold (e.g. 0.1) but then apply a more relaxed filter in fp_filter (e.g. 0.05). This would allow lower-VAF variants to come from Strelka/Mutect than those allowed from VarScan. I'm not entirely convinced that this narrow use case is worth the complexity that separate variables create. However, it does seem that this was the intent behind how the pipeline was designed.
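
To make the use case above concrete, here is a minimal Python sketch of the two-stage filtering as I understand it from this thread. It is not the pipeline's code: the `Call` record, the `filter_calls` function, the variant names, and the use of `>=` for the cutoffs are all assumptions for illustration, and the LLR is attached to each call only to keep the example short. The three keyword arguments reuse the names proposed for the immuno YAML, with the defaults quoted earlier in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One variant call from one caller (hypothetical, simplified record)."""
    variant: str      # illustrative variant identifier
    caller: str       # "mutect", "strelka", or "varscan"
    tumor_vaf: float  # tumor variant allele fraction
    llr: float        # somatic log-likelihood ratio (simplified: stored per call)


def filter_calls(calls, min_var_freq=0.05, varscan_min_var_freq=0.05,
                 filter_somatic_llr_threshold=5.0):
    """Apply per-caller VAF cutoffs, merge by variant, then apply the LLR cutoff.

    Assumption: a call passes when its value is >= the cutoff; the real tools
    may use a different comparison.
    """
    merged = {}
    for call in calls:
        # VarScan gets its own cutoff; mutect and strelka share min_var_freq.
        if call.caller == "varscan":
            cutoff = varscan_min_var_freq
        else:
            cutoff = min_var_freq
        if call.tumor_vaf >= cutoff:
            # Keep the first surviving call per variant as the merged record.
            merged.setdefault(call.variant, call)
    # The LLR threshold is applied only after the per-caller results are merged.
    return sorted(
        variant for variant, call in merged.items()
        if call.llr >= filter_somatic_llr_threshold
    )


calls = [
    Call("var_a", "varscan", 0.07, 12.0),
    Call("var_a", "strelka", 0.07, 12.0),
    Call("var_b", "varscan", 0.07, 9.0),
    Call("var_c", "mutect", 0.04, 8.0),
    Call("var_d", "mutect", 0.20, 3.0),
]

defaults = filter_calls(calls)
strict_varscan = filter_calls(calls, min_var_freq=0.05, varscan_min_var_freq=0.1)
relaxed = filter_calls(calls, min_var_freq=0.03, varscan_min_var_freq=0.03,
                       filter_somatic_llr_threshold=2)
```

With the stricter VarScan cutoff, `var_a` survives only because Strelka also called it, while `var_b` (VarScan only) is dropped. That is the narrow case where separate variables matter. With the relaxed values from my proposed YAML, all four variants pass.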

I am experimenting with a PR that will implement the following approach:

malachig commented 8 months ago

This is now complete (https://github.com/wustl-oncology/analysis-wdls/pull/136) and working as expected.