expose variant filtering params up to immuno

malachig commented 1 year ago

It seems that currently I can't configure these values in my YAML?

Tumor VAF cutoff applied to individual variant caller results (mutect and strelka) prior to creating a merged VCF https://github.com/wustl-oncology/analysis-wdls/blob/7d0c18d0ab961e1f18df0b03422157c9f1cd5ad8/definitions/tools/fp_filter.wdl#L14

LLR threshold applied to the merged VCF https://github.com/wustl-oncology/analysis-wdls/blob/7d0c18d0ab961e1f18df0b03422157c9f1cd5ad8/definitions/detect_variants.wdl#L67

malachig commented 1 year ago

I would like to be able to do this in my immuno YAML:

#Reduce tumor VAF cutoff for FP filter applied to mutect and strelka calls (default is 0.05)
immuno.min_var_freq: 0.03

#Reduce tumor VAF cutoff used by varscan (default is 0.05)
immuno.varscan_min_var_freq: 0.03

#Reduce LLR threshold for filtering of the multi caller merged VCF (default is 5)
immuno.filter_somatic_llr_threshold: 2

But currently I think I can only do the varscan one?

malachig commented 1 year ago

Some notes on this issue:

There are two general ways that minimum variant allele frequency (VAF) filters are used in the pipelines: they are supplied to tools/varscan_somatic.wdl and tools/varscan_germline.wdl, and to tools/fp_filter.wdl
Defaults for these filters are set to 0.05 or 0.1 for varscan and 0.05 for fp_filter
Even though we call it varscan_germline, it is NOT actually used for germline variant calling. Rather, it is used for tumor-only variant calling, and the default threshold reflects that. Our germline variant calling uses GATK.
fp_filter is applied to variant calls from varscan, pindel, strelka and mutect. This means two rounds of filtering on VAF are applied to the VarScan results.
At present the defaults are set in many places throughout the pipeline. The first one encountered (or the one you set in your inputs YAML) takes precedence.

The only real application I can see for having separate variables for varscan and fp_filter is where you set the varscan filter at a more stringent threshold (e.g. 0.1) but then apply a more relaxed filter in fp_filter (e.g. 0.05). This would allow lower-VAF variants to come from Strelka/Mutect than are allowed from VarScan. I'm not entirely convinced that this narrow use case is worth the complexity that separate variables create. However, it does seem that this was the intent of how the pipeline was created.
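To make that use case concrete, here is a minimal Python sketch of the two rounds of VAF filtering. It is an illustrative model only, not pipeline code: the `Call` record, the `surviving_calls` function, the example VAF values, and the choice to drop calls strictly below a cutoff are all assumptions. The 0.1 and 0.05 thresholds are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str  # which variant caller produced the call
    vaf: float   # tumor variant allele frequency


def surviving_calls(calls, varscan_min_var_freq=0.1, fp_min_var_freq=0.05):
    """Return the calls that pass both rounds of VAF filtering.

    Assumption: a call is dropped when its VAF is strictly below the cutoff.
    """
    kept = []
    for call in calls:
        # Round 1: only VarScan applies its own VAF cutoff while calling.
        if call.caller == "varscan" and call.vaf < varscan_min_var_freq:
            continue
        # Round 2: fp_filter applies its cutoff to every caller's output.
        if call.vaf < fp_min_var_freq:
            continue
        kept.append(call)
    return kept


# Hypothetical calls, chosen to show the asymmetry between callers.
calls = [
    Call("varscan", 0.07),  # dropped in round 1 (below 0.1)
    Call("mutect", 0.07),   # kept: only the 0.05 fp_filter cutoff applies
    Call("strelka", 0.04),  # dropped in round 2 (below 0.05)
    Call("varscan", 0.12),  # kept: passes both cutoffs
]
kept = surviving_calls(calls)
```

With these settings a 0.07 VAF call survives if it comes from Mutect but not if it comes from VarScan, which is the only behavior the separate variables buy.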

I am experimenting with a PR that will implement the following approach:

Name the two min_var_freq variables according to their usage in varscan or fp_filter to make it easier to trace which is being used where
Do NOT set defaults for these variables anywhere except in the final tool WDL.
Make sure they can be passed in and set in the YAML when one runs immuno.wdl or any of the sub-workflows that involve variant detection.
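The intended lookup behavior of this approach can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the actual WDL change: the name `fp_min_var_freq`, the `TOOL_DEFAULTS` table, and the `resolve` helper are hypothetical, and the default values are taken from the YAML comments earlier in this thread.

```python
# Hypothetical tool-level defaults. Under the proposal these would be declared
# only in the final tool WDLs and nowhere in the intermediate workflows.
TOOL_DEFAULTS = {
    "fp_min_var_freq": 0.05,       # assumed name for the fp_filter cutoff
    "varscan_min_var_freq": 0.05,  # VarScan cutoff
}


def resolve(name, yaml_inputs, workflow="immuno"):
    """Return the value a tool would see for `name`.

    A value set in the inputs YAML as `<workflow>.<name>` wins; otherwise the
    single tool-level default is used. No other layer supplies a default.
    """
    value = yaml_inputs.get(f"{workflow}.{name}")
    return TOOL_DEFAULTS[name] if value is None else value


# Example: lower only the fp_filter cutoff and leave VarScan at its default.
inputs = {"immuno.fp_min_var_freq": 0.03}
fp_cutoff = resolve("fp_min_var_freq", inputs)
varscan_cutoff = resolve("varscan_min_var_freq", inputs)
```

Because only one place holds a default, there is no question of which default is "encountered first": a value is either set explicitly in the YAML or it falls through to the tool.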

malachig commented 8 months ago

This is now complete (https://github.com/wustl-oncology/analysis-wdls/pull/136) and working as expected.

wustl-oncology / analysis-wdls

expose variant filtering params up to immuno #97