wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

Bugfix and optimization for VCF filtering. #99

Closed tmooney closed 1 year ago

tmooney commented 1 year ago

I ran a small test through and the VCF came out identically to the old version when the two steps in question are skipped entirely.

malachig commented 1 year ago

It seems like this PR involves two things. The first is the "CLE filter", which removes variants unless they match certain transcript effect types.

The second relates to the "known variants filter", where a list of validated variants is supplied in VCF form and used to annotate the final VCF that comes out of the pipeline. Related to this:

Don't we need to pass the TBI file in as well here? https://github.com/wustl-oncology/analysis-wdls/blob/47fd1fa04c018a27e2c7db1c38cdadbaf0e7deb1/definitions/detect_variants.wdl#L234

As an aside, I think it adds to the confusion to call this step "filterKnownVariants" when it seems to be annotating variant records. Nothing is filtered out.

malachig commented 1 year ago

In testing this we identified an unrelated issue that seems to be caused by localization_optional usage in VarScan. However, I think this particular PR is working as expected.