openvax / neoantigen-vaccine-pipeline

Bioinformatics pipeline for selecting patient-specific cancer neoantigen vaccines
Apache License 2.0

Annotate VAFs in the all-passing-variants file #161

Closed: timodonnell closed this 1 month ago

timodonnell commented 1 month ago

This PR modifies the pipeline to additionally output a file with a name like:

annotated.all-passing-variants_mhcflurry_mutect-strelka-mutect2.csv

This file contains everything in the usual all-passing-variants_mhcflurry_mutect-strelka-mutect2.csv file but adds columns giving ref / alt read counts, total depth, and VAFs for normal DNA, tumor DNA, and tumor RNA.

The results mostly agree with what we see in IGV, but you should check them yourself rather than fully trusting these values. I added a disclaimer to the script's output for that reason:

Note: this script is a useful first-pass for annotating VAFs, but it is simplistic. It works directly with the
alignments in the BAM and does not attempt to do any kind of realignment. That means it's not "seeing" the realigned version
that e.g. mutect2 will be operating from. Especially for indels or variants with unexpectedly low VAFs you should
manually check the results yourself in IGV. Also note that there may be discrepancies between what this script outputs
and what you see in IGV due to differences in filters. This script counts reads with mapping quality at least 10 that
are not marked as duplicates.
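
For reviewers, here is a rough sketch of the counting rule the disclaimer describes. It is not the code in this PR: `ReadObservation` and `annotate_site` are hypothetical names, reads are modeled as plain records rather than BAM alignments, only single-base substitutions are handled, and VAF is assumed to be alt count divided by the total filtered depth.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Union

# Threshold taken from the disclaimer text: mapping quality at least 10.
MIN_MAPPING_QUALITY = 10


@dataclass
class ReadObservation:
    """Hypothetical stand-in for one aligned read covering the variant site."""
    base: str              # base this read shows at the variant position
    mapping_quality: int   # MAPQ of the alignment
    is_duplicate: bool     # True if the read is flagged as a duplicate


def annotate_site(
    reads: Iterable[ReadObservation], ref: str, alt: str
) -> Dict[str, Optional[Union[int, float]]]:
    """Count ref/alt reads, depth, and VAF at one site (SNVs only).

    No realignment is attempted; reads are taken exactly as given.
    """
    ref_count = alt_count = depth = 0
    for read in reads:
        # Apply the two filters named in the disclaimer.
        if read.is_duplicate or read.mapping_quality < MIN_MAPPING_QUALITY:
            continue
        depth += 1
        if read.base == ref:
            ref_count += 1
        elif read.base == alt:
            alt_count += 1
    # Assumption: VAF = alt / total filtered depth; undefined at zero depth.
    vaf = alt_count / depth if depth else None
    return {"ref_count": ref_count, "alt_count": alt_count,
            "depth": depth, "vaf": vaf}


if __name__ == "__main__":
    example_reads = [
        ReadObservation("A", 60, False),  # counted as ref
        ReadObservation("T", 60, False),  # counted as alt
        ReadObservation("T", 5, False),   # dropped: mapping quality below 10
        ReadObservation("T", 60, True),   # dropped: marked as duplicate
    ]
    print(annotate_site(example_reads, ref="A", alt="T"))
    # {'ref_count': 1, 'alt_count': 1, 'depth': 2, 'vaf': 0.5}
```

In the example, two of the three alt-supporting reads are filtered out, which is the kind of gap between this count and an unfiltered IGV view that the disclaimer warns about.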
timodonnell commented 1 month ago

@julia326 mind reviewing this?