wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

Create a tool that aggregates and prioritizes/filters fusion neoantigens #151

Open: malachig opened this issue 2 months ago

malachig commented 2 months ago

Objectives of the tool:

Minimal files needed to perform the review:

Steps

  1. Create a copy of all files above in a new subdirectory.

  2. Create a prioritized candidate list based on the FI coding effect TSV file. A candidate must meet all of the following criteria:

    • Read support: (a) junction + spanning counts > 5; (b) junction count >= 1.
    • NOT a readthrough, defined as: [Left Chr and Right Chr are different] OR [chromosomes are the same BUT Left Strand and Right Strand are different] OR [chromosome and strand are the same BUT ABS(Left Pos - Right Pos) >= 1,000,000] OR [Fusion GeneA Name OR Fusion GeneB Name matches a known fusion driver gene].
    • Anchor support (?): require LargeAnchorSupport == YES.

  3. Eliminate candidates that do not give rise to neoantigens at all. Calculate a count of unique peptides ("Best Peptide") from the pVACfuse aggregated epitopes that match the GeneA_GeneB pairing (e.g. "KANK4_ALK"). If this count is 0, the candidate fusion event is not placed in the "REVIEW" tier and is instead assigned to the "POOR" tier.

  4. Extract the 51-mer peptide sequence centered on the gene fusion junction. This was previously done manually using BLAT and string matching of junction peptide sequences. Find a way to automate this using information from the FusionInspector, STAR-Fusion, and pVACfuse output files.

  5. Create the final review table file (analogous to the pVACview main table).

    • All (or most) of the columns from FusionInspector
    • FFPM value
    • Full fusion amino acid sequence (with GeneA/GeneB junction marked somehow)
    • Unique neoantigen peptide count (total, and per HLA allele). Where multiple transcript pairs give rise to the exact same peptides, define a transcript-pair set and pick a representative transcript pair.
    • Other info from pVACfuse: IC50 MT, %ile MT, Expr, Read Support
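The candidate filtering criteria (read support, not a readthrough, large anchor support) could be combined into one predicate, as in the minimal sketch below. It is not the pipeline's implementation: `FusionCall` and its field names are hypothetical stand-ins for the FusionInspector coding-effect columns, breakpoints are assumed to arrive as `chrom:pos:strand` strings, and a readthrough is assumed to mean a same-chromosome, same-strand pair whose breakpoints lie less than 1,000,000 bp apart, so such a pair passes only when one partner is a known fusion driver gene.

```python
from dataclasses import dataclass

# Assumed threshold: same-chromosome, same-strand partners closer than this
# are treated as a likely readthrough.
READTHROUGH_DISTANCE = 1_000_000


@dataclass
class FusionCall:
    """One candidate row; field names are hypothetical stand-ins for FI columns."""
    gene_a: str
    gene_b: str
    left_breakpoint: str   # assumed "chrom:pos:strand", e.g. "chr2:29223528:-"
    right_breakpoint: str
    junction_reads: int
    spanning_frags: int
    large_anchor_support: str


def parse_breakpoint(breakpoint: str) -> tuple[str, int, str]:
    """Split an assumed "chrom:pos:strand" string into its parts."""
    chrom, pos, strand = breakpoint.rsplit(":", 2)
    return chrom, int(pos), strand


def has_read_support(call: FusionCall) -> bool:
    # (a) junction + spanning counts > 5 and (b) at least one junction read
    return (call.junction_reads + call.spanning_frags > 5
            and call.junction_reads >= 1)


def is_not_readthrough(call: FusionCall, driver_genes: set[str]) -> bool:
    # A known driver partner always passes, whatever the breakpoint geometry.
    if call.gene_a in driver_genes or call.gene_b in driver_genes:
        return True
    left_chr, left_pos, left_strand = parse_breakpoint(call.left_breakpoint)
    right_chr, right_pos, right_strand = parse_breakpoint(call.right_breakpoint)
    if left_chr != right_chr or left_strand != right_strand:
        return True
    # Same chromosome and strand: only distant partners pass.
    return abs(left_pos - right_pos) >= READTHROUGH_DISTANCE


def has_anchor_support(call: FusionCall) -> bool:
    return call.large_anchor_support == "YES"


def is_candidate(call: FusionCall, driver_genes: set[str]) -> bool:
    """All three criteria must hold."""
    return (has_read_support(call)
            and is_not_readthrough(call, driver_genes)
            and has_anchor_support(call))
```

Rows could be built from `csv.DictReader(handle, delimiter="\t")` over the coding-effect TSV once the real column headers are mapped onto these fields.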
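The tiering step (count unique "Best Peptide" values per GeneA_GeneB pairing, then assign a tier) might look like the following sketch. It assumes the pVACfuse aggregated report has already been read into dicts, that the column holding the `GeneA_GeneB` label is named `Gene` (an assumption, exposed as the `gene_col` parameter), that labels match exactly, and that a candidate with at least one unique peptide goes to "REVIEW" while one with none goes to "POOR".

```python
from collections import defaultdict
from typing import Iterable, Mapping


def count_unique_peptides(
    epitope_rows: Iterable[Mapping[str, str]],
    gene_col: str = "Gene",              # assumed column name for "GeneA_GeneB"
    peptide_col: str = "Best Peptide",
) -> dict[str, int]:
    """Number of distinct Best Peptide values per GeneA_GeneB label."""
    peptides = defaultdict(set)
    for row in epitope_rows:
        peptides[row[gene_col]].add(row[peptide_col])
    return {pair: len(seqs) for pair, seqs in peptides.items()}


def assign_tier(gene_a: str, gene_b: str, counts: Mapping[str, int]) -> str:
    """REVIEW if the pairing yields any unique peptide, otherwise POOR."""
    return "REVIEW" if counts.get(f"{gene_a}_{gene_b}", 0) > 0 else "POOR"
```

Using a set per label means the same peptide reported for several transcripts or alleles is counted once.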
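Automating the 51-mer extraction largely reduces to locating the junction in the fusion protein and slicing around it. The sketch below assumes the junction is already known as a 0-based nucleotide offset into the fusion CDS where GeneB begins; how to obtain that offset from the FusionInspector, STAR-Fusion, or pVACfuse files is left open, and all helper names are hypothetical. The first residue containing any GeneB-derived nucleotide is placed at the centre, giving 25 residues upstream and 25 downstream, with truncation near the protein ends. The same index can mark the GeneA/GeneB junction in the full sequence for the review table.

```python
def junction_residue_index(cds_junction_nt: int) -> int:
    """0-based index of the first residue containing any GeneB nucleotide.

    `cds_junction_nt` is the assumed 0-based offset of the first GeneB base in
    the fusion CDS; if it is not a multiple of 3, that residue is a chimeric codon.
    """
    return cds_junction_nt // 3


def junction_window(protein: str, junction_index: int, length: int = 51) -> str:
    """Slice up to `length` residues centred on `junction_index` (truncated at the ends)."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the junction residue is centred")
    if not 0 <= junction_index < len(protein):
        raise ValueError("junction_index lies outside the protein")
    flank = length // 2
    start = max(0, junction_index - flank)
    end = min(len(protein), junction_index + flank + 1)
    return protein[start:end]


def mark_junction(protein: str, junction_index: int, marker: str = "|") -> str:
    """Insert `marker` before the first GeneB-derived residue."""
    return protein[:junction_index] + marker + protein[junction_index:]
```

For example, `junction_window(seq, junction_residue_index(nt))` yields the 51-mer, and `mark_junction(seq, junction_residue_index(nt))` the annotated full-length sequence.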
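For the per-allele counts and transcript-pair sets in the review table, one possible approach is to key each transcript pair by the exact set of peptides it produces, so that pairs with identical sets fall into one transcript-pair set. The input shape (tuples of transcript pair, HLA allele, peptide) is a hypothetical simplification of the pVACfuse rows. The rule of choosing the lexicographically smallest pair as representative is a placeholder that a curated preference could replace.

```python
from collections import defaultdict
from typing import Iterable

Row = tuple[str, str, str]  # (transcript_pair, hla_allele, peptide), hypothetical shape


def transcript_pair_sets(rows: Iterable[Row]) -> dict[str, list[str]]:
    """Map a representative transcript pair to every pair with the same peptides."""
    peptides_by_pair = defaultdict(set)
    for pair, _allele, peptide in rows:
        peptides_by_pair[pair].add(peptide)
    members = defaultdict(list)
    for pair, peptides in peptides_by_pair.items():
        members[frozenset(peptides)].append(pair)
    # Placeholder rule: the lexicographically smallest pair represents its set.
    return {min(pairs): sorted(pairs) for pairs in members.values()}


def unique_peptide_counts(rows: Iterable[Row]) -> tuple[int, dict[str, int]]:
    """Return (total unique peptides, unique peptides per HLA allele)."""
    total = set()
    per_allele = defaultdict(set)
    for _pair, allele, peptide in rows:
        total.add(peptide)
        per_allele[allele].add(peptide)
    return len(total), {allele: len(p) for allele, p in per_allele.items()}
```

Both functions consume `rows` once, so pass a list rather than a generator if you call both.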

One-time tasks

Define a reference list of known fusion driver genes, e.g. Cancer Gene Census entries whose "Role In Cancer" includes fusion. This could be refined to consider tumor type, or manually curated into a higher-confidence list.
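A first-pass driver list could be pulled from a Cancer Gene Census CSV export along these lines. The column names `Gene Symbol` and `Role in Cancer`, and the assumption that roles are comma-separated values such as "oncogene, fusion", should be checked against the actual download; tumor-type refinement or manual curation would be applied afterwards.

```python
import csv
from typing import Iterable


def fusion_driver_genes(
    csv_lines: Iterable[str],
    gene_col: str = "Gene Symbol",       # assumed CGC column name
    role_col: str = "Role in Cancer",    # assumed CGC column name
) -> set[str]:
    """Genes whose comma-separated "Role in Cancer" value includes "fusion"."""
    drivers = set()
    for row in csv.DictReader(csv_lines):
        roles = {role.strip().lower() for role in (row.get(role_col) or "").split(",")}
        if "fusion" in roles:
            drivers.add(row[gene_col].strip())
    return drivers
```

Usage would be along the lines of `with open("cancer_gene_census.csv", newline="") as handle: drivers = fusion_driver_genes(handle)` (the filename is hypothetical); the resulting set is what the readthrough exception checks GeneA and GeneB against.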

Abbreviations: FI = FusionInspector

susannasiebert commented 2 months ago

Reading over this list, I'm wondering if some of these steps should happen in pVACfuse itself. In my mind, the pVACfuse aggregated report is what should be used for prioritizing fusion neoantigens. If its format and available fields don't match what is needed, we should update pVACfuse to create a better aggregated report rather than combining the aggregated report with other inputs to create yet another report. This is especially important because we would eventually like pVACview to support pVACfuse output files so that candidates can be evaluated in the same interface.

A few items that stick out to me: