ylab-hi / ScanNeo2

Snakemake-based computational workflow for neoantigen prediction from diverse sources
MIT License

cleave peptide for fusion gene #25

Open nttg8100 opened 1 month ago

nttg8100 commented 1 month ago

I tested ScanNeo2 on the test dataset of nextNEOpi (https://github.com/icbi-lab/nextNEOpi). nextNEOpi has a similar step that uses Arriba to detect fusion genes, and from those fusion genes it derives the peptides that are candidate neoantigens.

Their (nextNEOpi) peptides with 8 amino acids: PTEN - AC063965.1(21548),MED6P1(31892) MFSGGTCm FSGGTCmg SGGTCmgr GGTCmgrc GTCmgrcm TCmgrcmq Cmgrcmqt mgrcmqty grcmqtyp rcmqtypk cmqtypkv mqtypkvq qtypkvqg typkvqgs#Fusion-out-of-frame#high#yes#chr10:87952259#chr10:88016243#11#1#0#.#.

Your (ScanNeo2) peptides with 8 amino acids: MFSGGTCm

Is there something wrong with my test? Or is your pipeline designed to return only this peptide, rather than many peptide sequences, in order to achieve 37/38 active neoantigens on the TESLA dataset?

riasc commented 1 month ago

Hi, thanks for this information. I haven't tried ScanNeo2 on their test dataset, but it's probably worth looking into. So you applied only the fusion detection? Do you have the intermediate results from Arriba, and would you mind sharing them with me? Then I could at least check that the input from Arriba is the same.

I guess you haven't changed the parameters for Arriba? By default, ScanNeo2 filters out fusions with fewer than 2 supporting reads.
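
A minimal sketch of such a read-support filter on Arriba's `fusions.tsv`, assuming that "supporting reads" means the sum of Arriba's `split_reads1`, `split_reads2` and `discordant_mates` columns. ScanNeo2's actual criterion may differ, and `filter_by_support` is a hypothetical helper name.

```python
import csv


def filter_by_support(tsv_text, min_reads=2):
    """Keep Arriba fusion rows whose total supporting reads reach min_reads.

    Assumption: supporting reads = split_reads1 + split_reads2 + discordant_mates.
    """
    lines = tsv_text.splitlines()
    # Arriba's header line starts with '#gene1'; strip the '#' so the key is 'gene1'
    lines[0] = lines[0].lstrip("#")
    reader = csv.DictReader(lines, delimiter="\t")
    kept = []
    for row in reader:
        support = sum(
            int(row[col]) for col in ("split_reads1", "split_reads2", "discordant_mates")
        )
        if support >= min_reads:
            kept.append(row)
    return kept
```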

Also, ScanNeo2 filters out potential neoantigens that are identical to the corresponding wildtype sequence, which for fusion genes is essentially the sequence of the first gene (https://github.com/ylab-hi/ScanNeo2/blob/52a3818ec3189af502f18eba6b6a1a69b9b3a8c3/workflow/scripts/prioritization/effects.py#L298C3-L303C25). This could filter out some of them, but I can't say for sure.
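
To illustrate the wildtype check, here is a hypothetical simplification (not the code in ScanNeo2's `effects.py`). It assumes the fusion peptide string starts at the same offset as gene 1's wildtype protein, compares each breakpoint-spanning window case-insensitively with the wildtype window at the same position, and drops identical ones. `drop_wildtype_identical` and the toy sequences are made up for illustration.

```python
def drop_wildtype_identical(fusion_seq, wildtype_gene1, k):
    """Return breakpoint-spanning k-mers that differ from gene 1's wildtype protein.

    Hypothetical simplification: fusion_seq uses '|' at the breakpoint and may
    end in '*'; the wildtype counterpart of a window is wildtype_gene1 at the
    same offset.
    """
    left, right = fusion_seq.rstrip("*").split("|", 1)
    seq = left + right
    bp = len(left)  # index of the first residue after the breakpoint
    kept = []
    # windows that contain both seq[bp - 1] and seq[bp]
    for start in range(max(0, bp - k + 1), min(bp, len(seq) - k + 1)):
        mut = seq[start:start + k]
        wt = wildtype_gene1[start:start + k]
        if mut.upper() != wt.upper():
            kept.append(mut)
    return kept
```

With the toy input `drop_wildtype_identical("ACDEFG|hkl", "ACDEFGHQM", 3)`, the window `FGh` matches the wildtype `FGH` and is dropped, leaving only `Ghk`.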

But overall, the TESLA-validated neoantigens are based on SNVs/indels, so I wouldn't expect to detect them from gene fusion events, although I haven't checked that in detail.

Thanks for bringing this to my attention

nttg8100 commented 1 month ago

What I mean is how the peptides are derived from the output of the Arriba fusion analysis. In nextNEOpi, for example, we have this fusion peptide sequence:

LFHKMMFETIPMFSGGTC|mgrcmqtypkvqgs*

When I ran your script requiring peptides of 8 amino acids, it gave me only one sequence:

MFSGGTC|m

whereas nextNEOpi gives me more by shifting the window along the sequence from left to right, one amino acid at a time:

MFSGGTC|m 
FSGGTC|mg
SGGTC|mgr
...
C|mgrcmqt
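
The two behaviours discussed here can be sketched as follows, assuming Arriba's peptide notation in which `|` marks the breakpoint, lowercase residues follow an out-of-frame junction, and a trailing `*` is the stop codon. By default, the hypothetical `fusion_peptides` helper returns only the windows that span the breakpoint; `include_downstream=True` also returns windows lying entirely in the novel downstream segment. This is an illustration, not the code of either pipeline.

```python
def fusion_peptides(peptide_seq, k=8, include_downstream=False):
    """Enumerate k-mers from an Arriba-style fusion peptide string.

    Assumes a single '|' at the breakpoint and an optional trailing '*'.
    include_downstream is only meaningful for out-of-frame fusions, where
    every residue after the breakpoint is novel.
    """
    left, right = peptide_seq.rstrip("*").split("|", 1)
    seq = left + right
    bp = len(left)  # index of the first residue after the breakpoint
    peptides = []
    # a window [start, start + k) spans the breakpoint if it contains seq[bp - 1] and seq[bp]
    for start in range(max(0, bp - k + 1), min(bp, len(seq) - k + 1)):
        peptides.append(seq[start:start + k])
    if include_downstream:
        # windows entirely after the breakpoint (novel sequence only)
        for start in range(bp, len(seq) - k + 1):
            peptides.append(seq[start:start + k])
    return peptides


seq = "LFHKMMFETIPMFSGGTC|mgrcmqtypkvqgs*"
print(fusion_peptides(seq))
print(fusion_peptides(seq, include_downstream=True))
```

On this sequence, the default call returns the seven breakpoint-spanning 8-mers (MFSGGTCm to Cmgrcmqt), while `include_downstream=True` returns the 14 peptides in the nextNEOpi output quoted earlier. The extra seven (mgrcmqty to typkvqgs) do not touch the breakpoint at all.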

I think your pipeline could be improved by adding this, so that it returns more sequences.

riasc commented 1 month ago

That's exactly what I'm doing: it scans the position where the breakpoint appears and uses a sliding window to determine all possible peptides that include the fusion. So I'm surprised it only gives you one. But thanks again, I will look into it.