griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Help with MT-WT epitope match for inframe mutations #1152

Open KhacDuyNguyen0 opened 1 month ago

KhacDuyNguyen0 commented 1 month ago

Dear authors, I appreciate your work in producing such a useful tool.

During the analysis, I identified a case involving an inframe insertion variant in the PIK3R1 gene at position 454, where T changes to TQFQEKS. For more details,

I understand that the matched wildtype epitope should share at least half of its length with the mutant epitope. For the mutant epitope FQEKSQFQE, I believe that FQEKSREYD would be an appropriate matched wildtype epitope, but the algorithm selected (NA) for this case. Similar issues appear with other mutant epitopes shown in the picture below. image
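To make the expectation above concrete, here is a minimal sketch of the "shares at least half of its length" check as I read it. The function names are hypothetical and the position-by-position comparison is an assumption; pVACtools' own matching logic may differ.

```python
def shared_positions(mt_epitope: str, wt_epitope: str) -> int:
    """Count positions where the mutant and wildtype epitopes carry the same residue."""
    if len(mt_epitope) != len(wt_epitope):
        raise ValueError("epitopes must have the same length")
    return sum(m == w for m, w in zip(mt_epitope, wt_epitope))


def shares_at_least_half(mt_epitope: str, wt_epitope: str) -> bool:
    """Assumed rule: at least half of the positions must be identical."""
    return 2 * shared_positions(mt_epitope, wt_epitope) >= len(mt_epitope)


# The pair from this report: the first five residues (FQEKS) are identical.
mt = "FQEKSQFQE"
wt = "FQEKSREYD"
print(shared_positions(mt, wt), shares_at_least_half(mt, wt))  # prints "5 True"
```

Under this reading, 5 of 9 positions match, so the pair would qualify as a matched wildtype epitope.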

Another similar case occurs with mutations in the NCOR2 gene at positions 1833-1834, where the mutant and wildtype n-mers are as follows:

Best regards, Duy

susannasiebert commented 1 month ago

This is an interesting case. I agree that I would expect these to match as you describe them. There might be a bug in our logic. Would you be able to share a VCF file with just these two variants in it so that I can debug it further on my end?

KhacDuyNguyen0 commented 1 month ago

I am sorry for the late response. Here are my VCF files for these two mutations. inframe_mutations.zip

susannasiebert commented 1 month ago

A short update: I found the reason for this behavior. When we create the fasta file for making binding predictions, we only include n-1 flanking amino acids so that each n-length substring of the peptide overlaps the mutation position. However, in these particular examples, the insertion is actually a duplication of a longer region, and the presumed mutation position (T) is not where the mutated amino acids start; they start at the end of the duplicated region. As a result, not enough flanking amino acids were included in the fasta file that pVACseq creates. You can see this reflected by looking at the .fasta file in the MHC_Class_I subfolder of your run. I'm working on fixing this error by including a longer subsequence for the WT of inframe insertions to account for duplicating insertions.
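The effect can be reproduced with a small sketch. Everything here is illustrative: the sequence is a toy built to be consistent with the epitopes quoted in this thread (placeholder `G` residues, not the real PIK3R1 protein), the variable names are hypothetical, and extending the WT window by the insertion length is only one possible reading of "a longer subsequence", not necessarily the fix that will be implemented.

```python
def windows(seq: str, n: int) -> list[str]:
    """All n-length substrings of seq, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


n = 9                  # epitope length
flank = n - 1          # flanking residues kept around the reported position
inserted = "QFQEKS"    # residues added after T (T -> TQFQEKS)

# Toy wildtype: placeholder upstream residues, the reported T, then a region
# assumed from the thread in which QFQEKS already follows T (a duplication).
upstream = "G" * 10
wt = upstream + "T" + "QFQEKSREYD" + "G" * 4
pos = len(upstream)    # 0-based index of the reported T
mt = wt[:pos + 1] + inserted + wt[pos + 1:]

# Subsequences with n-1 flanking residues around the reported position.
wt_sub = wt[max(0, pos - flank): pos + 1 + flank]
mt_sub = mt[max(0, pos - flank): pos + 1 + len(inserted) + flank]

wt_epitopes = set(windows(wt_sub, n))
mt_epitopes = windows(mt_sub, n)

print("FQEKSQFQE" in mt_epitopes)   # True: the mutant epitope is generated
print("FQEKSREYD" in wt_epitopes)   # False: the WT window stops before ...YD

# Assumed remedy: extend the WT window by the length of the insertion.
wt_sub_long = wt[max(0, pos - flank): pos + 1 + flank + len(inserted)]
print("FQEKSREYD" in set(windows(wt_sub_long, n)))  # True
```

In this toy case the expected wildtype epitope lies beyond the n-1 residues downstream of T, which is why no match is available until the WT subsequence is lengthened.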

KhacDuyNguyen0 commented 1 month ago

Thank you so much for your support.