pdxgx / neoepiscope

predicts neoepitopes from phased somatic mutations detected using tumor/normal DNA-seq data

Support substitutions longer than 1 #18

Open ThijsMaas opened 3 years ago

ThijsMaas commented 3 years ago

Hi! I'm working with a heavily modified version of your tool for a client and found an issue that originates in your code. The client wanted support for dinucleotide substitutions, which appear in the VCF file as "AA" -> "TT". Currently, neoepiscope reads this as a SNP, which has a hardcoded length of 1 in the annotated_seq() function. As a result, the annotated sequence is wrong: the reference sequence following the mutated bases starts 1 nucleotide too early, resulting in a frame-shifted neopeptide. I was able to fix this by changing lines 1783 and 1784 of transcript.py:

last_index += fill + 1 -> last_index += fill + len(snv[0])
last_pos += fill + 1 -> last_pos += fill + len(snv[0])

The fix can be tested like this:

self.fwd_transcript.edit("CA", 450533)
peptides = self.fwd_transcript.neopeptides().keys()
self.assertEqual(len(peptides), 42)
self.assertEqual(sorted(peptides)[0], "AGRASLDK")
self.assertEqual(sorted(peptides)[-1], "VPAGRASLDKP")
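
For anyone who wants to see the effect in isolation, here is a minimal standalone sketch of the off-by-length problem. It is not neoepiscope's actual annotated_seq() code: the function apply_substitutions, its parameters, and the toy sequence are all hypothetical, and only the last_index/fill bookkeeping mirrors the two lines quoted above. It assumes every edit is a pure substitution, so the replaced reference span has the same length as the alternate bases.

```python
def apply_substitutions(reference, edits, start_pos=1, fixed_length_one=False):
    """Apply substitutions to a reference string (illustrative sketch only).

    reference        -- reference nucleotide sequence
    edits            -- iterable of (pos, alt) pairs; pos is the 1-based
                        coordinate of the first substituted base
    start_pos        -- coordinate of reference[0]
    fixed_length_one -- if True, advance by 1 after every edit, imitating
                        the behaviour described in this issue
    """
    pieces = []
    last_index = 0  # index of the next reference base not yet copied
    for pos, alt in sorted(edits):
        index = pos - start_pos
        fill = index - last_index  # unchanged bases before this edit
        pieces.append(reference[last_index:last_index + fill])
        pieces.append(alt)
        # A substitution replaces len(alt) reference bases, so the cursor
        # has to skip that many bases, not always 1.
        step = 1 if fixed_length_one else len(alt)
        last_index += fill + step
    pieces.append(reference[last_index:])  # remaining unchanged bases
    return "".join(pieces)


if __name__ == "__main__":
    ref = "GGAACC"
    # Dinucleotide substitution AA -> TT at position 3.
    print(apply_substitutions(ref, [(3, "TT")]))
    print(apply_substitutions(ref, [(3, "TT")], fixed_length_one=True))
```

With the fixed step of 1, the second reference "A" is copied again after the substituted bases, so the output is one nucleotide longer than the reference and everything downstream is shifted out of frame. Advancing by the substitution length keeps the output the same length as the input.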