Hi!
I'm working with a heavily modified version of your tool for a client and found an issue that originated from your code.
The client wanted to support dinucleotide substitutions, which appear in the VCF file as "AA" -> "TT".
Currently, neoepiscope treats this as a SNP, and SNPs have a hardcoded length of 1 in the annotated_seq() function.
As a result, the annotated sequence is wrong: the reference sequence following the mutated bases starts 1 nucleotide too early, which produces a frameshifted neopeptide.
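To make the off-by-one concrete, here is a minimal Python sketch. `apply_substitution` is a hypothetical helper, not neoepiscope's `annotated_seq()`, and it assumes 0-based positions and a plain string reference sequence. With a fixed step of 1, the second reference "A" of the dinucleotide is copied back in after "TT", so the sequence gains one base and every downstream codon shifts.

```python
def apply_substitution(ref_seq: str, pos: int, ref: str, alt: str,
                       hardcoded_step: bool = False) -> str:
    # Hypothetical helper (not neoepiscope code): replace the reference
    # allele starting at 0-based `pos` with `alt`.
    if ref_seq[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    # The buggy behaviour skips a fixed 1 base after the variant, which is
    # only correct for single-base substitutions; the fix skips len(ref).
    step = 1 if hardcoded_step else len(ref)
    return ref_seq[:pos] + alt + ref_seq[pos + step:]


ref_seq = "ATGAAGCCC"  # codons: ATG AAG CCC
fixed = apply_substitution(ref_seq, 3, "AA", "TT")
buggy = apply_substitution(ref_seq, 3, "AA", "TT", hardcoded_step=True)
print(fixed, len(fixed) % 3)  # ATGTTGCCC 0
print(buggy, len(buggy) % 3)  # ATGTTAGCCC 1
```

The fixed output keeps the original length (a multiple of 3), while the buggy one is one base longer and therefore out of frame.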
I was able to fix this by changing lines 1783 and 1784 of transcript.py:
last_index += fill + 1  ->  last_index += fill + len(snv[0])
last_pos += fill + 1  ->  last_pos += fill + len(snv[0])
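The same bookkeeping matters when several variants are stitched into one sequence. The sketch below is a hypothetical, simplified model of a `last_index`/`fill` loop, not the actual transcript.py code: each substitution is assumed to be a `(0-based position, ref allele, alt allele)` tuple, sorted and non-overlapping, and `len(ref)` stands in for `len(snv[0])` in the patch.

```python
from typing import List, Tuple


def stitch(ref_seq: str, substitutions: List[Tuple[int, str, str]]) -> str:
    # Hypothetical model of the last_index/fill bookkeeping (not neoepiscope code).
    # Each substitution is (0-based position, ref allele, alt allele); the list
    # is assumed sorted by position and non-overlapping.
    pieces = []
    last_index = 0
    for pos, ref, alt in substitutions:
        fill = pos - last_index  # unchanged reference bases before this variant
        pieces.append(ref_seq[last_index:pos])
        pieces.append(alt)
        # Skip every replaced reference base (len(ref)) rather than a fixed 1.
        last_index += fill + len(ref)
    pieces.append(ref_seq[last_index:])
    return "".join(pieces)


result = stitch("ATGAAGCCCTTT", [(3, "AA", "TT"), (9, "T", "G")])
print(result)  # ATGTTGCCCGTT
```

Because the index advances by the full reference allele length, the dinucleotide and the single-base substitution both leave the total length unchanged, so the downstream reading frame is preserved.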
And this can be tested like this: