Open susannasiebert opened 5 years ago
We are planning on adding this feature in the 1.6.0 feature release.
Thanks for your patience @ShahiRB, we agree that stop loss variants are potentially a very useful source of neoantigens.
It looks like the VEP downstream plugin does not support stop_lost variants, so we don't have the mutant protein sequence from which to extract neopeptides as we would for other variant types.
I made an issue on the Ensembl VEP_plugins GitHub repo: https://github.com/Ensembl/VEP_plugins/issues/286
On the above issue the reply was:
Are you running VEP in offline mode (using the flag --offline)?
If VEP is run in offline mode using the flag --offline, a FASTA file is required to get the sequences for the 3' UTR.
Sequence may be incomplete without a FASTA file or database connection
Can we rerun the stop_lost VCF with our current VEP CWL + Docker container and the same settings used for the immunotherapy trials?
@huimingx reannotated these variants and they no longer have XXX in the DownstreamProtein field. However, not all stop_lost variants have a DownstreamProtein. In Ensembl/VEP_plugins#286, the example we sent them was a case where the variant resulted in just another stop codon. It appears that, at least in our example, all of the stop_lost variants that don't result in a new stop codon are also annotated with a consequence of frameshift_variant and are thus already handled. We need to take a closer look at the remaining variants that are stop_lost only (i.e. don't have a DownstreamProtein). We might need to run the following:
With the release of Ensembl 100 (officially released this afternoon), we have introduced the option --shift_3prime into VEP, where insertions and deletions within repeated regions will be shifted as far as possible in the 3' direction before consequence calculation. In the example provided by @huimingx above, this will now correctly provide a downstream consequence for your variant - see: http://rest.ensembl.org/vep/human/region/1:212360768-212360769/T?shift_3prime=1&content-type=application/json&minimal=1
Would it be possible to post an example of a stop_lost variant that is NOT marked frameshift, and does NOT result in a new stop codon?
Are these cases where the variant is an SNV that breaks the stop codon directly (rather than an upstream frameshift that bypasses the usual stop)? And then, without that stop, are there no alternative stops in the same frame in the remaining 3' UTR sequence of the transcript?
1 158095120 1_158095120_G/T G T . . CSQ=T|stop_lost|HIGH|KIRREL1|ENSG00000183853|Transcript|ENST00000359209.10|protein_coding|15/15||ENST00000359209.10:c.2274G>T|ENSP00000352138.6:p.Ter758TyrextTer85|2341|2274|758|*/Y|taG/taT|||1||1|SNV|HGNC|HGNC:15734|YES|1|P1|CCDS1172.2|ENSP00000352138|Q96J84||UPI0000443FBD|||||||||||||||||||||||||||||||||||||MLSLLVWILTLSDTFSQGTQTRFSQEPADQTVVAGQRAVLPCVLLNYSGIVQWTKDGLALGMGQGLKAWPRYRVVGSADAGQYNLEITDAELSDDASYECQATEAALRSRRAKLTVLIPPEDTRIDGGPVILLQAGTPHNLTCRAFNAKPAATIIWFRDGTQQEGAVASTELLKDGKRETTVSQLLINPTDLDIGRVFTCRSMNEAIPSGKETSIELDVHHPPTVTLSIEPQTVQEGERVVFTCQATANPEILGYRWAKGGFLIEDAHESRYETNVDYSFFTEPVSCEVHNKVGSTNVSTLVNVHFAPRIVVDPKPTTTDIGSDVTLTCVWVGNPPLTLTWTKKDSNMVLSNSNQLLLKSVTQADAGTYTCRAIVPRIGVAEREVPLYVNGPPIISSEAVQYAVRGDGGKVECFIGSTPPPDRIAWAWKENFLEVGTLERYTVERTNSGSGVLSTLTINNVMEADFQTHYNCTAWNSFGPGTAIIQLEEREVLPVGIIAGATIGASILLIFFFIALVFFLYRRRKGSRKDVTLRKLDIKVETVNREPLTMHSDREDDTASVSTATRVMKAIYSSFKDDVDLKQDLRCDTIDTREEYEMKDPTNGYYNVRAHEDRPSSRAVLYADYRAPGPARFDGRPSSRLSHSSGYAQLNTYSRGPASDYGPEPTPPGPAAPAGTDTTSQLSYENYEKFNSHPFPGAAGYPTYRLGYPQAPPSGLERTPYEAYDPIGKYATATRFSYTSQHSDYGQRFQQRMQTHV||||||||||||||||||
Looking at this example, the G-to-T mutation changes TAG to TAT. I do not see a downstream stop codon immediately either, but VEP (v97) doesn't seem to annotate it with a downstream protein.
I'm probably confused but this looks like a downstream protein should be possible?
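As a rough cross-check, the HGVSp value in the record above (p.Ter758TyrextTer85) can be parsed to see what extension VEP itself reports. The sketch below is only an illustration: the function name and regular expression are hypothetical and not part of VEP or pVACtools, and it assumes the usual HGVS extension notation, in which (as I understand it) the number after "extTer" is the position of the new stop relative to the original stop, and "?" means no new stop was found.

```python
import re

# Hypothetical helper, not part of VEP or pVACtools.
# Matches HGVS protein extension notation such as p.Ter758TyrextTer85,
# p.*758Yext*85, or p.Ter327ArgextTer? (no new stop found).
EXT_PATTERN = re.compile(
    r"p\.(?:Ter|\*)(\d+)([A-Za-z]{1,3})ext(?:Ter|\*)(\d+|\?)"
)


def parse_stop_lost_extension(hgvsp):
    """Return (original_stop_position, new_residue, new_stop_offset).

    new_stop_offset is None when the notation ends in '?'.
    Returns None if the string is not a C-terminal extension.
    """
    match = EXT_PATTERN.search(hgvsp)
    if match is None:
        return None
    position, residue, offset = match.groups()
    return int(position), residue, (None if offset == "?" else int(offset))


print(parse_stop_lost_extension("ENSP00000352138.6:p.Ter758TyrextTer85"))
```

This only recovers the reported length of the extension, not its sequence, so it does not replace a DownstreamProtein annotation.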
I think this is the 3' UTR. It seems like the next version (11) of this transcript may have a much larger UTR, further complicating things.
>KIRREL1-201 utr3:protein_coding
GGGCCAGAGCCTGGCTGGGGCATCTCTGCGGGGCAGAGGAGAAGGCTTTCACAGCTGTTCCCTGATATTCAGGGGCATTGCTCATTGCTCCCTTCTCGGACCAGCCTTCTTCCTCCCACCATGGCAGGTGGGGAGCAGGTCTCCCAGAAACACCCCGTCCCGAGGATGGTGCTCTGTGCATGCCCCAGCCTCCTGGGCCTGCCCTTCCCTCTTCTTCGGGAGGATGTGTCTCTTCTGACCTGCACTCTTGCCTGACCCTAGAATGGGGACAGGGAAAGTGAAGGTTAGGGAAAGCAGAGGGGGGCACTTTTTAGCATTCCCTTTCTATCCCACCCCTCTGATCTCCCATAAGTGGAAATGGGGGTACCCAGGGATGGGCAGGCTTTGGCCTAGGGACATGAAGTATGGGAGTGGGTGGCTGTGGCACAGACAGGTGGAAAACGGGATAGCCTGGCCAGTCCCTCTGTTGTCTGCATTCGTGCCCTGGGTGCCTCTCTCCTTCCTCAGGGTACTGCAGAAGGGAGCGAACAGGG
To me it looks like there are plenty of in-frame stops that could be used? Adding in the lost stop (TAG->TAT) and then continuing in that frame:
TAT GGG CCA GAG CCT GGC TGG GGC ATC TCT GCG GGG CAG AGG AGA AGG CTT TCA CAG CTG TTC CCT GAT ATT CAG GGG CAT TGC TCA TTG CTC CCT TCT CGG ACC AGC CTT CTT CCT CCC ACC ATG GCA GGT GGG GAG CAG GTC TCC CAG AAA CAC CCC GTC CCG AGG ATG GTG CTC TGT GCA TGC CCC AGC CTC CTG GGC CTG CCC TTC CCT CTT CTT CGG GAG GAT GTG TCT CTT CTG ACC TGC ACT CTT GCC [TGA] CCC [TAG] AAT GGG GAC AGG GAA AGT GAA GGT [TAG] GGA AAG CAG AGG GGG GCA CTT TTT AGC ATT CCC TTT CTA TCC CAC CCC TCT GAT CTC CCA [TAA] GTG GAA ATG GGG GTA CCC AGG GAT GGG CAG GCT TTG GCC [TAG] GGA CAT GAA GTA TGG GAG TGG GTG GCT GTG GCA CAG ACA GGT GGA AAA CGG GAT AGC CTG GCC AGT CCC TCT GTT GTC TGC ATT CGT GCC CTG GGT GCC TCT CTC CTT CCT CAG GGT ACT GCA GAA GGG AGC GAA CAG GG
Seems like you could just translate to the first stop (TGA in this case). No need to go beyond the UTR.
Probably just missing something here.
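The translate-to-the-first-stop idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how VEP or the Downstream plugin works: the function name is hypothetical, it uses the standard genetic code, and it assumes the 3' UTR sequence is given on the transcript strand, starting immediately after the original stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_readthrough(mutated_stop_codon, utr3):
    """Translate the mutated stop codon plus 3' UTR up to the first in-frame stop.

    Hypothetical helper. Returns (peptide, found_stop); found_stop is False
    when the UTR ends before any in-frame stop codon is reached.
    """
    seq = (mutated_stop_codon + utr3).upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with N etc.
        if aa == "*":
            return "".join(peptide), True
        peptide.append(aa)
    return "".join(peptide), False


# Toy input: TAT GGG CCA TGA CCC
print(translate_readthrough("TAT", "GGGCCATGACCC"))  # ('YGP', True)
```

Calling it with "TAT" and the KIRREL1-201 3' UTR sequence posted above would translate up to the first in-frame stop in that sequence. The second return value flags the case raised earlier in the thread, where no in-frame stop exists in the available UTR.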
Mike mentioned this plugin as an alternative to the Downstream plugin: https://github.com/butkiem/COCOS
Unfortunately, we weren't able to produce any output with COCOS. Huiming made an issue here: https://github.com/butkiem/COCOS/issues/1
I am eagerly looking forward to this issue being solved soon. I recently came across an article (in press) in which a stop-lost variant with strong immunogenicity was found in colorectal cancer.