griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Investigate supporting stop_lost variants #428

Open susannasiebert opened 5 years ago

ShahiRB commented 5 years ago

I am eagerly looking forward to this issue being solved soon. I recently came across an article (in press) that reports a stop-lost variant with strong immunogenicity in colorectal cancer.

susannasiebert commented 5 years ago

We are planning on adding this feature in the 1.6.0 feature release.

malachig commented 5 years ago

Thanks for your patience @ShahiRB. We agree that stop_lost variants are potentially a very useful source of neoantigens.

malachig commented 4 years ago

It looks like the VEP downstream plugin does not support stop_lost variants. So we don't have the mutant protein sequence from which to extract neopeptides as we would for other variant types.

susannasiebert commented 4 years ago

I made an issue on the Ensembl VEP_plugins GitHub repo: https://github.com/Ensembl/VEP_plugins/issues/286

susannasiebert commented 4 years ago

On the above issue the reply was:

Are you running VEP in offline mode (using the flag --offline)?

If VEP is run in offline mode using the flag --offline, a FASTA file is required to get the sequences for the 3' UTR.

Sequence may be incomplete without a FASTA file or database connection

Can we rerun the stop_lost VCF with our current VEP CWL + Docker container and the same settings used for the immunotherapy trials?

susannasiebert commented 4 years ago

@huimingx reannotated these variants and they no longer have XXX in the DownstreamProtein field. However, not all stop_lost variants have a DownstreamProtein. In Ensembl/VEP_plugins#286, the example we sent them was a case where the variant resulted in just another stop codon. It appears that, at least in our example, all of the stop_lost variants that don't result in a new stop codon are also annotated with a consequence of frameshift_variant and are thus already handled. We need to take a closer look at the remaining variants that are stop_lost only (i.e. that don't have a DownstreamProtein). We might need to run the following:

With the release of Ensembl 100 (officially released this afternoon), we have introduced the option --shift_3prime into VEP, where insertions and deletions within repeated regions will be shifted as far as possible in the 3' direction before consequence calculation. In the example provided by @huimingx above, this will now correctly provide a downstream consequence for your variant - see: http://rest.ensembl.org/vep/human/region/1:212360768-212360769/T?shift_3prime=1&content-type=application/json&minimal=1

malachig commented 4 years ago

Would it be possible to post an example of a stop lost variant that is NOT marked frameshift, and does NOT result in a new stop codon?

Are these cases where the variant is an SNV that breaks the stop codon directly (rather than an upstream frameshift that bypasses the usual stop), and where, without that stop, there are no alternative in-frame stops in the remaining 3' UTR sequence of the transcript?

huimingx commented 4 years ago

1 158095120 1_158095120_G/T G T . . CSQ=T|stop_lost|HIGH|KIRREL1|ENSG00000183853|Transcript|ENST00000359209.10|protein_coding|15/15||ENST00000359209.10:c.2274G>T|ENSP00000352138.6:p.Ter758TyrextTer85|2341|2274|758|*/Y|taG/taT|||1||1|SNV|HGNC|HGNC:15734|YES|1|P1|CCDS1172.2|ENSP00000352138|Q96J84||UPI0000443FBD|||||||||||||||||||||||||||||||||||||MLSLLVWILTLSDTFSQGTQTRFSQEPADQTVVAGQRAVLPCVLLNYSGIVQWTKDGLALGMGQGLKAWPRYRVVGSADAGQYNLEITDAELSDDASYECQATEAALRSRRAKLTVLIPPEDTRIDGGPVILLQAGTPHNLTCRAFNAKPAATIIWFRDGTQQEGAVASTELLKDGKRETTVSQLLINPTDLDIGRVFTCRSMNEAIPSGKETSIELDVHHPPTVTLSIEPQTVQEGERVVFTCQATANPEILGYRWAKGGFLIEDAHESRYETNVDYSFFTEPVSCEVHNKVGSTNVSTLVNVHFAPRIVVDPKPTTTDIGSDVTLTCVWVGNPPLTLTWTKKDSNMVLSNSNQLLLKSVTQADAGTYTCRAIVPRIGVAEREVPLYVNGPPIISSEAVQYAVRGDGGKVECFIGSTPPPDRIAWAWKENFLEVGTLERYTVERTNSGSGVLSTLTINNVMEADFQTHYNCTAWNSFGPGTAIIQLEEREVLPVGIIAGATIGASILLIFFFIALVFFLYRRRKGSRKDVTLRKLDIKVETVNREPLTMHSDREDDTASVSTATRVMKAIYSSFKDDVDLKQDLRCDTIDTREEYEMKDPTNGYYNVRAHEDRPSSRAVLYADYRAPGPARFDGRPSSRLSHSSGYAQLNTYSRGPASDYGPEPTPPGPAAPAGTDTTSQLSYENYEKFNSHPFPGAAGYPTYRLGYPQAPPSGLERTPYEAYDPIGKYATATRFSYTSQHSDYGQRFQQRMQTHV||||||||||||||||||

Looking at this example, the G-to-T mutation changes TAG to TAT. I do not see a downstream stop codon immediately either, but VEP (v97) doesn't seem to annotate it with a downstream protein.
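For triaging cases like the one above, here is a minimal Python sketch that flags CSQ entries that are stop_lost only and have an empty DownstreamProtein. It assumes the pipe-separated field order is taken from the `Format:` description of the CSQ INFO header in the VCF; the short format string and the function names are illustrative only and are not part of pVACtools or VEP.

```python
def parse_csq(format_string, csq_entry):
    """Map CSQ field names to values for one transcript entry.

    format_string is assumed to be the pipe-separated field list from the
    VCF header's CSQ description; csq_entry is one comma-separated entry.
    """
    keys = format_string.split("|")
    values = csq_entry.split("|")
    return dict(zip(keys, values))


def is_stop_lost_without_downstream(record):
    """True for a stop_lost-only entry with no DownstreamProtein value."""
    # VEP joins multiple consequence terms with "&" in VCF output.
    consequences = record.get("Consequence", "").split("&")
    return (
        "stop_lost" in consequences
        and "frameshift_variant" not in consequences
        and not record.get("DownstreamProtein")
    )


# Toy header with only four fields (an assumption for illustration).
TOY_FORMAT = "Allele|Consequence|SYMBOL|DownstreamProtein"

snv_entry = parse_csq(TOY_FORMAT, "T|stop_lost|KIRREL1|")
frameshift_entry = parse_csq(TOY_FORMAT, "T|stop_lost&frameshift_variant|GENE1|MLS")

print(is_stop_lost_without_downstream(snv_entry))
print(is_stop_lost_without_downstream(frameshift_entry))
```

With a real VCF, the format string would come from the header and would contain many more fields than this toy example.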

malachig commented 4 years ago

I'm probably confused, but it looks like a downstream protein should be possible?

I think this is the 3' UTR. It seems the next version (11) of this transcript may have a much larger UTR, which further complicates things.

>KIRREL1-201 utr3:protein_coding
GGGCCAGAGCCTGGCTGGGGCATCTCTGCGGGGCAGAGGAGAAGGCTTTCACAGCTGTTCCCTGATATTCAGGGGCATTGCTCATTGCTCCCTTCTCGGACCAGCCTTCTTCCTCCCACCATGGCAGGTGGGGAGCAGGTCTCCCAGAAACACCCCGTCCCGAGGATGGTGCTCTGTGCATGCCCCAGCCTCCTGGGCCTGCCCTTCCCTCTTCTTCGGGAGGATGTGTCTCTTCTGACCTGCACTCTTGCCTGACCCTAGAATGGGGACAGGGAAAGTGAAGGTTAGGGAAAGCAGAGGGGGGCACTTTTTAGCATTCCCTTTCTATCCCACCCCTCTGATCTCCCATAAGTGGAAATGGGGGTACCCAGGGATGGGCAGGCTTTGGCCTAGGGACATGAAGTATGGGAGTGGGTGGCTGTGGCACAGACAGGTGGAAAACGGGATAGCCTGGCCAGTCCCTCTGTTGTCTGCATTCGTGCCCTGGGTGCCTCTCTCCTTCCTCAGGGTACTGCAGAAGGGAGCGAACAGGG

To me it looks like there are plenty of in-frame stops that could be used. Replacing the lost stop (TAG->TAT) and then continuing in that frame:

TAT GGG CCA GAG CCT GGC TGG GGC ATC TCT GCG GGG CAG AGG AGA AGG CTT TCA CAG CTG TTC CCT GAT ATT CAG GGG CAT TGC TCA TTG CTC CCT TCT CGG ACC AGC CTT CTT CCT CCC ACC ATG GCA GGT GGG GAG CAG GTC TCC CAG AAA CAC CCC GTC CCG AGG ATG GTG CTC TGT GCA TGC CCC AGC CTC CTG GGC CTG CCC TTC CCT CTT CTT CGG GAG GAT GTG TCT CTT CTG ACC TGC ACT CTT GCC [TGA] CCC [TAG] AAT GGG GAC AGG GAA AGT GAA GGT [TAG] GGA AAG CAG AGG GGG GCA CTT TTT AGC ATT CCC TTT CTA TCC CAC CCC TCT GAT CTC CCA [TAA] GTG GAA ATG GGG GTA CCC AGG GAT GGG CAG GCT TTG GCC [TAG] GGA CAT GAA GTA TGG GAG TGG GTG GCT GTG GCA CAG ACA GGT GGA AAA CGG GAT AGC CTG GCC AGT CCC TCT GTT GTC TGC ATT CGT GCC CTG GGT GCC TCT CTC CTT CCT CAG GGT ACT GCA GAA GGG AGC GAA CAG GG

Seems like you could just translate to the first stop (TGA in this case). No need to go beyond the UTR.

Probably just missing something here.
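The idea of translating from the mutated stop codon to the first in-frame stop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, expects the mutated stop codon and the 3' UTR as plain DNA strings on the coding strand, and the function name `translate_extension` is hypothetical rather than anything in pVACtools or VEP.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_extension(mutant_stop_codon, utr3):
    """Translate the mutated stop codon plus the 3' UTR in frame.

    Returns (peptide, found_stop). found_stop is False when the UTR ends
    before any in-frame stop codon is reached.
    """
    seq = (mutant_stop_codon + utr3).upper()
    peptide = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(peptide), True
        peptide.append(aa)
    return "".join(peptide), False


# Toy UTR (not the KIRREL1 sequence): TAT GGG CCA GAG CCT TGA CCC
print(translate_extension("TAT", "GGGCCAGAGCCTTGACCC"))  # ('YGPEP', True)
```

Passing `"TAT"` and the KIRREL1-201 UTR sequence above would give the candidate C-terminal extension for this variant, which could then be appended to the wild-type protein before extracting peptides.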

susannasiebert commented 4 years ago

Mike mentioned this plugin as an alternative to the Downstream plugin: https://github.com/butkiem/COCOS

susannasiebert commented 4 years ago

Unfortunately, we weren't able to produce any output with COCOS. Huiming made an issue here: https://github.com/butkiem/COCOS/issues/1