openvax / topiary

Predict mutated T-cell epitopes from sequencing data
Apache License 2.0

Make sure we're correctly translating chr13 g.5864876_5864877insG (mm10) #52

Closed: iskandr closed this issue 8 years ago

iskandr commented 8 years ago

In the CIMT poster "Neo-epitopes generated by insertions, deletions, and gene fusions as target candidates for personalized tumor vaccination" they translate the mutation mm9:chr13 g.5864121_5864122insG as RG**GEEGGIRTEDF***

In John F's sequencing of B16 he found mm10:chr13 g.5864876_5864877insG, from which Topiary predicts the following epitopes:

4    H-2-Kb    KVYTKSSHL    9    50.24    0.17    0.880628986    2.180770599    Klf6    ENSMUSG00000000078    chr13 g.5864876_5864877insG    p.E107fs    FrameShift    Klf6-201    ENSMUST00000000080    16.651443    15.277563    15.964503    38    137    29    123    0.373779637    2.180770599    netMHCcons    319    106    319    244    15.277563    TRUE    TRUE    FALSE    0    9
26    H-2-Kb    ISSSFNYNL    9    227.27    1    0.693854117    2.180770599    Klf6    ENSMUSG00000000078    chr13 g.5864876_5864877insG    p.E107fs    FrameShift    Klf6-201    ENSMUST00000000080    16.651443    15.277563    15.964503    38    137    29    123    0.373779637    2.180770599    netMHCcons    319    106    319    126    15.277563    TRUE    TRUE    FALSE    0    9
33    H-2-Kb    FNYNLETNSL    10    285.24    1.5    0.606287496    2.180770599    Klf6    ENSMUSG00000000078    chr13 g.5864876_5864877insG    p.E107fs    FrameShift    Klf6-201    ENSMUST00000000080    16.651443    15.277563    15.964503    38    137    29    123    0.373779637    2.180770599    netMHCcons    319    106    319    130    15.277563    TRUE    TRUE    FALSE    0    10

iskandr commented 8 years ago

Confirmed that Variant("chr13", 5864876, "", "G", "GRCm38").effects() gives a single frameshift effect whose novel sequence continues for 213 amino acids, starting with "GEKKEESELKISSSPPEDSLISSSFNYNLETNSLNSDVSSESSDSSEELSPTTK..."

iskandr commented 8 years ago

The reference sequence on the poster seems to match what I get in PyEnsembl on Klf6-201 starting from offset 459.

GCT_CGG_GGG_GAG_AAG_AAG_GAG_GAA_TCA_GAA_CTG_AAG_ATT_TCT_TCT_AGT_CCC_CCA
-A- -R- -G- -E- -K- -K- -E- -E- -S- -E- -L- -K- -I- -S- -S- -S- -P- -P-

With the insertion of a G after position 462 we get a sequence of:

GCT_CGG_GGG_GGA_GAA_GAA_GGA_GGA_ATC_AGA_ACT_GAA_GAT_TTC_TTC_TAG
-A- -R- -G- -G- -E- -E- -G- -G- -I- -R- -T- -E- -D- -F- -F- *

...which also matches the poster. So why does Varcode produce a different sequence? Opening an issue there.
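
For anyone who wants to re-check the manual translation above, here is a minimal sketch in plain Python. It does not use PyEnsembl or Varcode. It hard-codes the 54-nucleotide reference snippet quoted above and the standard genetic code. It also assumes that "offset 459" and "position 462" are 0-based transcript offsets, so the G lands right after the C at snippet index 3. The names `translate`, `SNIPPET_START` and `INSERT_AFTER` are made up for this illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons from the start of seq, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Reference snippet quoted in this comment (Klf6-201, starting at offset 459).
reference = "GCTCGGGGGGAGAAGAAGGAGGAATCAGAACTGAAGATTTCTTCTAGTCCCCCA"

# Assumption: both numbers are 0-based offsets into the transcript.
SNIPPET_START = 459
INSERT_AFTER = 462

cut = INSERT_AFTER - SNIPPET_START + 1
mutated = reference[:cut] + "G" + reference[cut:]

print(translate(reference))  # ARGEKKEESELKISSSPP
print(translate(mutated))    # ARGGEEGGIRTEDFF*
```

Because the insertion falls next to a run of G's, placing the extra G anywhere within that run yields the same mutated sequence, so the result does not depend on exactly which G is treated as the inserted one.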

iskandr commented 8 years ago

See: https://github.com/hammerlab/varcode/issues/151

iskandr commented 8 years ago

Fixed by https://github.com/hammerlab/varcode/issues/151