openvax / topiary

Predict mutated T-cell epitopes from sequencing data
Apache License 2.0

Question: How are frameshifts handled? #55

Closed: kippakers closed this 8 years ago

kippakers commented 8 years ago

I'm getting interesting results w/r/t frameshifts in my data. A large percentage of my self-ligandome identical epitopes are from frameshifts (3511/3648), which was surprising.

My guess from a bit of reverse engineering is that you take the cDNA sequence, apply the indel, then run the result through translation software, looking for the largest ORF. Is that right? I think that makes good sense too, I just want to understand.

Thanks, Kipp

iskandr commented 8 years ago

Hey Kipp,

There was a pretty tragic bug in Varcode which mis-translated some frameshifts: https://github.com/hammerlab/varcode/issues/151

I would upgrade Varcode before trusting any translated frameshifts. I'm sorry for the hassle!

As for how the translation happens: you have it pretty much right, except that there's no need to select the longest ORF. A unique ORF should still be discernible after applying an indel to the reference sequence.
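To make that concrete, here is a minimal sketch of the idea in plain Python. It is not Varcode's actual code. The helper names `apply_indel` and `translate`, the toy sequence, and the 0-based offset convention are all made up for illustration. It assumes the input sequence starts at the start codon, so the reading frame is fixed and there is no ORF search.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_indel(cds, offset, ref, alt):
    """Replace `ref` at 0-based `offset` of the sequence with `alt`.

    Hypothetical helper: an empty `ref` is an insertion, an empty `alt`
    is a deletion.
    """
    if cds[offset:offset + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    return cds[:offset] + alt + cds[offset + len(ref):]


def translate(cds):
    """Translate from the first base, stopping at the first stop codon.

    A trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy coding sequence: ATG GCC AAG TGG TAA
reference = "ATGGCCAAGTGGTAA"
print(translate(reference))                            # MAKW

# Insert one base after the second codon; everything downstream shifts frame.
print(translate(apply_indel(reference, 6, "", "C")))   # MAQVV

# Delete one base at the same position.
print(translate(apply_indel(reference, 6, "A", "")))   # MASG
```

Because the frame is anchored at the original start codon, the mutant sequence has exactly one reading frame to translate. In the toy example the shifted frame loses the original stop codon, so translation simply runs to the end of the sequence given.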

kippakers commented 8 years ago

Ha! I missed that update by ~2 hours. I'll give it a rerun and see how the frameshifts look. Thanks!