Open psathyrella opened 7 years ago
If you decide to implement this, the program should be aware of the order of the D segments on the chromosome to cut down on false positives. That is, you could have D1-1 followed by D1-7, but not vice versa. See Figure 5 of Briney et al., Immunology 2012, for more details.
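A minimal sketch of that ordering check, in Python. The function name, the `ASSUMED_D_ORDER` list, and the allele-stripping convention are all hypothetical and not taken from partis; the list is only an illustrative subset of the locus, so a real implementation would need the full 5'-to-3' order from a reference locus map.

```python
# Illustrative subset of the D locus, listed 5' to 3' (assumption: not a complete list).
ASSUMED_D_ORDER = ["D1-1", "D2-2", "D3-3", "D1-7"]


def is_plausible_double_d(first_d, second_d, locus_order=ASSUMED_D_ORDER):
    """Return True if first_d sits strictly upstream of second_d on the chromosome.

    first_d is the D nearer the V gene in the rearranged sequence,
    second_d is the one nearer the J gene.
    """
    position = {gene: i for i, gene in enumerate(locus_order)}
    # Drop any allele suffix such as "*01" before looking the gene up.
    first, second = first_d.split("*")[0], second_d.split("*")[0]
    if first not in position or second not in position:
        return False  # unknown gene: treat as implausible rather than guess
    return position[first] < position[second]
```

With this list, D1-1 followed by D1-7 passes and the reverse order is rejected, as is the same gene appearing twice.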
On Tue, Feb 14, 2017 at 2:47 PM, Duncan Ralph notifications@github.com wrote:
Might not be too hard, and in cases where there's really two Ds it'll of course make a huge difference to annotation accuracy.
View it on GitHub: https://github.com/psathyrella/partis/issues/226
ooh, excellent, thanks for the tip.
I also just realized this should be easy to implement -- we just need Smith-Waterman to look for them during parameter caching, and if it finds any, it adds the smooshed-together double D to the germline set that gets passed to the HMM.
Would that work? There would typically be N-nucleotides inserted between the two D genes...
huh, no, not if there are insertions between the Ds. I don't think there's much chance of adding two Ds to the HMM, that'd be crazy complicated. But the SW annotations are only a little less accurate than the HMM ones anyway, so we could just look for double Ds there.
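A rough sketch of what looking for double Ds at the alignment stage could mean, again in Python. This is not partis code: it uses exact longest-common-substring matching from the standard library's `difflib` as a simple stand-in for Smith-Waterman, and the function names, the `min_match` threshold, and the toy sequences are all made up for illustration. It reports pairs of non-overlapping D hits along with the length of the insertion between them.

```python
from difflib import SequenceMatcher
from itertools import combinations


def longest_d_hit(query, gene, germline_seq):
    """Longest exact match between the query and one D germline sequence."""
    matcher = SequenceMatcher(None, query, germline_seq, autojunk=False)
    m = matcher.find_longest_match(0, len(query), 0, len(germline_seq))
    return {"gene": gene, "start": m.a, "end": m.a + m.size}


def find_double_d_candidates(query, d_germlines, min_match=8):
    """Return (first_gene, second_gene, n_inserted_bases) for non-overlapping D hits."""
    hits = [longest_d_hit(query, g, s) for g, s in d_germlines.items()]
    # Keep only hits long enough to be worth considering (threshold is an assumption).
    hits = [h for h in hits if h["end"] - h["start"] >= min_match]
    hits.sort(key=lambda h: h["start"])
    pairs = []
    for h1, h2 in combinations(hits, 2):
        if h1["end"] <= h2["start"]:  # second hit starts after the first ends
            pairs.append((h1["gene"], h2["gene"], h2["start"] - h1["end"]))
    return pairs


# Toy data (hypothetical sequences, not real germline genes).
toy_germlines = {"D_a": "GGTACAACTGGAACGAC", "D_b": "AGGATATTGTAGTAGTACC"}
toy_query = "CC" + "ACAACTGGAAC" + "TTT" + "ATATTGTAGTAGT" + "CC"
candidates = find_double_d_candidates(toy_query, toy_germlines)
```

Any candidate pair found this way would still need the chromosomal order check mentioned earlier in the thread before being accepted, and a real version would have to tolerate mismatches from somatic hypermutation, which exact matching does not.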