matsengrp / cft

Clonal family tree

Add CDR1/2 region annotations #254

Open metasoarous opened 5 years ago

metasoarous commented 5 years ago

This came up on matsengrp/olmsted#73, to which @psathyrella replied:

Partis only outputs CDR3 info, because it's the only CDR with a really unambiguous definition. There's a longer thread on b-t.cr where folks were discussing this, which I could probably dig up, but the gist is that CDR1 and CDR2 don't have as fixed a definition, and instead depend on which alignment of the different V genes you want to use. CDR3 is different because nearly everyone agrees that if you don't have a cysteine and a tryptophan/phenylalanine where they usually are, at the start and end of CDR3, it's very uncommon to get a functional BCR.

That said, it's fairly easy to get the CDR1/2 info; I just don't want to have it by default, because it requires a specific alignment (probably IMGT), which keeps changing, and which you have to decide how to extend for new genes. The quick and dirty way to get alignments is to pass this as the argument to --aligned-germline-fname (this is designed to be used for writing presto output, since presto requires IMGT alignments). That of course doesn't solve the problem of adding new genes to that alignment file, or deciding whether that is the alignment you want... but those issues are why it's not really a supported option.

I could be convinced that CDR1/2 should be added if it helps a lot, but I'm inclined to say it's better if you tack it on afterwards, since it's a property of the germline set and doesn't have to do with the annotations or partitions (for instance, that ^ aligned file probably has all the info you need, and you'd just need to add a quick alignment for any new V genes).
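
For reference, tacking the regions on afterwards could look roughly like the sketch below. It is not partis code. It assumes the aligned germline sequence follows IMGT unique numbering (CDR1-IMGT at codon positions 27-38, CDR2-IMGT at 56-65) and that gaps are written as `.` or `-`. The function name `cdr_bounds` and the constants are hypothetical, chosen only for illustration.

```python
# Assumption: the gapped V sequence follows IMGT unique numbering.
# Codon positions are 1-based and inclusive.
IMGT_REGIONS = {"cdr1": (27, 38), "cdr2": (56, 65)}
GAP_CHARS = ".-"  # assumption: gap characters used in the aligned file


def cdr_bounds(gapped_seq, regions=IMGT_REGIONS):
    """Map IMGT region columns onto the ungapped V gene sequence.

    Returns {region: (start, end)} as 0-based, end-exclusive nucleotide
    indices into the ungapped sequence.
    """
    bounds = {}
    for name, (first_codon, last_codon) in regions.items():
        start_col = 3 * (first_codon - 1)  # first alignment column of the region
        end_col = 3 * last_codon           # one past the last alignment column
        if len(gapped_seq) < end_col:
            raise ValueError(f"aligned sequence too short to contain {name}")
        # Ungapped start = number of real bases before the region's first column.
        start = sum(c not in GAP_CHARS for c in gapped_seq[:start_col])
        # Ungapped length = number of real bases inside the region's columns.
        length = sum(c not in GAP_CHARS for c in gapped_seq[start_col:end_col])
        bounds[name] = (start, start + length)
    return bounds
```

Any new V gene would first need to be added to the same alignment before its bounds could be looked up this way, which is the unsolved part mentioned above.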