psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

cdr3 info in annotation output #195

Closed jotwin closed 8 years ago

jotwin commented 8 years ago

Is it possible to get the location of the cdr3 region, or the cdr3 sequence itself from annotation?

psathyrella commented 8 years ago

Yeah, although it's implicit. In the code that gets run for the --view-annotations option (which views an existing annotation file), the implicit info all gets added by a call to add_implicit_info(). If you add the following line after that call, it'll print the cdr3 sequence:

print(line['seqs'][0][line['cyst_position'] : line['tryp_position'] + 3])
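As a minimal sketch of what that one-liner does, the slice can be wrapped in a small helper and tried on a toy annotation. The helper name `cdr3_from_annotation`, the `iseq` parameter, and the toy values are all hypothetical and are not part of partis; the only things taken from the thread are the 'seqs', 'cyst_position', and 'tryp_position' keys and the `+ 3` that includes the whole closing codon.

```python
def cdr3_from_annotation(line, iseq=0):
    """Return the CDR3 nucleotide sequence for one sequence in an annotation.

    Assumes `line` is a dict with a 'seqs' list of strings and integer
    'cyst_position' and 'tryp_position' keys, as in the one-liner above.
    """
    seq = line['seqs'][iseq]
    start = line['cyst_position']
    stop = line['tryp_position'] + 3  # + 3 so the closing codon is included
    return seq[start:stop]


# Toy annotation with made-up values (not real partis output).
toy_line = {
    'seqs': [''.join(['AAA', 'CCC', 'TGT', 'GCG', 'AGA', 'GGG', 'TGG', 'GGG', 'CCC'])],
    'cyst_position': 6,   # index of the first base of the TGT codon
    'tryp_position': 18,  # index of the first base of the TGG codon
}

print(cdr3_from_annotation(toy_line))  # TGTGCGAGAGGGTGG
```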

jotwin commented 8 years ago

I figured out how to add cyst_position and tryp_position to the annotation csv. I'm finding cases where tryp_position > string length of the sequence. How is that possible?

psathyrella commented 8 years ago

Ah, great. And sorry you have to add it yourself, but I'm trying to hold the line against adding information that's already implicit, to keep the file size down.

I would have to see the individual case, but typically that happens in sequences with large j "3' deletions" (i.e. the read doesn't extend all the way through the j). The easiest way to see what's going on is to run with --debug 1, specifying only those queries with --queries QUERY_A:QUERY_B:...
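To illustrate the truncated-read case in code, here is a rough sketch of a guard. It assumes that tryp_position can point past the end of a read that stops partway through the j, which is how the comment above reads but is not confirmed here. The function name `cdr3_or_none` and the toy sequences are hypothetical. Note that a plain Python slice does not raise an error in this situation; it silently returns a shortened string, so an explicit length check is needed to spot these cases.

```python
def cdr3_or_none(seq, cyst_position, tryp_position):
    """Return the CDR3, or None if the read ends before the closing codon."""
    stop = tryp_position + 3  # one past the last base of the closing codon
    if cyst_position < 0 or stop > len(seq):
        return None  # read is too short to contain the whole CDR3
    return seq[cyst_position:stop]


# Toy sequences with made-up values (not real partis output).
full_read = ''.join(['AAA', 'CCC', 'TGT', 'GCG', 'AGA', 'GGG', 'TGG', 'GGG', 'CCC'])
truncated_read = full_read[:17]  # stops before the TGG codon at index 18

print(cdr3_or_none(full_read, 6, 18))       # TGTGCGAGAGGGTGG
print(cdr3_or_none(truncated_read, 6, 18))  # None

# The unguarded slice gives a partial CDR3 instead of failing:
print(truncated_read[6:18 + 3])             # TGTGCGAGAGG
```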