psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

cdr3_length in light chain samples #215

Closed: krdav closed this issue 7 years ago

krdav commented 7 years ago

It looks like there is something wrong with the cdr3_length annotation for light chain samples. First, some of the CDR3 lengths are very long; second, they are not multiples of three, e.g. a cdr3_length of 100.
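Both symptoms can be checked mechanically over a set of annotations. This is a minimal sketch, assuming each annotation is a dict with `unique_ids` and `cdr3_length` keys (treat that layout as an assumption). The 90 nt ceiling and the function names are hypothetical choices for illustration, not partis settings.

```python
# Hypothetical cutoff: 90 nt (30 codons) is an illustrative bound, not a partis setting.
MAX_PLAUSIBLE_CDR3_NT = 90


def cdr3_length_problems(cdr3_length, max_len=MAX_PLAUSIBLE_CDR3_NT):
    """Return a list of reasons a CDR3 length (in nucleotides) looks wrong."""
    problems = []
    if cdr3_length % 3 != 0:
        # The conserved codons bounding the CDR3 sit in the same reading frame,
        # so the span between them should be a whole number of codons.
        problems.append("not a multiple of 3")
    if cdr3_length > max_len:
        problems.append(f"longer than {max_len} nt")
    return problems


def flag_suspicious(annotations, max_len=MAX_PLAUSIBLE_CDR3_NT):
    """Map each flagged annotation's ids (as a tuple) to its list of problems.

    Each annotation is assumed to be a dict with 'unique_ids' (a list of
    sequence ids) and 'cdr3_length' keys; clean annotations are skipped."""
    flagged = {}
    for line in annotations:
        problems = cdr3_length_problems(line["cdr3_length"], max_len)
        if problems:
            flagged[tuple(line["unique_ids"])] = problems
    return flagged
```

With the lengths mentioned in this thread, 100 and 152 are each flagged for both reasons, while 33 passes.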

psathyrella commented 7 years ago

hm. I didn't notice this exact thing, but I suspect I may have fixed it a few days ago: I noticed that when clustering, it was sometimes putting the phen (the conserved phenylalanine codon at the end of the light chain CDR3) way out in the N-padding. I'll look into it in more depth, though, since I'm not sure why it would only affect light chain.
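To see why a misplaced phen produces the reported symptoms, here is a minimal sketch. It assumes cdr3_length is measured from the first base of the conserved cysteine codon through the last base of the phenylalanine codon (consistent with the 33 nt, 11-codon CDR3s reported later in this thread) and that sequences are padded with N's. The function name and the positions passed in are hypothetical, not partis internals.

```python
PHE_CODONS = {"TTT", "TTC"}  # codons for the conserved phenylalanine


def cdr3_from_codon_positions(seq, cyst_pos, phen_pos):
    """Return (cdr3_length, problems) from 0-based starts of the Cys and Phe codons.

    Assumes cdr3_length runs from the first base of the cysteine codon
    through the last base of the phenylalanine codon."""
    cdr3_length = phen_pos + 3 - cyst_pos
    problems = []
    phen_codon = seq[phen_pos:phen_pos + 3]
    if len(phen_codon) < 3 or "N" in phen_codon:
        # The "phen" points into padding (or past the end), not at real sequence.
        problems.append("phen codon falls in N-padding or off the end")
    elif phen_codon not in PHE_CODONS:
        problems.append(f"phen codon is {phen_codon}, not TTT/TTC")
    if cdr3_length % 3 != 0:
        problems.append("cdr3_length not a multiple of 3")
    return cdr3_length, problems


# Toy example: Cys codon at 3, real Phe codon at 15, then 6 N's of padding.
padded = "NNN" + "TGTCAGCAATATTTCGGCCAAGGG" + "NNNNNN"
print(cdr3_from_codon_positions(padded, 3, 15))  # phen in the right place
print(cdr3_from_codon_positions(padded, 3, 28))  # phen out in the N-padding
```

With the phen at its true position this gives a length of 15 and no problems; pointing `phen_pos` into the trailing N's gives an inflated length that is also not a multiple of three.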

Do you have some sequences that it happens on?

psathyrella commented 7 years ago

oh, sorry, a better way to phrase that: could you send me some sequences that it happens on?

krdav commented 7 years ago

Yep.

Here are 3 examples from the same partition:

>102855A284488L2P2H0
CCAGACTCCCTGGCTGTGTCTCTGGGCGAGAGGGCCACCATCAACTGCAAGTCCAGCCAGAGTGTTTTATACAGCTCCAACAATAAGAACTACTTAGCTTGGTACCAGCAGAAACCAGGACAGTCTCCTAAGCTGCTCATTTACTGGGCATCTACCCGGGAATCCGGGGTCCCTGACCGATTCAGTGGCAGCGGGTCTGGGACAGATTTCACTCTCACCATCAGCAGCCTGCAGGCTGAAGATGTGGCAGTTTATTACTGTCAGCAATATTATAGCATTCAGCTCACTTTCGGCGGAGGGACCAAGGTGGAGATTAAGCGAACTGTGGCTGCACCA
>37505A547884L2P2H0
CCAGACTCCCTGGCTGTGTCTCTGGGCGAGAGGGCCACCATCAACTGCAAGTCCAGCCAGAGTATCTTGTACATCTCCAACAATAAGAACTATTTAGCTTGGTACCAGCAGAAACCAGGACAGCCTCCTAAGCTGCTCATTTACCGGGCATCTACCCGGGAATCCGGGGTCCCTGACCGATTCAGTGGCAGCGGGTCTGGGACAGATTTCACTCTCACCATCAGCAGCCTGCAGGCTGAAGATGTGGCAGTTAATTACTGTCAGCAATATTATAGTACTCCATTCACTTTCGGCCCTGGGACCAAAGTGGATATCAAACGTACGGTGGCTGCACCA
>138804A619711L2P2H0
CCAGACTCCCTGGCTGTGTCTCTGGGCGAGAGGGCCACCATCAACTGCAAGTCCAGCCAGAGTTTTTTATACAGCTACAACAACAAGAACTACTTAGCTTGGTACCAGCAGAAACCAGGACAGCCTCCTAAGCTGCTCATTTACTGGGCATCTACCCGGGATTCCGGGGTCCCTGACCGATTCAGTGGCAGCGGGTCTGGGACAGATTTCATTTTCACCATCAGCAGCCTGCAGGCTGAAGATGTGGCAGTTTATTACTGTCAGCAATATTATCGTACTCCTCCGACGTTCGGCCAAGGGACACGACTGGAGATTAAACGAACTGTGGCTGCACCA

They were annotated with cdr3_length=152 in the --print-cluster-annotations output.

psathyrella commented 7 years ago

hm, those all give me a cdr3 length of 33 and sensible-looking annotations. So either it depends on the rest of the sample, or it's something in the old version. Feel free to send me the rest of the sample and I'll run on it to check.
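For a rough cross-check of the 33 nt result that does not depend on partis, here is a crude sequence heuristic (not how partis computes cdr3_length): find the phenylalanine codon of the light chain FGXG motif, then walk back codon by codon in the same reading frame to the nearest cysteine codon. It assumes cdr3_length spans the first base of the cysteine codon through the last base of the phenylalanine codon; the function name and the 40-codon search window (`max_codons`) are hypothetical.

```python
import re

# Codons for the conserved cysteine that opens the CDR3.
CYS_CODONS = {"TGT", "TGC"}
# Zero-width lookahead so overlapping hits are all found: Phe (TTT/TTC),
# Gly, any codon, Gly, i.e. the FGXG motif just after a light chain CDR3.
FGXG = re.compile(r"(?=TT[TC]GG[ACGT][ACGT]{3}GG[ACGT])")


def heuristic_cdr3_length(seq, max_codons=40):
    """Crude CDR3 length estimate in nucleotides, Cys and Phe codons included.

    Uses the first FGXG hit that has an in-frame Cys codon within max_codons
    codons upstream; returns None if there is no such pair."""
    seq = seq.upper()
    for match in FGXG.finditer(seq):
        phen = match.start()
        # Step back one codon at a time so we stay in the Phe codon's frame.
        lowest = max(phen - 3 * max_codons, 0)
        for cyst in range(phen - 3, lowest - 1, -3):
            if seq[cyst:cyst + 3] in CYS_CODONS:
                return phen + 3 - cyst
    return None
```

On each of the three sequences posted above this returns 33, i.e. 11 codons from the cysteine (TGT) through the phenylalanine (TTC). Picking the nearest upstream cysteine rather than the first one avoids matching the framework cysteine much earlier in the V gene, which sits in the same reading frame.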

krdav commented 7 years ago

I'll try the updated version first, and test the new speed/memory optimizations while I'm at it.

krdav commented 7 years ago

I haven't seen any problems like this since switching to the new version.