psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Add option to force a stop-codon-less naive sequence #238

Closed: krdav closed this issue 5 years ago

krdav commented 7 years ago

With a single input sequence, it is often possible to get back a naive sequence containing a stop codon, e.g. for this input:

CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGCCTACTACTGGACCTGGCTCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGCAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTTACCAAATCAGTCGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGCGACTGCCGCGGACACGGCCGTGTCGTACTGTGCGAGAGATCGGGAGCAGCTGGGCCGCTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCA

A useful addition to partis would be a way to condition the inferred naive sequence on not having a stop codon, i.e. to return the MLE naive sequence given that it contains no stop codons.
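Stated as an optimization, the request is roughly the following. Here x is the observed input sequence, s ranges over candidate naive sequences, and S_stop is the set of naive sequences with an in-frame stop codon (this notation is introduced for illustration and is not taken from partis):

$$
\hat{s} \;=\; \operatorname*{arg\,max}_{s \,\notin\, \mathcal{S}_{\text{stop}}} P(s \mid x)
\;=\; \operatorname*{arg\,max}_{s \,\notin\, \mathcal{S}_{\text{stop}}} P(x, s)
$$

The second form follows because P(s | x) = P(x, s) / P(x) and P(x) does not depend on s. The catch is presumably that the constraint is defined on codons (triplets aligned to a reading frame) rather than on individual positions.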

psathyrella commented 5 years ago

I'm going to declare that actually implementing an option to restrict to productive naive sequences at the bcrham level would be prohibitively difficult. And, more to the point, it's probably not necessary, since the --calculate-alternative-annotations option and the view-alternative-annotations action now let you easily view all the inferred naive sequences that partis considers at all likely (with a heuristic measure of their likelihood). So you should be able to get a largely equivalent effect by just choosing the productive naive sequences from this list. There will probably be cases where this gives different answers than if we could teach bcrham about codons, but for the most part it should do OK.
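A minimal sketch of the suggested workaround, under stated assumptions: the alternative naive sequences are represented here as a hypothetical list of `(naive_sequence, probability)` tuples (this is not partis's actual output format), the function names are made up for illustration, and the reading frame is passed in as an offset (assumed 0 by default; in practice it would come from the annotation). It drops candidates with an in-frame stop codon and keeps the most probable remaining one. Because the probabilities are heuristic, this only approximates the conditional MLE.

```python
from typing import List, Optional, Tuple

# Standard-genetic-code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_stop_codon(seq: str, frame: int = 0) -> bool:
    """Return True if `seq` has a stop codon in the frame starting at offset `frame`.

    A trailing partial codon (fewer than 3 bases) is ignored.
    """
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


def best_productive_naive(
    candidates: List[Tuple[str, float]], frame: int = 0
) -> Optional[Tuple[str, float]]:
    """Pick the highest-probability candidate with no in-frame stop codon.

    `candidates` is a hypothetical list of (naive_sequence, probability) pairs.
    Returns None if every candidate contains a stop codon.
    """
    productive = [c for c in candidates if not has_stop_codon(c[0], frame)]
    if not productive:
        return None
    return max(productive, key=lambda c: c[1])


if __name__ == "__main__":
    # Toy sequences for illustration only, not real partis output.
    toy = [("ATGTAAGGC", 0.6), ("ATGCAAGGC", 0.3), ("ATGTGGTGA", 0.1)]
    print(best_productive_naive(toy))  # ('ATGCAAGGC', 0.3)
```

In the toy example the most probable candidate is skipped because its second codon is TAA, so the function falls back to the best stop-free alternative. Returning None rather than the unproductive winner makes it explicit when no productive naive sequence was among the alternatives.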