psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Allele finding and primer bias #235

Open krdav opened 7 years ago

krdav commented 7 years ago

The problem is that it is common practice to use primers (especially in the FR1 region) that deliberately change a couple of nucleotides. This might be for several reasons, e.g. a) to use only a sparse set of degenerate primers, or b) to introduce a restriction site for future subcloning.

The situation could look like this (image: primer_introduced_shm).

The introduced change will look as if it was caused by SHM and will be found in the majority (around 99%) of all reads, so it could easily be mistaken for a new allele. Adding to the problem, when running data from other labs it might not be clear which primers they used. The code should be extended to deal with this:

1) A switch to disable allele finding on the first (and/or last) X nucleotides.
2) An option to revert a primer-introduced change (otherwise detected as a new allele) back to its germline identity.
3) Primer finding and a warning, by searching for synonymous mutations in the first and last 30 nt of a read. If none are observed, it is probably because a primer has reverted them.
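To make suggestions 2 and 3 concrete, here is a minimal sketch in plain Python. It is not partis code: every function name, the 30 nt window and the 95% threshold are assumptions chosen for illustration. It assumes each read is already aligned to its germline gene with no gaps and has the same length as the germline, and it counts any mismatch rather than only synonymous ones, which is a simplification of the idea above.

```python
from collections import Counter


def mismatch_positions(read, germline):
    """Return 0-based positions where an aligned read differs from the germline."""
    return [i for i, (r, g) in enumerate(zip(read, germline)) if r != g and r != "N"]


def primer_like_positions(reads, germline, window=30, min_fraction=0.95):
    """Find end-window positions where nearly all reads share one non-germline base.

    Hypothetical heuristic: such positions are candidates for primer-introduced
    changes rather than real alleles or SHM.
    """
    n = len(germline)
    # 5' window, then 3' window (clipped so the two never overlap)
    positions = list(range(min(window, n))) + list(range(max(n - window, window), n))
    flagged = {}
    for pos in positions:
        base, count = Counter(read[pos] for read in reads).most_common(1)[0]
        if base != germline[pos] and count / len(reads) >= min_fraction:
            flagged[pos] = base
    return flagged


def revert_to_germline(read, germline, positions):
    """Replace the bases at the given positions with the germline base."""
    return "".join(germline[i] if i in positions else b for i, b in enumerate(read))


def end_windows_look_primer_masked(reads, germline, window=30):
    """True if no read is mutated in either end window while the interior is mutated."""
    n = len(germline)
    end_muts = interior_muts = 0
    for read in reads:
        for pos in mismatch_positions(read, germline):
            if pos < window or pos >= n - window:
                end_muts += 1
            else:
                interior_muts += 1
    return end_muts == 0 and interior_muts > 0
```

One possible order of use: call `primer_like_positions` on the reads assigned to a gene, revert the flagged positions with `revert_to_germline`, and then run `end_windows_look_primer_masked` on the reverted reads to decide whether to warn. Restricting the search to the end windows is also one way to approximate suggestion 1, since anything outside them is left for normal allele finding.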

psathyrella commented 7 years ago

dear self: add option to override automatic 5' and 3' allelefinder exclusion zones.