psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Improve simulation of very highly mutated sequences #167

Closed psathyrella closed 3 years ago

psathyrella commented 8 years ago

Correcting for "multiple hits" (i.e. converting from the observed fraction of changed bases to branch lengths) is not trivial, because mutations are heavily concentrated in a relatively small fraction of the positions. As a result, the simulation doesn't accurately reproduce very high (30-35%) mutation levels: it shifts them to somewhat lower levels.
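
To illustrate the effect, here is a minimal sketch, not partis code. It assumes a Jukes-Cantor substitution model with per-position relative rates, which is an illustrative choice and not necessarily the model the simulation uses; the function names and the 20%/80% hotspot split are likewise hypothetical. It shows that using the observed mutation fraction directly as the branch length undershoots the target, and that the undershoot gets worse when mutation is concentrated in a few positions.

```python
import math


def expected_observed_fraction(branch_length, site_rates):
    """Expected fraction of positions differing from the ancestor.

    Assumes Jukes-Cantor with per-site relative rates (mean 1), so that
    branch_length is the mean number of substitutions per site.
    """
    total = sum(0.75 * (1.0 - math.exp(-4.0 * r * branch_length / 3.0)) for r in site_rates)
    return total / len(site_rates)


def branch_length_for_observed(target, site_rates, tol=1e-9):
    """Invert expected_observed_fraction by bisection (it is monotonic in branch length)."""
    max_reachable = 0.75 * sum(1 for r in site_rates if r > 0) / len(site_rates)
    if not 0.0 <= target < max_reachable:
        raise ValueError("target fraction is not reachable under this model")
    lo, hi = 0.0, 1.0
    while expected_observed_fraction(hi, site_rates) < target:
        hi *= 2.0  # expand the bracket until it contains the target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_observed_fraction(mid, site_rates) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normalize(rates):
    """Rescale relative rates so that their mean is 1."""
    mean = sum(rates) / len(rates)
    return [r / mean for r in rates]


# Hypothetical rate profiles: uniform vs. 20% of positions mutating 20x faster.
uniform_rates = [1.0] * 100
hotspot_rates = normalize([10.0] * 20 + [0.5] * 80)

target = 0.30  # desired observed mutation fraction

# Naive approach: use the observed fraction directly as the branch length.
naive_uniform = expected_observed_fraction(target, uniform_rates)
naive_hotspot = expected_observed_fraction(target, hotspot_rates)

# Corrected approach: solve for the branch length that yields the target.
d_uniform = branch_length_for_observed(target, uniform_rates)
d_hotspot = branch_length_for_observed(target, hotspot_rates)

print("observed fraction with naive branch length:", naive_uniform, naive_hotspot)
print("branch length needed to hit the target:", d_uniform, d_hotspot)
```

Under these assumptions, the branch length needed to reach a given observed mutation level depends on the full distribution of per-position rates and not only on the mean, so a single uniform-rate correction is not enough at high mutation levels.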