matsengrp / linearham

A Bayesian Phylo-HMM for B cell receptor sequence analysis
http://matsengrp.github.io/linearham

Inferred ancestral sequences have mutations in ambiguous regions #89

Open psathyrella opened 2 years ago

psathyrella commented 2 years ago

If I pass in seqs that all have Ns for, say, the first 50 bases of V (because that region wasn't sequenced), the inferred ancestral sequences that linearham gives me have mutations within this region. Maybe that's ok, but it also seems kind of weird. On the one hand, the real biological sequences probably did have some mutations there, and since we know what the naive bases are there, it makes sense to include them in the naive sequence, and then maybe also in the inferred intermediates. On the other hand, it's confusing to see mutations listed in a region that wasn't sampled in any of the sequences; those mutations are really just made up and have no relation to the data.

[Image: top, data; bottom, inferred ancestors]
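
To make the concern concrete, here is a minimal Python sketch of how one might flag such mutations after the fact. It is not linearham code: the function names `unsampled_columns` and `mutations_in_unsampled_region` are hypothetical, and it assumes the naive, ancestral, and observed sequences are plain strings already aligned to the same length, with `N` marking unsequenced bases.

```python
def unsampled_columns(observed_seqs, ambiguous="N"):
    """Return the 0-based positions where every observed sequence is ambiguous."""
    if not observed_seqs:
        return set()
    length = len(observed_seqs[0])
    if any(len(seq) != length for seq in observed_seqs):
        raise ValueError("observed sequences must be aligned to the same length")
    return {
        pos
        for pos in range(length)
        if all(seq[pos].upper() == ambiguous for seq in observed_seqs)
    }


def mutations_in_unsampled_region(naive_seq, ancestor_seq, observed_seqs):
    """List (position, naive_base, ancestor_base) for ancestor mutations that
    fall at positions no observed sequence covers."""
    if len(naive_seq) != len(ancestor_seq):
        raise ValueError("naive and ancestor sequences must have the same length")
    if observed_seqs and len(observed_seqs[0]) != len(naive_seq):
        raise ValueError("observed sequences must match the naive sequence length")
    unsampled = unsampled_columns(observed_seqs)
    return [
        (pos, naive_seq[pos], ancestor_seq[pos])
        for pos in sorted(unsampled)
        if naive_seq[pos] != ancestor_seq[pos]
    ]


# Toy example: the first four bases are unsequenced in every observed sequence.
observed = ["NNNNACGT", "NNNNACGA"]
naive = "GATTACGT"
ancestor = "GCTTACGA"  # differs from naive at position 1 (unsampled) and 7 (sampled)
flagged = mutations_in_unsampled_region(naive, ancestor, observed)
print(flagged)  # [(1, 'A', 'C')]
```

Only the mutation at position 1 is reported, since position 7 is covered by the data. Whether such mutations should be hidden, reverted to the naive base, or just annotated is exactly the open question here; the sketch only identifies them.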