jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

is_arg_hmm_path_length_consistent too stringent? #128

Open hyanwong opened 1 year ago

hyanwong commented 1 year ago

We might want to list all the breakpoints in the ARG and exclude those which were only reported in the HMM metadata. At the moment I don't think there is any way to do this from the export_recombinant_breakpoints function. The nearest is is_arg_hmm_path_length_consistent, but dropping breakpoints where that is false will also remove breakpoints on which the ARG and the HMM agree, if they belong to a recombination node that has, e.g., a different number of parents in the ARG than in the HMM.

We really need a per-breakpoint measure for this, not a per-recombination-node measure, I guess. There are unlikely to be many such breakpoints, but I have observed some.
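
To make the distinction concrete, here is a rough sketch of what a per-breakpoint flag could look like. It is plain Python and does not call sc2ts: the function name, the `in_arg` / `in_hmm` fields, and the breakpoint positions are all made up for illustration, and it assumes we can already get two lists of breakpoint positions for one recombination node (one from the ARG, one from the HMM metadata).

```python
def per_breakpoint_agreement(arg_breakpoints, hmm_breakpoints):
    """Label each breakpoint of ONE recombination node by where it was reported.

    Hypothetical helper, not part of sc2ts. Both arguments are iterables of
    integer genome positions.
    """
    arg = set(arg_breakpoints)
    hmm = set(hmm_breakpoints)
    return [
        {"breakpoint": pos, "in_arg": pos in arg, "in_hmm": pos in hmm}
        for pos in sorted(arg | hmm)
    ]


# Made-up node: 2 parents in the ARG (one breakpoint), 3 parents on the HMM
# path (two breakpoints). Positions are invented for the example.
arg_bps = [21000]
hmm_bps = [11000, 21000]

rows = per_breakpoint_agreement(arg_bps, hmm_bps)

# A per-node flag in the spirit of is_arg_hmm_path_length_consistent would be
# False here, so filtering on it would discard every breakpoint of the node.
node_level_consistent = len(arg_bps) == len(hmm_bps)

# Per-breakpoint filtering keeps what is in the ARG and drops HMM-only ones.
kept = [row["breakpoint"] for row in rows if row["in_arg"]]
```

In this toy case the node-level flag is false, but the breakpoint at 21000 is reported by both the ARG and the HMM and so survives the per-breakpoint filter; only the HMM-only breakpoint at 11000 is dropped.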