Open hyanwong opened 1 year ago
My suspicion is that the HMM info metadata does not reporting the matching pattern which ends up in the final tree sequence? Is this because of post-processing of the ts after the HMM pass?
If that's true, and the post-processing can remove breakpoints, then we shouldn't report information about MRCAs on breakpoints mentioned in the node metadata but subsequently removed from the tree sequence.
I've just checked, and about 11% of the breakpoints reported in the metadata for the forward HMM don't have parents which agree with the parents of that node in that position in the tree sequence. https://github.com/jeromekelleher/sc2ts/pull/123 is a PR which adds this info to the breakpoints report
Riiiight. These are all 3-or-more break recombinants where the likelihood of 3 breaks and x mutations is very close to the likely hood of 2-breaks and x+3 mutations. Looks like the HMM params might have been very slightly different between the tree-building and the subsequent analysis to add the HMM metadata.
Our lives really would be much simpler if we could restrict to 1-break HMM consistent recombinants (for now)...
In the long ARG (
upgma-mds-1000-md-30-mm-3-2022-06-30-recinfo-il.ts.tsz
):It seems to be wrongly reporting a node 560878 as one of the parents (and only showing 2 parents - I think 427216 and 560878, when the forward HMM pass actually finds 3: 427216, 582220, and 564823)