Open ktmeaton opened 2 years ago
Maybe the lineage assignment was done using a previous (i.e. slightly different) tree? @AngieHinrichs should know; I'm just guessing.
Thanks for the idea, that makes sense. I'll try to reproduce this analysis.
Interestingly, I get the same results using the public tree and the web browser (https://genome.ucsc.edu/cgi-bin/hgPhyloPlace).
Identical node placement (link):
But different lineage assignments:
The "true" assignment should be the "misc" lineage based on the GISAID tree. So the sample with lower genomic quality is incorrect. But I'm still curious why their assignments differ.
That is really strange @ktmeaton! Was the auspice JSON for your first image generated using matUtils? After using usher to add the Canadian sequences to the public tree?
In your hgPhyloPlace view, I notice that the placement of the two sequences splits the branch from XM to miscBA1BA2Post17k in the public tree. (In your first image, there is one more "miscBA1BA2Post17k" that looks a little out of place but I can't tell what sequence that is.) usher would have had to place one sequence first (splitting the branch), then the other (adjacent to the first sequence), and it's possible that somehow that would cause a difference in how their nearest-neighbor-for-purpose-of-guessing-lineage would be found. I will have to look at the code to figure out what's really going on there.
But if you're using hgPhyloPlace, since those two samples are already in the non-public tree, there's a kind of roundabout way to check their assignment in the non-public tree. I pasted in the name of a nearby sequence (Denmark/DCGC-474438/2022) to get the branch with those two Canadian sequences (without the annoying "uploaded sample" labels for all attributes, sorry about those):

https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/usher-236.json?c=pango_lineage_usher&label=nuc%20mutations:T19955C,G20055A

If you zoom in to the branch with the red sequence, you can see that the two Canadian sequences are solidly part of the miscBA1BA2Post17k branch of the non-public tree:
That's not a very satisfying answer to give, sorry about that. I can share the full tree privately with registered GISAID users if you would like to try that instead of the public tree -- if so, email angie at soe dot ucsc dot edu.
> That is really strange @ktmeaton! Was the auspice JSON for your first image generated using matUtils? After using usher to add the Canadian sequences to the public tree?
Yup, that's exactly what I did! Nextclade align -> faToVcf -> UShER -> matUtils extract
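For anyone following along, here is a rough sketch of the last three steps of that pipeline, written as a Python helper that only builds the command lines (it does not run them). The file names are hypothetical, the input is assumed to be already aligned (e.g. by the Nextclade step), and the flags are the ones I recall from the tools' help output, so please check them against `usher --help` and `matUtils extract --help` before relying on this.

```python
import shlex


def build_placement_commands(aligned_fasta, tree_pb, out_prefix):
    """Build (but do not run) the faToVcf -> UShER -> matUtils extract commands.

    All paths are hypothetical placeholders; flags are assumptions to verify
    against each tool's --help output.
    """
    vcf = f"{out_prefix}.vcf"
    placed_pb = f"{out_prefix}.pb"
    auspice_json = f"{out_prefix}.json"
    return [
        # Convert the aligned FASTA to VCF (assumes the reference is the
        # first record in the alignment).
        ["faToVcf", aligned_fasta, vcf],
        # Place the new samples onto the existing mutation-annotated tree.
        ["usher", "-i", tree_pb, "-v", vcf, "-o", placed_pb],
        # Export the updated tree as an Auspice-compatible JSON.
        ["matUtils", "extract", "-i", placed_pb, "-j", auspice_json],
    ]


# Print the commands for inspection rather than executing them.
for cmd in build_placement_commands("aligned.fasta", "public-tree.pb", "canada"):
    print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to compare against what was actually run when two people get different results from what should be the same tree.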
> (In your first image, there is one more "miscBA1BA2Post17k" that looks a little out of place but I can't tell what sequence that is.)
Those two aren't public yet, but are also Canadian sequences with odd lineage assignments in this junction.
> But if you're using hgPhyloPlace, since those two samples are already in the non-public tree, there's a kind of roundabout way to check their assignment in the non-public tree.
I've been doing that sometimes to check assignments, I'm glad to hear that's a good approach!
Is it fair to summarize this issue as:
The placement is at the junction of XM and miscBA1BA2Post17k (which I think implies some uncertainty). By what mechanism would the lineage assignments be different in this case? And any suggestions on how to detect/quantify uncertainty in this case (perhaps by the number of placements)?
Thanks!
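On the uncertainty question above, one rough way to flag a sample like this is to combine the number of equally parsimonious placements with how mixed the lineages are among the sample's nearest neighbors in the tree. The sketch below is only an illustration of that idea, not how usher or hgPhyloPlace assign lineages: the function name, its inputs, and the "support" score are all hypothetical, and it assumes you have already collected the placement count and the neighbors' lineage labels yourself.

```python
from collections import Counter


def summarize_placement(n_placements, neighbor_lineages):
    """Summarize how uncertain a sample's lineage guess looks.

    n_placements: number of equally parsimonious placements for the sample
        (hypothetical input, gathered from the placement output).
    neighbor_lineages: lineage labels of the sample's nearest neighbors
        (hypothetical input, gathered from the subtree around the sample).
    """
    if not neighbor_lineages:
        raise ValueError("need at least one neighbor lineage")
    counts = Counter(neighbor_lineages)
    majority, majority_n = counts.most_common(1)[0]
    return {
        # More than one optimal placement means the position itself is ambiguous.
        "ambiguous_placement": n_placements > 1,
        # Neighbors carrying different lineages suggest a junction between clades.
        "mixed_neighborhood": len(counts) > 1,
        "majority_lineage": majority,
        # Fraction of neighbors agreeing with the majority lineage.
        "lineage_support": majority_n / len(neighbor_lineages),
    }


# Toy example: a single placement at a junction between two lineages.
result = summarize_placement(
    1, ["XM", "XM", "miscBA1BA2Post17k", "miscBA1BA2Post17k", "miscBA1BA2Post17k"]
)
print(result)
```

A sample with a single placement can still sit in a mixed neighborhood, which is the situation described in this issue, so it seems worth reporting both signals rather than the placement count alone.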