babarlelephant closed this 3 years ago
To the Nextstrain team: you can see it on the live tree here, at the top of this part (I can't zoom in further via the URL): https://nextstrain.org/ncov?branchLabel=clade&c=gt-nuc_15324&label=clade:A2a
It's now solved. Did someone do anything special?
It's easy for me to notice this kind of bug because I copy and then modify the tree to define some additional epidemiological clades (every country I mention has more than 1000 deaths). Not all of these clades are perfectly reliable, so I need to check whether new sequences fit them or not.
I didn't end up doing anything to fix this, but I'm glad it got sorted.
Thanks for pointing this out @acx01b. I don't know what caused this, and it apparently fixed itself during one of our builds. If you spot it again, please do let us know and we can take another look!
@lmoncla The bug is back :-) in the same place.
It doesn't seem to appear elsewhere in the tree, at least not in such an obvious form.
No idea if it's related, but I found that every sequence with travel history, and only those sequences, lacks a "branch_attrs" field in the JSON file.
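A minimal sketch for listing which nodes lack "branch_attrs", assuming the Auspice v2 dataset layout where the tree sits under a top-level "tree" key and each node has a "name" and optional "children". The helper names are hypothetical, and the toy dataset below is made up for illustration, not taken from the real build.

```python
import json
from typing import Iterator


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield every node of an Auspice-style tree, depth-first (pre-order)."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def nodes_missing_branch_attrs(dataset: dict) -> list:
    """Return the names of all nodes that have no "branch_attrs" key.

    Assumes (hypothetically) the v2 layout: dataset["tree"] is the root node.
    """
    return [
        node.get("name", "<unnamed>")
        for node in iter_nodes(dataset["tree"])
        if "branch_attrs" not in node
    ]


# Toy dataset: one tip with branch_attrs, one without.
example = json.loads("""
{"tree": {"name": "root", "branch_attrs": {},
  "children": [
    {"name": "seqA", "branch_attrs": {"mutations": {}}},
    {"name": "seqB_travel"}
  ]}}
""")
print(nodes_missing_branch_attrs(example))  # ['seqB_travel']
```

On a real file, load it with json.load on an open file handle first; comparing the returned names with the sequences that have travel history would confirm or refute the pattern.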
There are now two appearances of the bug, both in the European lineage, @jameshadfield:
https://nextstrain.org/ncov?c=gt-nuc_15324&label=clade:A2a
https://nextstrain.org/ncov?c=gt-nuc_20268&label=clade:A2a
We should inspect the IQTREE tree (i.e. before refine) to see whether this structure is already there. I suspect it is, but I don't see an obvious reason why these sequences aren't monophyletic.
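One way to run that check on a Newick tree is to test whether a set of tip names forms exactly one clade. This is a minimal sketch with a hand-written parser that keeps names and skips branch lengths; it does not handle quoted labels or bracketed comments, and the toy tree and tip names are hypothetical.

```python
def parse_newick(text: str) -> dict:
    """Parse a Newick string into nested {"name": str, "children": [...]} dicts.

    Minimal sketch: reads labels, discards branch lengths, and does not
    support quoted labels or [comments].
    """
    s = text.strip()
    pos = 0

    def parse_node() -> dict:
        nonlocal pos
        node = {"name": "", "children": []}
        if s[pos] == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
        # Read the (possibly empty) label up to the next delimiter.
        start = pos
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        node["name"] = s[start:pos].strip()
        # Skip an optional ":branch_length".
        if pos < len(s) and s[pos] == ":":
            pos += 1
            while pos < len(s) and s[pos] not in ",();":
                pos += 1
        return node

    return parse_node()


def collect_clades(node: dict, clades: list) -> frozenset:
    """Append the tip set of every clade to `clades`; return this node's tips."""
    if node["children"]:
        tips = frozenset().union(*(collect_clades(c, clades) for c in node["children"]))
    else:
        tips = frozenset([node["name"]])
    clades.append(tips)
    return tips


def is_monophyletic(tree: dict, group: set) -> bool:
    """True if some clade in the tree has exactly `group` as its tips."""
    clades: list = []
    collect_clades(tree, clades)
    return frozenset(group) in set(clades)


tree = parse_newick("((A:0.1,B:0.2)x:0.05,(C:0.1,D:0.1):0.02);")
print(is_monophyletic(tree, {"A", "B"}))  # True
print(is_monophyletic(tree, {"A", "C"}))  # False
```

Running the same check on the tree before and after refine would show at which step the split appears.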
The missing "branch_attrs" field on travel-history sequences is unrelated, but it should also be improved. For isolates with travel history, we split the branch in two to improve the DTA (discrete trait analysis), but we can't duplicate the mutations etc. (branch_attrs) onto both halves without causing double counting in the entropy panel.
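A toy illustration of the double-counting problem, assuming a simplified per-position event count (how many branches carry a mutation at each position). This is not Auspice's actual code; the branch IDs and mutations are hypothetical.

```python
from collections import Counter


def event_counts(branches):
    """Count, per genome position, how many branches carry a mutation there.

    `branches` is a list of (branch_id, [mutation, ...]) with mutations
    written like "A100G" (ref base, 1-based position, alt base).
    """
    counts = Counter()
    for _branch_id, mutations in branches:
        for mut in mutations:
            counts[int(mut[1:-1])] += 1
    return counts


def split_branch(branches, branch_id, copy_mutations):
    """Split one branch into two consecutive halves (as for travel history).

    If copy_mutations is True, both halves keep the mutation list;
    otherwise only the upper half does.
    """
    out = []
    for bid, muts in branches:
        if bid != branch_id:
            out.append((bid, muts))
        else:
            out.append((bid + "_upper", muts))
            out.append((bid + "_lower", list(muts) if copy_mutations else []))
    return out


branches = [("b1", ["A100G"]), ("b2", ["C200T"])]
print(event_counts(split_branch(branches, "b1", copy_mutations=True))[100])   # 2
print(event_counts(split_branch(branches, "b1", copy_mutations=False))[100])  # 1
```

Copying the mutations makes one real event count twice, which is why only one half of a split branch can carry them.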
There is clearly a problem; see https://nextstrain.org/ncov/?c=gt-nuc_26144,14805,17247&m=div, where the Iranian cluster became a subtree of a lineage carrying 3 unrelated mutations. I'm quite sure the divergence tree would be cleaner with a basic minimum spanning tree algorithm.
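A sketch of the suggested minimum spanning tree approach, using Prim's algorithm on pairwise Hamming distances between aligned sequences. This is only an illustration of the idea in the comment, not how Nextstrain builds its trees; it treats every mismatch (including N or gap characters) as one difference, and the sequence names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs: dict) -> list:
    """Prim's algorithm on the complete graph of pairwise Hamming distances.

    Returns edges as (parent, child, distance), grown from the first sequence.
    """
    names = list(seqs)
    if not names:
        return []
    start = names[0]
    # best[n] = (distance to the closest node already in the tree, that node)
    best = {n: (hamming(seqs[start], seqs[n]), start) for n in names[1:]}
    edges = []
    while best:
        child = min(best, key=lambda n: best[n][0])
        dist, parent = best.pop(child)
        edges.append((parent, child, dist))
        # The new node may now be the closest tree node for the others.
        for n in best:
            d = hamming(seqs[child], seqs[n])
            if d < best[n][0]:
                best[n] = (d, child)
    return edges


seqs = {"ref": "AAAA", "s1": "AAAT", "s2": "AATT", "s3": "CAAA"}
print(minimum_spanning_tree(seqs))
# [('ref', 's1', 1), ('s1', 's2', 1), ('ref', 's3', 1)]
```

This costs O(n²) distance computations, which is fine for a basic script. An MST links observed sequences directly rather than inferring ancestral nodes, so it is a rough check on the divergence tree rather than a replacement for it.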
At least in my basic script the bug seems to be gone, probably because many identical sequences (from Australia, the UK, the USA, Iceland, the Netherlands...) have been removed.
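A minimal sketch of collapsing identical sequences before building a tree, keeping the first name seen as the representative of each group. The function name and data are hypothetical; the comparison is an exact string match, so an ambiguous base such as N counts as different from A.

```python
def collapse_identical(seqs: dict) -> dict:
    """Group sequence names by identical sequence.

    Returns {representative_name: [all names with that exact sequence]},
    where the representative is the first name encountered.
    """
    rep_by_seq: dict = {}
    groups: dict = {}
    for name, seq in seqs.items():
        rep = rep_by_seq.setdefault(seq, name)
        groups.setdefault(rep, []).append(name)
    return groups


print(collapse_identical({"a": "ACGT", "b": "ACGT", "c": "ACGA"}))
# {'a': ['a', 'b'], 'c': ['c']}
```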
OK, if it pops up again, please do shout and I'll tag in someone else who might have a better idea. But I don't want to call him in if we don't have an example :) As a tip, you can link to a dated tree by adding the date: https://nextstrain.org/ncov/global/2020-04-08 (or substitute 'global' with one of the regions), which means it will hopefully persist even after the current live build has moved on. However, we update multiple times a day, so this works best at the end of the US day...
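A small helper for building such dated links, assuming the URL patterns used in this thread: https://nextstrain.org/ncov/global/2020-04-08 with a region, or https://nextstrain.org/ncov/2020-04-02 without one, plus the c, label and m query parameters. The function name is hypothetical, and a link only resolves if that day's build was actually archived.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

BASE = "https://nextstrain.org/ncov"


def dated_ncov_url(day: date, region: Optional[str] = None, **query: str) -> str:
    """Build a dated ncov URL, optionally with a region and view parameters."""
    parts = [BASE]
    if region:
        parts.append(region)
    parts.append(day.isoformat())  # e.g. "2020-04-08"
    url = "/".join(parts)
    if query:
        # Keep ":" and "," readable, as in the links pasted in this thread.
        url += "?" + urlencode(query, safe=":,")
    return url


print(dated_ncov_url(date(2020, 4, 8), "global"))
print(dated_ncov_url(date(2020, 4, 2), c="gt-nuc_15324,20268", label="clade:A2a", m="div"))
```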
Thanks! There is one here: https://nextstrain.org/ncov/2020-04-02?c=gt-nuc_15324,20268&label=clade:A2a&m=div, half-resolved two days later: https://nextstrain.org/ncov/2020-04-04?c=gt-nuc_15324,20268&label=clade:A2a&m=div. But the worst one (for one day, the Iranian cluster became a subtree of a seemingly unrelated one) is not archived.
Thank you for the great examples! Yes, unfortunately, if a build is updated multiple times in a day, the earlier ones aren't archived.
@rneher Would you have insight into why this is happening? Is this a known problem, or something new we should make an issue about? Thanks!
Closing this issue as no longer relevant.
Hi, in the European lineage today there are two edges with the same mutation and very similar sequences inside: http://www.paris8.free.fr/bug_nextstrain.png
https://nextstrain.org/ncov?c=gt-nuc_15324&label=clade:A2a @lmoncla
On 2 April at 03:00 GMT+1 there was no such problem; at 14:00 the problem was there, so I think it appeared in this one.
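To spot "two edges with the same mutation" automatically, one could scan the dataset JSON for nucleotide mutations that sit on more than one branch. A minimal sketch, assuming the Auspice v2 layout where a node's mutations are stored as strings under branch_attrs, then mutations, then nuc (e.g. "A100G"). The names and toy data are hypothetical. Genuine recurrent mutations (homoplasies) also show up this way, so a hit flags a branch to inspect rather than proving a bug.

```python
from collections import defaultdict
from typing import Iterator


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield every node of an Auspice-style tree, depth-first (pre-order)."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def repeated_nuc_mutations(dataset: dict) -> dict:
    """Map each nucleotide mutation found on more than one branch to the
    names of the nodes whose branches carry it."""
    where = defaultdict(list)
    for node in iter_nodes(dataset["tree"]):
        muts = node.get("branch_attrs", {}).get("mutations", {}).get("nuc", [])
        for mut in muts:
            where[mut].append(node.get("name", "<unnamed>"))
    return {mut: names for mut, names in where.items() if len(names) > 1}


example = {"tree": {"name": "root", "children": [
    {"name": "NODE_1", "branch_attrs": {"mutations": {"nuc": ["A100G"]}}},
    {"name": "NODE_2", "branch_attrs": {"mutations": {"nuc": ["A100G", "C200T"]}}},
]}}
print(repeated_nuc_mutations(example))  # {'A100G': ['NODE_1', 'NODE_2']}
```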