nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License

[BUG] tree missing some obvious merges #330

Closed babarlelephant closed 3 years ago

babarlelephant commented 4 years ago

Hi, in the European lineage today there are two edges carrying the same mutation, with very similar sequences inside each: http://www.paris8.free.fr/bug_nextstrain.png

https://nextstrain.org/ncov?c=gt-nuc_15324&label=clade:A2a @lmoncla

On 2 April at 3h GMT+1 the problem was not there; at 14h it was, so I think it appeared in this one
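One way to look for this pattern without inspecting the tree by eye is to list every mutation that is assigned to more than one branch. The following is only a minimal sketch: it assumes the tree has been loaded as nested dicts in which each node has a `name`, an optional `children` list, and its nucleotide mutations under `branch_attrs.mutations.nuc`. The toy tree, the node names and the mutation strings are made up for illustration.

```python
from collections import defaultdict


def repeated_mutations(tree):
    """Return {mutation: [node names]} for mutations found on more than one branch."""
    seen = defaultdict(list)
    stack = [tree]  # iterative walk, so deep trees do not hit the recursion limit
    while stack:
        node = stack.pop()
        # Assumed layout: node["branch_attrs"]["mutations"]["nuc"] is a list of strings.
        muts = node.get("branch_attrs", {}).get("mutations", {}).get("nuc", [])
        for mut in muts:
            seen[mut].append(node.get("name", "?"))
        stack.extend(node.get("children", []))
    return {mut: sorted(names) for mut, names in seen.items() if len(names) > 1}


# Toy tree (hypothetical): the same mutation sits on two sibling edges.
toy_tree = {
    "name": "root",
    "children": [
        {
            "name": "NODE_1",
            "branch_attrs": {"mutations": {"nuc": ["A10G"]}},
            "children": [{"name": "tipA"}, {"name": "tipB"}],
        },
        {
            "name": "NODE_2",
            "branch_attrs": {"mutations": {"nuc": ["A10G"]}},
            "children": [{"name": "tipC"}],
        },
        {"name": "tipD", "branch_attrs": {"mutations": {"nuc": ["C20T"]}}},
    ],
}

print(repeated_mutations(toy_tree))
```

A repeated mutation is not necessarily a bug, since the same change can arise independently, but sibling branches that share one are good candidates for a missed merge.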

emmahodcroft commented 4 years ago

To Nextstrain people: see on live tree here, at top of this part (can't zoom further in URL): https://nextstrain.org/ncov?branchLabel=clade&c=gt-nuc_15324&label=clade:A2a

babarlelephant commented 4 years ago

Now solved. Did someone do anything special?

It is easy for me to notice this kind of bug because I copy and then modify the tree to define some more epidemiological clades (every country I mention has > 1000 deaths; not all the clades are perfectly reliable, so I need to check whether the new sequences fit them or not).

lmoncla commented 4 years ago

I didn't end up doing anything to fix this, but I'm glad it got sorted

emmahodcroft commented 4 years ago

Thanks for pointing this out @acx01b. I don't know what caused this, and it apparently fixed itself during one of our builds. If you spot it again please do let us know and we can take another look!

babarlelephant commented 4 years ago

@lmoncla The bug is back :-) at the same place

It doesn't seem to appear elsewhere in the tree, at least not in such an obvious form.

No idea if it is related, but I found that the sequences with travel history, and only those, don't have any "branch_attrs" field in the JSON file.
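For reference, this kind of check can be sketched in a few lines. It assumes the `tree` object of the dataset JSON is a nested structure of nodes with `name`, optional `children` and optional `branch_attrs` keys; the toy nodes below are hypothetical and only stand in for the real file.

```python
def nodes_without_branch_attrs(tree):
    """Return the names of all nodes that have no "branch_attrs" key at all."""
    missing = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if "branch_attrs" not in node:
            missing.append(node.get("name", "?"))
        stack.extend(node.get("children", []))
    return sorted(missing)


# Toy tree (hypothetical names): one tip lacks the field entirely.
toy_tree = {
    "name": "root",
    "branch_attrs": {},
    "children": [
        {"name": "tip_with_attrs", "branch_attrs": {"mutations": {}}},
        {"name": "tip_without_attrs"},
    ],
}

print(nodes_without_branch_attrs(toy_tree))
```

On a real file, the same function would be called on the `tree` entry after `json.load`, and the resulting names compared against the list of isolates with travel history.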

babarlelephant commented 4 years ago

There are now two appearances of the bug, both in the European lineage @jameshadfield

https://nextstrain.org/ncov?c=gt-nuc_15324&label=clade:A2a https://nextstrain.org/ncov?c=gt-nuc_20268&label=clade:A2a

jameshadfield commented 4 years ago

We should inspect the IQTREE tree (i.e. before refine) to see if this structure is there. I suspect it is, but I don't have an obvious reason why they're not monophyletic.
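A rough sketch of what such an inspection could test: given a tree and the set of tips expected to group together, check whether they form a complete clade. This assumes the Newick output has already been parsed into nested dicts with `name` and `children` keys (the parsing step is not shown), and the tip names are placeholders.

```python
def leaf_names(node):
    """Collect the names of all tips below (or at) a node."""
    leaves, stack = set(), [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            leaves.add(current["name"])
    return leaves


def is_monophyletic(tree, tips):
    """True if the smallest clade containing all `tips` contains nothing else."""
    tips = set(tips)
    node = tree
    while True:
        # Descend for as long as a single child still holds every requested tip.
        holders = [c for c in node.get("children", []) if tips <= leaf_names(c)]
        if not holders:
            break
        node = holders[0]
    return leaf_names(node) == tips


# Toy tree (hypothetical): ((a, b), (c, d))
toy_tree = {
    "name": "root",
    "children": [
        {"name": "X", "children": [{"name": "a"}, {"name": "b"}]},
        {"name": "Y", "children": [{"name": "c"}, {"name": "d"}]},
    ],
}

print(is_monophyletic(toy_tree, {"a", "b"}))  # tips that share a clade
print(is_monophyletic(toy_tree, {"a", "c"}))  # tips split across clades
```

Running the same check on the tree before and after refine would show at which step the grouping is lost.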

No idea if it is related but I found that every sequence with travel history, and only them, don't have any "branch_attrs" field in the JSON file.

Unrelated, but this should also be improved. For travel-history isolates we split the branch in two to improve the DTA, but we can't duplicate the mutations etc. (branch_attrs) without causing double counting in the entropy panel.

babarlelephant commented 4 years ago

There is clearly a problem: see https://nextstrain.org/ncov/?c=gt-nuc_26144,14805,17247&m=div where the Iranian cluster became a subtree of a lineage with 3 unrelated mutations. I'm quite sure the divergence tree would be cleaner with a basic minimum spanning tree algorithm.
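To make the suggestion concrete, here is a minimal sketch of a minimum spanning tree built with Prim's algorithm over pairwise Hamming distances. It is not what the Nextstrain pipeline does; it assumes equal-length, already aligned sequences, ignores ambiguous bases, and uses made-up toy sequences.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs):
    """Prim's algorithm; returns a list of (distance, from_name, to_name) edges."""
    names = list(seqs)
    in_tree = {names[0]}  # start from the first sequence
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to any sequence not yet attached;
        # ties are broken by name because tuples compare element by element.
        best = min(
            (hamming(seqs[u], seqs[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


# Toy alignment (hypothetical sequences).
toy_seqs = {"ref": "AAAA", "s1": "AAAT", "s2": "AATT", "s3": "CAAA"}

for distance, parent, child in minimum_spanning_tree(toy_seqs):
    print(f"{parent} -> {child} ({distance} differences)")
```

Identical sequences end up joined by zero-length edges, so clusters of near-identical genomes stay together. A spanning tree carries no inferred ancestors or branch times, so it is only a sanity check against the divergence tree.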

babarlelephant commented 4 years ago

At least in my basic script the bug seems gone, probably because many identical sequences (from Australia, UK, USA, Iceland, Netherlands, ...) have been removed.

emmahodcroft commented 4 years ago

Ok, if it pops up again, please do shout and I'll tag someone else in who might have a better idea. But I don't want to call him in if we don't have an example :) As a tip, you can link to a dated tree by putting the date: https://nextstrain.org/ncov/global/2020-04-08 (or sub 'global' with one of the regions) which means that it'll hopefully persist even after the current live build has moved on. However, we do update multiple times a day, so this works best at the end of the US day...
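As a small illustration of the URL pattern, a helper like the one below builds such a link. The function name and parameters are made up for this sketch, and it only follows the `ncov/<region>/<date>` form shown in the example link; it does not check that a build for that date exists.

```python
from datetime import date
from urllib.parse import urlencode


def dated_build_url(day, region="global", **query):
    """Build a link to a dated ncov build, with optional view parameters."""
    url = f"https://nextstrain.org/ncov/{region}/{day.isoformat()}"
    if query:
        # Keep ':' and ',' readable, as in the links shared in this thread.
        url += "?" + urlencode(query, safe=":,")
    return url


print(dated_build_url(date(2020, 4, 8)))
print(dated_build_url(date(2020, 4, 8), c="gt-nuc_15324", label="clade:A2a"))
```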

babarlelephant commented 4 years ago

Thanks. There is one at https://nextstrain.org/ncov/2020-04-02?c=gt-nuc_15324,20268&label=clade:A2a&m=div, half-resolved two days later at https://nextstrain.org/ncov/2020-04-04?c=gt-nuc_15324,20268&label=clade:A2a&m=div, but the worst one (for one day the Iranian cluster became a subtree of a seemingly unrelated one) is not archived.

emmahodcroft commented 4 years ago

Thank you for the great examples! Yes, unfortunately if a build is updated multiple times in the day, the earlier ones aren't archived.

@rneher Would you have insight into why this is happening? Is this a known problem, or something new we should make an issue about? Thanks!

trvrb commented 3 years ago

Closing this issue as no longer relevant.