yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

Whole tree mode for Usher #340

Open aviczhl2 opened 1 year ago

aviczhl2 commented 1 year ago

As discussed in https://github.com/cov-lineages/pango-designation/issues/1940, I suggest a "whole tree" mode for those who wish to browse the whole Usher tree (and see its various data arranged together) without uploading sequences, instead of only seeing the tree around the 5000 most closely related sequences.

This could be incredibly useful: people could see the structure of the tree, so ill-defined branches (flip-flops etc.) with >5000 seqs could be detected easily. Currently, people tend to ignore medium-sized branches until they start attracting new sequences. By then, however, a branch may already be close to or above 5000 seqs, which makes it hard to inspect under the 5000 upper limit.

In whole tree mode, it would be enough to display only the number of seqs for a sub-tree with >5000 seqs; every individual seq would be displayed only when browsing into a sub-tree with <5000 seqs. This prevents the server from being overwhelmed, and it also doesn't violate the GISAID rule that sequences must not be publicized at large scale but may be referred to at small scale.
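The display rule proposed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not how Usher or its web interface works: the `Node` class and the `count_leaves`, `leaf_names` and `browse` functions are hypothetical, and a sub-tree with exactly 5000 seqs is assumed to be expandable. Browsing into a node shows each child sub-tree either as a count (if it is too large) or as its full list of seqs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical tree node for illustration only (not Usher's data structure):
# leaves are sequences, internal nodes have children.
@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    """Count the sequences (leaves) under node, iteratively to avoid deep recursion."""
    count = 0
    stack = [node]
    while stack:
        current = stack.pop()
        if current.children:
            stack.extend(current.children)
        else:
            count += 1
    return count


def leaf_names(node: Node) -> list[str]:
    """List the sequence names under node in left-to-right order."""
    names: list[str] = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.children:
            # Push children reversed so the leftmost child is popped first.
            stack.extend(reversed(current.children))
        else:
            names.append(current.name)
    return names


def browse(node: Node, threshold: int = 5000) -> dict[str, int | list[str]]:
    """One level of 'whole tree' browsing: a child sub-tree with more than
    threshold seqs is shown only as its count, otherwise every seq is listed."""
    view: dict[str, int | list[str]] = {}
    for child in node.children:
        n = count_leaves(child)
        view[child.name] = n if n > threshold else leaf_names(child)
    return view


# Toy example with a tiny threshold so the collapsing is visible.
tree = Node("root", [
    Node("A", [Node("s1"), Node("s2"), Node("s3")]),
    Node("B", [Node("s4")]),
])
print(browse(tree, threshold=2))  # {'A': 3, 'B': ['s4']}
```

Browsing further into sub-tree `A` would then call `browse` on that node, so the view only ever lists seqs for small sub-trees while large ones stay summarized by their count.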

corneliusroemer commented 1 year ago

The full Genbank Usher tree can be browsed via @theosanderson's taxonium: https://taxonium.org/?backend=https://api.cov2tree.org

For the GISAID tree, you need to ask @AngieHinrichs: it's theoretically available, but due to concerns about GISAID's terms of use (TOU) it's not shared publicly.

Even if one had access to the full tree via taxonium, I agree that there may be scope for an intermediate mode in Usher. Taxonium is a fine tree viewer, but it lacks many of the features of Auspice that can be very useful when studying trees.