yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

ENH: Speed up Usher webversion #263

Closed: corneliusroemer closed this issue 1 year ago.

corneliusroemer commented 1 year ago

Usher is such an amazing tool. Unfortunately, analyses run through the phyloplace website take quite long. They take so long that I don't use Usher as much as I should.

Placing, say, 50 sequences with 1000 context samples runs for ~5 min or more.

Does the number of context samples make it run much longer? Are there ways I can get it to be faster?

I'm curious what the bottleneck for the analyses is. I should probably try running Usher locally to see whether that is faster and feasible for my use case.

If tree size is a problem: I'm only really interested in BA.2/4/5/.75 right now - have you considered making a version that ignores old stuff that's no longer relevant?

I do have the feeling that runtimes used to be quite a bit faster in the past.

yatisht commented 1 year ago

Thanks for the feedback, @corneliusroemer.

@AngieHinrichs are we using usher-sampled-server for the web version now? That should significantly speed up the tool by preloading the MAT and placing multiple samples in parallel.
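A minimal sketch of the "preload once, place many samples in parallel" idea, assuming a toy tree in which each node is stored with the set of mutations accumulated from the root. `PreloadedTree`, `place` and `place_many` are hypothetical names, and the score (the number of mutations that differ between a node and the sample) is a simplified parsimony-style stand-in, not usher-sampled-server's actual algorithm or interface.

```python
from concurrent.futures import ThreadPoolExecutor


class PreloadedTree:
    """Toy stand-in for a preloaded mutation-annotated tree (hypothetical, not UShER's format).

    Each node is stored with the set of mutations accumulated from the root.
    """

    def __init__(self, node_mutations):
        # Done once at server start-up instead of once per request.
        self.node_mutations = {n: frozenset(m) for n, m in node_mutations.items()}

    def place(self, sample_mutations):
        """Return (node, score) for the node whose mutation set differs least from
        the sample's; ties are broken by node name. The score is the number of
        differing mutations (a simplified parsimony-style cost)."""
        sample = frozenset(sample_mutations)
        node, muts = min(
            self.node_mutations.items(),
            key=lambda item: (len(item[1] ^ sample), item[0]),
        )
        return node, len(muts ^ sample)

    def place_many(self, samples, workers=4):
        """Place independent samples concurrently against the same preloaded tree."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves input order, so results line up with sample names.
            return dict(zip(samples, pool.map(self.place, samples.values())))
```

In CPython, threads only show the structure here; CPU-bound placement would need processes or native code for a real speedup. The key point is that the tree is parsed once and every request reuses it.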

corneliusroemer commented 1 year ago

It's a pity that Usher is so slow even when it only needs to extract a subtree, like when I upload two EPI_ISLs that should already be on the main tree.

Naively, I'd expect that all that's required is: a) finding where that sequence is on the tree (which shouldn't be too hard with a lookup in an index) and b) preparing some stats and the Nextstrain output.
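The lookup in step a) could work roughly as below, assuming a toy rooted tree given as (parent, child) edges. The `parent` dict doubles as a name index, so finding a sample is a single dict lookup; to gather context samples, the sketch climbs from the sample until the ancestor's subtree holds at least `context` leaves. `IndexedTree` and `subtree_for` are hypothetical, and this is not necessarily how UShER itself extracts subtrees.

```python
from collections import defaultdict


class IndexedTree:
    """Toy rooted tree with O(1) name lookup (hypothetical, not UShER's data structures)."""

    def __init__(self, edges):
        # edges: iterable of (parent, child) pairs.
        self.parent = {}  # child name -> parent name; also serves as the name index
        self.children = defaultdict(list)
        for p, c in edges:
            self.parent[c] = p
            self.children[p].append(c)

    def leaves_under(self, node):
        """Collect all leaf names below `node` with an iterative DFS."""
        stack, leaves = [node], []
        while stack:
            n = stack.pop()
            kids = self.children.get(n)  # .get does not create empty entries
            if kids:
                stack.extend(kids)
            else:
                leaves.append(n)
        return leaves

    def subtree_for(self, sample, context):
        """Climb from `sample` until the ancestor's subtree has >= `context` leaves
        (or the root is reached); return that ancestor and its sorted leaves."""
        if sample not in self.parent:
            raise KeyError(f"{sample} is not on the tree")
        node = sample
        while len(self.leaves_under(node)) < context and node in self.parent:
            node = self.parent[node]
        return node, sorted(self.leaves_under(node))
```

Recomputing leaf lists on every step keeps the sketch short; a precomputed leaf count per node would make each climbing step O(1).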

A fast version of Usher could work on a prebuilt tree only, without doing any placement: basically just querying. That way you could prepare a lot of the steps in advance. When a request comes in, all you have to do is a lookup and output a simple Nextstrain JSON.
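The final step of such a query-only path is serializing an already-extracted subtree. A minimal sketch, assuming the subtree is given as a `children` mapping from node name to child names: it emits a dict shaped like a Nextstrain (Auspice v2) JSON with only `version`, `meta.title` and a nested `tree` of `name`/`children` nodes. Real Auspice files carry more fields (for example `node_attrs` with branch lengths and `meta.panels`), so check the Auspice JSON schema before relying on this shape; `to_auspice_json` is a hypothetical helper.

```python
import json


def to_auspice_json(children, root, title="subtree"):
    """Serialize the subtree rooted at `root` as a minimal Auspice v2-style dict.

    Simplified and assumed: only names and topology are emitted."""

    def build(node):
        out = {"name": node}
        kids = children.get(node, [])
        if kids:
            out["children"] = [build(k) for k in kids]
        return out

    return {"version": "v2", "meta": {"title": title}, "tree": build(root)}


# Usage: json.dumps(to_auspice_json({"n1": ["s1", "s2"]}, "n1")) gives the response body.
```

Because the output depends only on the chosen subtree root, responses for frequently requested samples could also be precomputed and cached.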

That would make Usher so much more useful. Maybe I'm just particularly impatient compared to average users, but waiting 5 minutes to see where a single sequence that is already in the tree lies hurts 😬

Getting from here:

[screenshot]

to here should take seconds, not minutes:

[screenshot]

AngieHinrichs commented 1 year ago

Since both fasta sequence placement and name lookup have been sped up, can we close this now? 😄

russcd commented 1 year ago

Yup. This is resolved.