yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

Web UShER unhandled failure #239

Closed rkdover closed 1 year ago

rkdover commented 2 years ago

Hi! When I try to run a large number of sequences (around 4000) in the web client, I encounter an error. The issue seems to be related to the number of filtered sequences. There is no error message; instead, the body of the page fills with log output.

The log is well formatted except at the very end:

Sequence 96_- has too few bases (0 excluding 29906 Ns at beginning and 0 Ns at end), must have at least 10000); skipping.
oo many N bases (23323 out of 29693 > 0.50); skipping.
Sequence 47_- has too many N bases (27310 out of 29085 > 0.50); skipping.
Sequence 71_- has too many N bases (16925 out of 29267 > 0.50); skipping.
Sequence 80_- has too many N bases (17022 out of 29120 > 0.50); skipping.
Sequence 93_- has too few bases (0 excluding 29903 Ns at beginning and 0 Ns at end), must have at least 10000); skipping.
Sequence 94_- has too few bases (0 excluding 29908 Ns at beginning and 0 Ns at end), must have at least 10000); skipping.
Sequence name '95_-' has already been used; ignoring subsequent usage (29897 bases, 1944 N's, 3 ambiguous).
Sequence 97_- has too few bases (0 excluding 29906 

Here two rows are inexplicably truncated. The dataset contains many low-quality sequences and control samples, which obviously ought to be filtered out. This can be done locally before running the web client, but if it is not, the run fails without any indication of what went wrong.
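For anyone who wants to do that local filtering first, here is a minimal Python sketch. The two thresholds (at least 10000 bases, at most 0.50 N fraction) are copied from the log messages above. Everything else is an assumption: the function names are hypothetical, and the exact way the server trims leading and trailing Ns or counts the N fraction may differ from what is shown here.

```python
from typing import Iterator, List, Tuple

MIN_BASES = 10000       # from the "must have at least 10000" log message
MAX_N_FRACTION = 0.50   # from the "> 0.50" log message


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def keep_sequence(seq: str) -> bool:
    """Approximate the server-side checks (assumed, not the actual server code)."""
    # Ignore runs of N at the beginning and end, as the log messages suggest.
    trimmed = seq.upper().strip("N")
    if len(trimmed) < MIN_BASES:
        return False
    return trimmed.count("N") / len(trimmed) <= MAX_N_FRACTION


def filter_fasta(text: str) -> List[Tuple[str, str]]:
    """Drop low-quality sequences and repeated names; the first use of a name wins."""
    seen = set()
    kept = []
    for name, seq in read_fasta(text):
        if name in seen:
            continue
        seen.add(name)
        if keep_sequence(seq):
            kept.append((name, seq))
    return kept
```

Calling `filter_fasta(open("samples.fasta").read())` on a hypothetical input file would return only the records that pass both checks, which can then be written back out and uploaded.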

rkdover commented 2 years ago

The log output looks like the one in #227 and is 386 lines long.

AngieHinrichs commented 2 years ago

Hi @rkdover - I'm afraid 4000 sequences is too many for the web interface, and causes a timeout which results in truncated output. It's safest to upload a much smaller number like 100 or 200 (and probably worthwhile to filter out sequences that are all or mostly Ns).
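As a rough illustration of splitting a large upload into smaller ones, the sketch below cuts a list of (name, sequence) records into batches and formats each batch as FASTA text. The batch size of 200 comes from the suggestion above; the function names and the record representation are hypothetical, and this is not part of UShER itself.

```python
from typing import Iterator, List, Sequence, Tuple

Record = Tuple[str, str]  # (sequence name, sequence), an assumed representation


def batches(records: Sequence[Record], size: int = 200) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def to_fasta(records: Sequence[Record]) -> str:
    """Format records as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Each batch can be written to its own file, for example `batch_001.fasta`, and uploaded separately.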

If you need to routinely process thousands of sequences, then it might make sense to run usher workflows locally instead of using the web interface.

rkdover commented 2 years ago

Hi, and thank you for your comment @AngieHinrichs. That makes sense, though I have managed to get output for far larger runs! Perhaps some error handling would still be useful?

Thank you for the tip about running local workflows, as that is what I ended up looking at, though I encountered an issue there as well (#238). I'll try to set up a conda environment and see if I have more luck.