nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License

[BUG] Tree Building Failed #361

Closed: TrentBrick closed this issue 4 years ago

TrentBrick commented 4 years ago

Current Behavior

I have run nextstrain build ncov/ without problems for a few iterations of the GISAID data. Today, using the most up-to-date sequences, the build fails with the following error:

ERROR: b'/bin/bash: line 1: 18148 Killed                  iqtree -ninit 2 -n 2 -me 0.05 -nt 1 -s results/subsampled_alignment-delim.fasta -m GTR > results/subsampled_alignment-delim.iqtree.log\n'
shell exited 137 when running: iqtree -ninit 2 -n 2 -me 0.05 -nt 1 -s results/subsampled_alignment-delim.fasta -m GTR  > results/subsampled_alignment-delim.iqtree.log

ERROR: TREE BUILDING FAILED
Please see the log file for more details: results/subsampled_alignment-delim.iqtree.log

Building original tree took 537.5226821899414 seconds
[Tue Apr 14 13:20:35 2020]
Error in rule tree:
    jobid: 8
    output: results/tree_raw.nwk
    shell:

        augur tree             --alignment results/subsampled_alignment.fasta             --output results/tree_raw.nwk             --nthreads 1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /nextstrain/build/.snakemake/log/2020-04-14T124028.174414.snakemake.log

I deleted the repo, re-cloned it, and ran nextstrain update to make sure I had the most up-to-date version of the CLI. Running everything again (including the step that cleans the GISAID download with ./scripts/normalize_gisaid_fasta.sh data/gisaid_cov2020_sequences.fasta data/sequences.fasta), I still get the same error.

Your environment: if running Nextstrain locally

emmahodcroft commented 4 years ago

Hi Trent, thanks for reaching out! Hopefully we can help. First, can you check that you have IQ-TREE installed? It is needed to run this command. It should be, but sometimes we discover later that it isn't! Just trying to run iqtree on the command line should give you the answer. If you don't have it, there are a couple of different ways to install it: http://www.iqtree.org/doc/Quickstart

Alternatively, if you do have it, can you look for an IQ-TREE log file in your results folder and paste any error messages from it into this thread?

TrentBrick commented 4 years ago

Thanks for the speedy response! If I run iqtree inside nextstrain shell, it runs without any problems.

Here is the output of that log file (I had to delete many of the sequences listed in it; otherwise it was too long to post here):

IQ-TREE multicore version 1.6.6 for Linux 64-bit built Jul 1 2018
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host:    1e700267229b (AVX2, FMA3, 1 GB RAM)
Command: iqtree -ninit 2 -n 2 -me 0.05 -nt 1 -s results/subsampled_alignment-delim.fasta -m GTR
Seed:    797847 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Tue Apr 14 13:11:42 2020
Kernel:  AVX+FMA - 1 threads (6 CPU cores detected)

HINT: Use -nt option to specify number of threads because your CPU has 6 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.

Reading alignment file results/subsampled_alignment-delim.fasta ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
WARNING: 184 sites contain only gaps or ambiguous characters.
Alignment has 5833 sequences with 29903 columns, 19672 distinct patterns
1333 parsimony-informative, 2446 singleton sites, 26124 constant sites
                                       Gap/Ambiguity  Composition  p-value
   1  Iceland_X_X_263_X_X_2020                 0.62%    passed    100.00%
   2  Canada_X_X_BC_4143868_X_X_2020           1.52%    passed     99.89%
   3  Canada_X_X_BC_4143842_X_X_2020           1.52%    passed     99.89%

For your convenience alignment with unique sequences printed to results/subsampled_alignment-delim.fasta.uniqueseq.phy

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 31.573 seconds

NOTE: Switching to memory saving mode using 1.845 GB (95% of normal mode)
NOTE: Use -mem option if you want to restrict RAM usage further
NOTE: 1889 MB RAM (1 GB) is required!
WARNING: Memory required per CPU-core (1.84545 GB) is higher than your computer RAM per CPU-core (0 GB), thus multiple runs may exceed RAM!
Estimate model parameters (epsilon = 0.500)
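IQ-TREE's own memory estimate is the "NOTE: ... MB RAM ... is required!" line near the end of that log. As a rough aid, here is a sketch of two hypothetical helpers (not part of IQ-TREE, augur or the Nextstrain CLI) that pull that figure out of a log and compare it with a memory limit you supply in MB. The estimate covers IQ-TREE alone; Snakemake, augur and anything else running in the container also need memory, which may explain why a 1889 MB estimate could still be killed in a container limited to about 1.9 GiB.

```sh
# Hypothetical helper: print the "NNNN MB RAM ... is required" figure from an
# IQ-TREE log file (first match only). Prints nothing if the line is absent.
iqtree_required_mb() {
  sed -n 's/.*NOTE: \([0-9][0-9]*\) MB RAM.*is required.*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: compare IQ-TREE's estimate (log file in $1) with a
# memory limit in MB ($2). Returns 0 if it fits, 1 if not, 2 if no estimate.
# Note: this ignores memory used by other processes in the same container.
check_fits() {
  required=$(iqtree_required_mb "$1")
  limit_mb=$2
  if [ -z "$required" ]; then
    echo "no memory requirement found in $1"
    return 2
  fi
  if [ "$required" -le "$limit_mb" ]; then
    echo "ok: needs ${required} MB, limit ${limit_mb} MB"
  else
    echo "too small: needs ${required} MB, limit ${limit_mb} MB"
    return 1
  fi
}
```

For example, check_fits results/subsampled_alignment-delim.iqtree.log 7168 would compare the estimate against a 7 GB allowance. Treat a result that only barely fits as a warning sign rather than a guarantee.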

emmahodcroft commented 4 years ago

Ahh, yes - IQTree is certainly there, but it looks like it's having a memory error, which would have been my second guess :)

I'm not sure why it's estimating your RAM per CPU core at 0 GB; this seems odd to me (and probably incorrect!). This may be related to running through the CLI. I'm less familiar with that myself, so I'm afraid I can't advise on how best to change the memory settings so that it believes you have more than 0 GB of RAM on your computer!

However, I am tagging @tsibley in the hope that he can advise you on how best to proceed! He's on Seattle time, so it may be a few hours until he can get back to you, but hopefully he can put you right!

Thanks for your patience!

TrentBrick commented 4 years ago

Thanks for following up, too. If that's the problem, I will try restarting my computer to free up RAM and run it again (I was running some other memory-heavy things in parallel before). I'll report back if that works. It is strange, though, because as I said, I have run the pipeline twice before without any issues.

Thanks again, Trenton

tsibley commented 4 years ago

Thanks for tagging me in @emmahodcroft. :-)

@TrentBrick From the first logs you posted, it looks like Linux's out-of-memory (OOM) killer in the container is killing the iqtree process when memory is overcommitted:

ERROR: b'/bin/bash: line 1: 18148 Killed iqtree …
shell exited 137 when running: …

When a process is killed by a signal, the shell reports an exit status of 128+n, where n is the signal number. 137 - 128 = 9, which is SIGKILL (see the output of kill -l). This matches the "bash: Killed iqtree" message, which is Bash telling you that it noticed iqtree was killed by something.
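The 128+n rule can be checked directly. This is a small sketch (the function name describe_exit_status is made up for illustration) that decodes an exit status the same way, and then reproduces a 137 by having a child shell send SIGKILL to itself, standing in for the OOM killer.

```sh
# Decode a shell exit status: by convention, statuses above 128 mean the
# process was killed by signal (status - 128). Linux signal numbers run from
# 1 to 64, so only 129..192 are treated as signals here.
describe_exit_status() {
  code=$1
  if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    signum=$((code - 128))
    # `kill -l N` prints the name of signal N without the SIG prefix, e.g. KILL
    echo "killed by signal $signum (SIG$(kill -l "$signum"))"
  else
    echo "exited with status $code"
  fi
}

# A child shell sends SIGKILL to itself; its parent sees status 137.
# Using `||` keeps this working even under `set -e`.
sh -c 'kill -KILL $$' || describe_exit_status $?
```

A program can also call exit with a value above 128 on its own, so the rule is a convention rather than proof; here the "Killed" message from Bash confirms it was a signal.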

Can you run nextstrain check-setup and paste the output here? On macOS, Docker runs Linux containers inside a VM. The VM by default only has access to a pretty limited amount of your computer's memory, and thus the containers are restricted by that same limit. (On Linux, Docker doesn't have to use a VM, so by default containers have access to all the memory.) nextstrain check-setup will attempt to diagnose the amount of memory available to the Nextstrain container, so its output will be useful to see.
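For a rough manual cross-check from inside the container (for example after running nextstrain shell), one approach is to compare the total RAM the kernel reports, which under Docker on macOS is the VM's memory, with any cgroup memory limit placed on the container. This sketch assumes a Linux container with the usual /proc and /sys/fs/cgroup layout (cgroup v2 or v1); it is not taken from how nextstrain check-setup works, and visible_memory_mib is a hypothetical name.

```sh
# Print the memory available to this (Linux) container in MiB: the smaller of
# the total RAM reported by the kernel and any cgroup memory limit.
visible_memory_mib() {
  total_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  total_mib=$((total_kib / 1024))

  limit=""
  if [ -r /sys/fs/cgroup/memory.max ]; then
    # cgroup v2: a byte count, or the literal "max" when unlimited
    limit=$(cat /sys/fs/cgroup/memory.max)
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
    # cgroup v1: a byte count (a very large number when unlimited)
    limit=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
  fi

  case "$limit" in
    "" | max)
      echo "$total_mib"
      ;;
    *)
      limit_mib=$((limit / 1024 / 1024))
      if [ "$limit_mib" -lt "$total_mib" ]; then
        echo "$limit_mib"
      else
        echo "$total_mib"
      fi
      ;;
  esac
}

echo "about $(visible_memory_mib) MiB visible to this container"
```

The result should land in the same ballpark as the figure check-setup reports; if it is well under what the build needs, raising Docker's memory allowance is the thing to change.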

TrentBrick commented 4 years ago

Thanks for the reply, @tsibley! Here is the output, and it looks like you were right:


Testing your setup…

docker is supported

✔ yes: docker is installed
✔ yes: docker run works
⚑ warning: containers have access to >2 GiB of memory
    Containers appear to be limited to 1.9 GiB of memory. This may not be enough for some Nextstrain builds. On Windows or a Mac, you can increase the memory available to containers in the Docker preferences.
✔ yes: image is new enough for this CLI version

native is not supported

✘ no: snakemake is installed
✘ no: augur is installed
✘ no: auspice is installed

aws-batch is not supported

✘ no: job description "nextstrain-job" exists
✘ no: job queue "nextstrain-job-queue" exists
✘ no: S3 bucket "nextstrain-jobs" exists

tsibley commented 4 years ago

@TrentBrick Thanks! I expect that increasing the memory available to Docker containers on your computer will let you run the build successfully. Unfortunately, I can't tell you how much to increase it by, as I'm not sure at the moment what the overall maximum memory consumption of the build is, but some trial and error should do it.

I'm going to close this issue, but please feel free to chime back in to let us know if you get the build to work or not. :-)

TrentBrick commented 4 years ago

I set Docker's memory to 7 GB and it worked without any issues. Thanks for the rapid and precise help. I don't know how common this issue is, but you may want to flag it in some way for future users?

What was insidious about it was that everything ran well when GISAID had fewer sequences (presumably the smaller alignment stayed under the memory limit)! Either way, thanks again.