rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Error during "Aligning reads" #140

Open nbargues opened 5 years ago

nbargues commented 5 years ago

Hi,

I am trying to run Unicycler for a hybrid assembly. When I reach the "Aligning reads" step, after the "SPAdes assemblies" step, the run gets stuck at 100%:

Aligning reads (2018-09-06 18:17:33) 1,724,203 / 1,724,205 (100.0%)

The log doesn't show any error, so do you have a clue what the problem is?

Regards

nbargues commented 5 years ago

here is the end of the unicycler.log:

Determining low score threshold (2018-09-06 18:16:02)

Before conducting semi-global alignment of the long reads to the assembly graph, Unicycler must determine a minimum alignment score threshold such that nonsense alignments are excluded. To choose a threshold automatically, it examines alignments between random sequences and selects a score a few standard deviations above the mean.

Automatically choosing a threshold using random alignment scores.

Random alignment mean score: 61.66
         standard deviation:  1.31
        Low score threshold: 61.66 + (7 x 1.31) = 70.86

Aligning reads (2018-09-06 18:17:33)
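As a small illustration of the threshold rule printed in the log above (mean plus seven standard deviations), here is a minimal sketch. The function name and the default of 7 are taken from the log line only; this is not Unicycler's actual code. With the two-decimal values shown in the log, the result is about 70.83 rather than the logged 70.86, presumably because the log prints rounded inputs.

```python
def low_score_threshold(mean: float, std_dev: float, num_std_devs: float = 7.0) -> float:
    """Return a score that lies `num_std_devs` standard deviations above the mean.

    Hypothetical helper mirroring the formula printed in the log; not Unicycler's API.
    """
    return mean + num_std_devs * std_dev


# Inputs are the rounded values printed in the log, so the result differs
# slightly from the logged 70.86.
threshold = low_score_threshold(61.66, 1.31)
print(f"{threshold:.2f}")  # prints 70.83
```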

jrober84 commented 5 years ago

I have the exact same issue, but previously my datasets assembled just fine with v0.3.0b, 0.4.1 and 0.4.4. However, in 0.4.6 the assembly never finishes and gets stuck at the aligning reads stage. I am using Bioconda for each version on CentOS 7.3.

PertuyF commented 5 years ago

@rrwick since version 0.4.6, the Unicycler package available from Bioconda is built using the new Anaconda compilers (gcc-7, as part of an ongoing change in conda-forge/Bioconda). A few adjustments were required to compile (such as experimentally using SeqAn 2.4 instead of 2.3), so I hope this is not the reason for this issue.

And we're having trouble with the latest PR (not merged yet) that tries to update to 0.4.7.

It would be very nice if you had time to give some advice to make sure things are done right.

PertuyF commented 5 years ago

Unicycler 0.4.7 should now be available from Bioconda. Could you try again @jrober84?

RaverJay commented 5 years ago

Just ran into the same problem. Running v 0.4.7 from bioconda.

Determining low score threshold (2018-10-11 17:25:20)
    Before conducting semi-global alignment of the long reads to the assembly graph, Unicycler
must determine a minimum alignment score threshold such that nonsense alignments are excluded. To
choose a threshold automatically, it examines alignments between random sequences and selects a
score a few standard deviations above the mean.

Automatically choosing a threshold using random alignment scores.

Random alignment mean score: 61.66
         standard deviation:  1.31
        Low score threshold: 61.66 + (7 x 1.31) = 70.86

Aligning reads (2018-10-11 17:25:33)
181,983 / 181,984 (100.0%)

Two threads are still running at 100% cpu core usage - but nothing is progressing.

RaverJay commented 5 years ago

Running the same setup still, for a very similar sample (different barcode from the same nanopore run - and matching illumina data) the problem did not occur and unicycler is currently in the pilon polishing step. Very strange.

RaverJay commented 5 years ago

Reran the failed sample, "Aligning reads" went through without problems.

Guess the fix is: Try until it works ¯\_(ツ)_/¯
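The "try until it works" workaround can be automated with a timeout and a bounded number of retries. The sketch below is generic: `run_with_retries` is a hypothetical helper, and the timeout and attempt count are values you would have to choose for your own data. Note that `subprocess.run` only kills the direct child process on timeout, so any processes that child started may need separate cleanup.

```python
import subprocess
import sys


def run_with_retries(command, timeout_seconds, max_attempts=3):
    """Run `command`, killing and retrying it if it exceeds `timeout_seconds`.

    Returns the 1-based number of the attempt that finished in time with
    exit code 0, or None if every attempt timed out or failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run kills the child and raises TimeoutExpired.
            result = subprocess.run(command, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out after {timeout_seconds} s", file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit code {result.returncode}", file=sys.stderr)
    return None
```

A call might look like `run_with_retries(["unicycler", "-1", "short_1.fastq.gz", "-2", "short_2.fastq.gz", "-l", "long.fastq.gz", "-o", "out_dir"], timeout_seconds=6 * 3600)`, where the file names, output directory and six-hour limit are placeholders.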

PertuyF commented 5 years ago

@RaverJay I know two people who ran into the same problem with 0.4.7 from Bioconda. A new PR is ongoing to tackle this issue.

PertuyF commented 5 years ago

@RaverJay new PR abandoned as it solved nothing. The current Bioconda version is indeed working in my hands for hybrid assemblies, although with some delay during "aligning reads". Trying to investigate why. Were you using a large amount of short reads when you encountered this issue?

RaverJay commented 5 years ago

@PertuyF 16,633,798 short reads were used. Shouldn't really matter though, because they get assembled to contigs as the first step anyway?

tpshea2 commented 5 years ago

Hello- In the event it helps with troubleshooting I can offer a few examples where I ran into this same problem:

Aligning reads (2018-10-29 21:18:12) 20,373 / 20,374 (100.0%)
Aligning reads (2018-10-29 19:54:43) 11,158 / 11,159 (100.0%)
Aligning reads (2018-10-29 21:15:17) 20,847 / 20,848 (100.0%)
Aligning reads (2018-10-29 21:07:57) 19,712 / 19,714 (100.0%)

Unicycler v0.4.6 was used.
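The progress lines above all display 100.0% even though one or two reads are still outstanding, because the percentage is rounded. A minimal sketch for extracting the number of unfinished reads from such a line follows; the regular expression assumes the `done / total (percent)` layout seen in this thread, and `reads_remaining` is a hypothetical helper name.

```python
import re

# Matches "20,373 / 20,374 (" and captures the two comma-grouped counts.
PROGRESS_PATTERN = re.compile(r"(\d[\d,]*)\s*/\s*(\d[\d,]*)\s*\(")


def reads_remaining(progress_line: str) -> int:
    """Return total minus completed reads parsed from an 'Aligning reads' progress line."""
    match = PROGRESS_PATTERN.search(progress_line)
    if match is None:
        raise ValueError(f"no progress counts found in: {progress_line!r}")
    done, total = (int(group.replace(",", "")) for group in match.groups())
    return total - done


print(reads_remaining("Aligning reads (2018-10-29 21:18:12) 20,373 / 20,374 (100.0%)"))  # prints 1
print(reads_remaining("Aligning reads (2018-10-29 21:07:57) 19,712 / 19,714 (100.0%)"))  # prints 2
```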

All 4 assemblies are hybrid assemblies (Illumina paired reads + Oxford Nanopore reads from the rapid kit) coming from a ~5.5 Mb bacterial genome which has 5 plasmids (ranging in size from 3.5 kb up to ~180 kb). The ONT data differ in how the reads were downsampled from the original data set using filtlong. In all 4 cases about 300 Mb of total bases went into the assembly.

Other data sets (downsampled data from the same genome) with about the same number of total ONT reads/bases get past this step and Unicycler completes. So there does not seem to be a predictable pattern as to why some assemblies get stuck here while others complete.

Here is a "top" while 3 of the assemblies were still running but stuck:

Tasks: 878 total, 1 running, 877 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.0%us, 0.0%sy, 0.0%ni, 89.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 529263848k total, 33093456k used, 496170392k free, 1035712k buffers
Swap: 4194300k total, 66880k used, 4127420k free, 17406840k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16299 tshea 20 0 9404m 3.6g 5972 S 200.8 0.7 1677:11 unicycler
16020 tshea 20 0 9223m 3.5g 5972 S 101.0 0.7 1066:00 unicycler
16571 tshea 20 0 9266m 3.5g 5924 S 100.6 0.7 1161:45 unicycler

Note: Using the same input data and all the same dependencies, I tested one of the above stuck assemblies using Unicycler version 0.4.4 and it got through this read alignment step without getting stuck.
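To put the `TIME+` column of the `top` listing above in perspective, the values can be converted to CPU-hours. This sketch assumes the `minutes:seconds` form shown in the listing (other `top` builds and settings may print this column differently), and `cpu_time_hours` is a hypothetical helper name.

```python
def cpu_time_hours(time_plus: str) -> float:
    """Convert a top TIME+ value of the form 'minutes:seconds' into CPU-hours."""
    minutes, seconds = time_plus.split(":")
    return (int(minutes) * 60 + float(seconds)) / 3600


# TIME+ values of the three stuck unicycler processes in the listing.
for value in ("1677:11", "1066:00", "1161:45"):
    print(f"{value} -> {cpu_time_hours(value):.1f} CPU-hours")
```

By this conversion, each stuck process had already consumed somewhere between roughly 18 and 28 CPU-hours.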

mradz19 commented 5 years ago

Having the same problem. Installed the latest version of unicycler from bioconda and am running it in an environment. Runs great until the aligning reads step reaches 100%, then it just stalls indefinitely. Any fix for this yet?

apeltzer commented 5 years ago

Same here with some tests ....

apeltzer commented 5 years ago

So apparently the 0.4.6 release added something to the recipe on bioconda:

# TODO: remove when seqan is updated upstream
rm -rf unicycler/include/seqan
mv seqan/include/seqan unicycler/include/seqan

The Unicycler repository bundles a copy of SeqAn of unknown version that is "2 years old" (according to GitHub) and is used when compiling Unicycler, whereas the Bioconda recipe tries to use the SeqAn version available on Bioconda, if I'm not mistaken.

https://github.com/bioconda/bioconda-recipes/blob/2d0477865d97d58decf572c126443ce7ab941fc2/recipes/unicycler/build.sh#L12

Might this be the reason for the weird stalling errors we see here?

apeltzer commented 5 years ago

The GitHub commit history says this change was done on the 0.4.6 release of Unicycler, so 0.4.6 and 0.4.7 are broken (which is exactly what people are currently experiencing / seeing as the former version was used beforehand).

PertuyF commented 5 years ago

@apeltzer The answer is that we don't know what is causing this issue. Both conda-forge and Bioconda migrated to using GCC 7+ last year, and Unicycler currently bundles an old version of SeqAn that does not support this version of GCC. SeqAn 2.4.0 was substituted for Unicycler's 2.3.0, as documented in the discussion of PR https://github.com/bioconda/bioconda-recipes/pull/10510.

This migration occurred around the release of 0.4.6, and so did this issue. I actually attempted to use another version of SeqAn (2.3.2), which is a fix on 2.3 to make it GCC 7-compatible, to solve this, without success, as documented in PR https://github.com/bioconda/bioconda-recipes/pull/11288. As a locally compiled Unicycler 0.4.7 does not appear to have the issue, there is a strong assumption that the Bioconda fix is involved. But it is a compromise: either this fix, or no Unicycler.

The one true solution would be an upstream fix.

apeltzer commented 5 years ago

> The one true solution would be an upstream fix.

I agree on that one, but what can I do to get this fixed?

PertuyF commented 5 years ago

You can try to discuss this with @rrwick .

If this is a blocking issue, you can work with 0.4.4 from Bioconda, which does not display the issue. Otherwise, you can build it from source if you don't care about the reproducibility of a Bioconda package.
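For pipelines that want to guard against the builds reported as stalling here, a version check along these lines could be run before starting an assembly. This is a sketch under assumptions: the version string format ("Unicycler v0.4.7") is assumed rather than confirmed in this thread, the helper names are hypothetical, and the affected set only reflects reports in this issue (some source builds of 0.4.7 were reported to work).

```python
import re

# Versions reported in this thread as stalling at "Aligning reads"
# (mostly for packaged builds); this is not an authoritative list.
REPORTED_AFFECTED = {(0, 4, 6), (0, 4, 7)}


def parse_version(text: str) -> tuple:
    """Extract the first 'X.Y.Z' version number found in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_reported_affected(version_text: str) -> bool:
    """Return True if the version is one reported as stalling in this thread."""
    return parse_version(version_text) in REPORTED_AFFECTED


print(is_reported_affected("Unicycler v0.4.7"))  # prints True
print(is_reported_affected("Unicycler v0.4.4"))  # prints False
```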

PertuyF commented 5 years ago

Looks like @apeltzer was well inspired to test a rebuild: this issue may be solved by the current PR. Could anyone help test the package indicated here: https://github.com/bioconda/bioconda-recipes/pull/15605#issuecomment-496219028? Just follow the instructions to install, then test it on a dataset that exhibited the issue and report back here, or directly in the PR.

Having multiple people testing this would allow us to make sure the issue is gone. @RaverJay @mradz19 @tpshea2 @sovp @lowandrew @caspargross @soda460 @erthrall

samlipworth commented 5 years ago

Had this same issue - worked fine when compiled from source

sovp commented 4 years ago

It did not solve the stall problem for me. I built a fresh conda environment with Unicycler 0.4.7 but the stalling still occurs frequently.

apeltzer commented 4 years ago

Same here. - only the 0.4.4 release works fine :-(

xlinxlin commented 4 years ago

Hi, I want to report the same issue (Unicycler 0.4.7, installation from Anaconda), works fine with 0.4.4.

tillea commented 4 years ago

I confirm that the issue also exists for the Debian package (which is definitely a different build than on conda - so would qualify as "another build from source").

PertuyF commented 4 years ago

Thanks for reporting @tillea ! Can you give more information on the package (version, repo) for the record?

tillea commented 4 years ago

On Mon, Nov 11, 2019 at 03:14:53AM -0800, Fabien Pertuy wrote:

> Can you give more information on the package (version, repo) for the record?

Here you can see the information about the package in general: https://tracker.debian.org/pkg/unicycler

Here you can find the build log for the package (where you can also see the version of libseqan2-dev that was used for the build): https://buildd.debian.org/status/fetch.php?pkg=unicycler&arch=amd64&ver=0.4.7%2Bdfsg-2&stamp=1540747392&raw=0

Hope this helps, Andreas.