mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Polishing fails on ppc64le #15

Open jonhultqvist opened 7 years ago

jonhultqvist commented 7 years ago

Hi,

Thanks for developing this great software!

I have encountered an error in the polishing step when running Abruijn on a mixed/metagenomic 1D Nanopore dataset (many organisms with varying coverage). I have assembled similar datasets before without major issues. Oddly enough, Abruijn assembles the data and completes the first polishing iteration, but then fails in the second iteration with the error message below. The requisite files appear to be present (i.e. bubbles_2.fasta). I am not sure what is going on here. If you have any suggestions as to what has gone wrong and how I could avoid it in the future, that would be great!

[13:34:45] INFO: Polishing genome (1/2)
[13:34:50] INFO: Running BLASR
[14:37:51] INFO: Separating draft genome into bubbles
[16:38:52] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[15:25:21] INFO: Polishing genome (2/2)
[15:25:30] INFO: Running BLASR
[16:27:39] INFO: Separating draft genome into bubbles
[19:09:34] INFO: Correcting bubbles
0%
[19:48:05] ERROR: Error: Error while running polish binary: Command '['abruijn-polish', '-t', '16', '/scratch2/jon/MinION/BMAN/assemblies/abruijn/BMAN_Abruijn/bubbles_2.fasta', '/scratch2/software/Python-2.7.13/lib/python2.7/site-packages/abruijn/resource/nano_substitutions.mat', '/scratch2/software/Python-2.7.13/lib/python2.7/site-packages/abruijn/resource/nano_homopolymers.mat', '/scratch2/jon/MinION/BMAN/assemblies/abruijn/BMAN_Abruijn/consensus_2.fasta']' returned non-zero exit status -11
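A note on reading the exit status: the message format is the one Python's subprocess module produces, and there a negative return code -N means the child process was terminated by signal N (POSIX only). The sketch below decodes such a value; the helper name describe_returncode is made up for illustration and is not part of Abruijn or Flye.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable string.

    Hypothetical helper for illustration; not part of Abruijn/Flye.
    """
    if returncode < 0:
        # subprocess reports "terminated by signal N" as -N on POSIX.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # -11 is the status reported in the log above.
    print(describe_returncode(-11))
```

On Linux, signal 11 is SIGSEGV (a segmentation fault), so the polish binary crashed rather than reporting an error of its own.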

mikolmogorov commented 7 years ago

Hi,

Thank you for the feedback! Could you send me the "abruijn.log" file at fenderglass@gmail.com, so I can take a look at it?

Mikhail

diriano commented 5 years ago

Hi, I am having the same problem, see error:

[2019-04-08 15:34:37] root: INFO: >>>STAGE: polishing
[2019-04-08 15:34:37] root: INFO: Polishing genome (1/3)
[2019-04-08 15:34:37] root: INFO: Running minimap2
[2019-04-08 16:38:51] root: INFO: Separating alignment into bubbles
[2019-04-08 17:13:47] root: DEBUG: Generated 15651910 bubbles
[2019-04-08 17:13:47] root: DEBUG: Split 23723 long bubbles
[2019-04-08 17:13:47] root: DEBUG: Skipped 2464 empty bubbles
[2019-04-08 17:13:47] root: DEBUG: Skipped 199 bubbles with long branches
[2019-04-08 17:13:47] root: INFO: Alignment error rate: 0.177628474767
[2019-04-08 17:13:47] root: INFO: Correcting bubbles
[2019-04-08 18:33:37] root: ERROR: Command '['flye-polish', '-t', '100', '/strg1/groups/GENCLIMA/Vellozia/FLYE/Vintermedia/FirstTryVintermedia/40-polishing/bubbles_1.fasta', '/data1/bioinfo/anaconda2-5.3.0/envs/flye/lib/python2.7/site-packages/flye/config/bin_cfg/pacbio_substitutions.mat', '/data1/bioinfo/anaconda2-5.3.0/envs/flye/lib/python2.7/site-packages/flye/config/bin_cfg/pacbio_homopolymers.mat', '/strg1/groups/GENCLIMA/Vellozia/FLYE/Vintermedia/FirstTryVintermedia/40-polishing/consensus_1.fasta']' returned non-zero exit status -6

This is from two plant genomes of around 500Mbp, sequenced with PacBio Sequel.

Any hint will be very much appreciated. Thanks, Diego

mikolmogorov commented 5 years ago

Hi,

Looks like some kind of corner case resulted in an error during polishing. Do you think you can send me the file with bubbles ('40-polishing/bubbles_1.fasta') so I can reproduce the problem? Feel free to write to me at fenderglass@gmail.com.

diriano commented 5 years ago

Thanks @fenderglass, the file is 19G, I will make it available and send a link to you. I have two different species, both gave the same error. I will share only one of them at this time.

mikolmogorov commented 5 years ago

Thank you! Strangely, I was able to process this file without any issues on my machine.

I suspect that it might be an issue with the threads - some servers don't like it when a process uses too many. I suggest restarting the polishing stage (add --resume-from polishing to your command line) with fewer threads (say, 30).
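As a rough sketch of that suggestion, the snippet below rewrites an existing Flye argument list so that it resumes from the polishing stage with a lower thread count. The helper name and the example command line (input file, output directory) are hypothetical; the original command was not posted in this thread. Only --resume-from polishing comes from the comment above, and -t/--threads is Flye's thread option.

```python
def resume_with_fewer_threads(original_args, threads):
    """Return a copy of a Flye command line that resumes at polishing.

    Hypothetical helper: drops any existing -t/--threads setting, then
    appends the new thread count and '--resume-from polishing'.
    """
    cleaned = []
    skip_next = False
    for arg in original_args:
        if skip_next:
            # This is the value that followed -t/--threads; drop it.
            skip_next = False
            continue
        if arg in ("-t", "--threads"):
            skip_next = True
            continue
        if arg.startswith("--threads="):
            continue
        cleaned.append(arg)
    return cleaned + ["--threads", str(threads), "--resume-from", "polishing"]


if __name__ == "__main__":
    # Hypothetical original command; file and directory names are placeholders.
    original = ["flye", "--pacbio-raw", "reads.fastq",
                "--out-dir", "flye_out", "--threads", "100"]
    print(" ".join(resume_with_fewer_threads(original, 30)))
```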

Could you also send me the full flye.log file of the failed run in the meantime?

mikolmogorov commented 5 years ago

Closing due to inactivity - feel free to reopen if the issue remains.

matthewstuartedwards commented 5 years ago

I got a similar error in the most recent version, but it also shows "Wrong homopolymer".

[2019-08-18 14:28:32] INFO: Starting Flye 2.5-g315122d
[2019-08-18 14:28:32] INFO: Resuming previous run
[2019-08-18 14:28:32] INFO: >>>STAGE: polishing
[2019-08-18 14:28:32] INFO: Polishing genome (1/1)
[2019-08-18 14:28:32] INFO: Running minimap2
[2019-08-18 15:39:34] INFO: Separating alignment into bubbles
[2019-08-18 16:43:50] INFO: Alignment error rate: 0.185754577083
[2019-08-18 16:43:50] INFO: Correcting bubbles
terminate called after throwing an instance of 'std::runtime_error'
what(): Wrong homopolymer
[2019-08-18 16:43:50] ERROR: Command '['flye-polish', '--bubbles', '/work/matthew/flye/40-polishing/bubbles_1.fasta', '--subs-mat', '/opt/Flye/flye/config/bin_cfg/nano_r94_substitutions.mat', '--hopo-mat', '/opt/Flye/flye/config/bin_cfg/nano_r94_homopolymers.mat', '--out', '/work/matthew/flye/40-polishing/consensus_1.fasta', '--threads', '128']' returned non-zero exit status -6

I was running with 170 threads. Re-ran with 128 and with the suggested 30 threads but still was unable to complete polishing. Trying it with no threads specified now.

mikolmogorov commented 5 years ago

Hi,

There are two possible reasons for this error: either (i) the config file "/opt/Flye/flye/config/bin_cfg/nano_r94_homopolymers.mat" is corrupted or (ii) the polisher code is hitting some kind of edge case. I think the config issue is more likely because the polisher crashed at the very beginning.

What OS are you using? Could you try to reinstall Flye or use the bioconda release? You will be able to use --resume anyway. If that does not help, could you send me the first 1000 lines of this file: "flye_out_dir/40-polishing/bubbles_1.fasta"?

Best, Mikhail

matthewstuartedwards commented 5 years ago

I'm running Ubuntu 18.04.3 LTS on a Power9 box. I am not able to install from bioconda because there is no ppc64le release available. I was able to compile it myself with minor difficulty once I got minimap2 going. Also this is a pretty new install and has only been run on this data so I'm not sure if a reinstall will help. I'll give it a shot when I have a bit of time.

Here's the first 1000 lines of the 40-polishing/bubbles_1.fasta.

mikolmogorov commented 5 years ago

Hmm, this file looks OK, except that it has Windows-style line breaks (i.e. "\r\n" instead of "\n"). Did you modify this file on a Windows machine? If not, and the files are formatted like that on the Ubuntu system itself, this might be the issue.

matthewstuartedwards commented 5 years ago

No Windows machine has touched any of these files. I think the \r\n is coming from PasteBin. I checked the file on my computer and it only has \n.
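A quick way to verify this kind of thing locally is to count the line endings in the first lines of the file. The following is a minimal sketch; the function name and the 1000-line default are illustrative choices, not anything provided by Flye.

```python
import io
from itertools import islice


def count_line_endings(handle, max_lines=1000):
    """Count CRLF vs. LF line endings in the first max_lines lines.

    `handle` must be opened in binary mode so '\r' bytes are not translated.
    """
    counts = {"crlf": 0, "lf": 0}
    for line in islice(handle, max_lines):
        if line.endswith(b"\r\n"):
            counts["crlf"] += 1
        elif line.endswith(b"\n"):
            counts["lf"] += 1
    return counts


if __name__ == "__main__":
    # In-memory demo; for a real file use open(path, "rb") instead.
    demo = io.BytesIO(b">bubble\r\nACGT\nACGT\r\n")
    print(count_line_endings(demo))
```

For the real file, open it in binary mode, for example `open("40-polishing/bubbles_1.fasta", "rb")`, and pass the handle in; a non-zero "crlf" count would mean the Windows-style breaks are really on disk.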

I guess I'll get around to trying a reinstall and see if that works.

mikolmogorov commented 5 years ago

Just as a quick sanity check, you can try to run the polisher in standalone mode using --polish-target. Try it on any other dataset (for which you have a sequence to "polish"); this way you will see whether the problem is specific to your original run.

matthewstuartedwards commented 5 years ago

I haven't had a chance to run on any other datasets yet, but I ran Flye on an Intel machine using the same datasets and parameters. The Intel machine never got this error. So it seems to be a ppc64le architecture issue.

mikolmogorov commented 5 years ago

Good to know, thanks! Closing the issue then - don't have access to any ppc machines.

ruzhuchen commented 4 years ago

[For ppc64le platform] The issue can be solved by adding the "-fsigned-char" compiler option to the Makefile. For lib/minimap2, please add to CPPFLAGS: -DHAVE_KALLOC -DNO_WARN_X86_INTRINSICS -D__SSE2__.
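Some background on why this flag can matter: with GCC, plain char is signed by default on x86 but unsigned on ppc64le, and -fsigned-char forces the signed interpretation. The thread does not say which part of the polisher depends on this, so the sketch below only illustrates the general effect, using Python's struct module to read the same byte both ways. The sentinel check is a made-up example, not Flye code.

```python
import struct


def as_signed_char(byte_value: int) -> int:
    """Interpret one byte the way a signed C char would."""
    return struct.unpack("b", bytes([byte_value]))[0]


def as_unsigned_char(byte_value: int) -> int:
    """Interpret one byte the way an unsigned C char would."""
    return struct.unpack("B", bytes([byte_value]))[0]


def is_negative_sentinel(value: int) -> bool:
    """Hypothetical check of the kind that breaks when char is unsigned."""
    return value < 0


if __name__ == "__main__":
    raw = 0xFF  # the bit pattern a C program stores for (char)-1
    print(as_signed_char(raw), is_negative_sentinel(as_signed_char(raw)))
    print(as_unsigned_char(raw), is_negative_sentinel(as_unsigned_char(raw)))
```

Code that stores negative values in a plain char, or compares a char against a negative constant, therefore behaves differently on the two architectures unless the signedness is pinned.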

matthewstuartedwards commented 4 years ago

Thanks Ruzhu! I was able to successfully compile everything with these instructions, and the Wrong Homopolymer error has disappeared.

matthewstuartedwards commented 4 years ago

I've forked the repository and made changes specific to compiling for ppc64le. The repository is located at https://github.com/zovoilis-lab/Flye_ppcle64.

I'm not very good at working with Make, but it should be pretty easy to integrate these changes into the Makefile of the original repository by checking the architecture. The only command that needs to be entered before compiling is an export line that makes sure the specific IBM gcc is on the path. This is described in the INSTALL.md of the forked repository.
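The architecture check mentioned here could look roughly like the following. This is a sketch of the selection logic only, written in Python rather than Make; the function name is hypothetical, and neither the upstream Makefile nor the fork is claimed to work this way. The minimap2 CPPFLAGS given earlier in the thread could be selected with the same kind of test.

```python
import platform


def extra_cxxflags(machine: str) -> list:
    """Return extra compiler flags for a given machine string.

    Hypothetical helper: only ppc64le gets '-fsigned-char', the flag
    reported in this thread to fix the 'Wrong homopolymer' error.
    """
    if machine == "ppc64le":
        return ["-fsigned-char"]
    return []


if __name__ == "__main__":
    # platform.machine() reports the hardware name, e.g. 'x86_64' or 'ppc64le'.
    flags = extra_cxxflags(platform.machine())
    print("CXXFLAGS +=", " ".join(flags) if flags else "(nothing)")
```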

mikolmogorov commented 4 years ago

Thanks - I will take a look. I don't have access to any ppc machines for tests though.