marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Mhap precompute jobs failed #1925

Closed NAQS-DRW closed 3 years ago

NAQS-DRW commented 3 years ago

Hi, I'm new to Linux and Canu. I have been reading issues #1336, #1108, #909, and #901, as I am having the same problem. My system is Ubuntu 18. I installed canu using:

$ conda install -c conda-forge -c bioconda -c defaults canu

The version is canu branch HEAD +0 changes (r10117 5638f7d9a5379373310ab62c28aa0cdbd864722d).

conda1.log

Then $conda create --name canu python=3

conda2.log

$conda activate canu

(canu) shaun@shaun-HP-Z6-G4-Workstation:~$ conda install -c conda-forge -c bioconda -c defaults canu

conda3.log

canu appears to be working:

conda4.log

I am keeping the command basic at the moment while I troubleshoot these issues:

(canu) shaun@shaun-HP-Z6-G4-Workstation:~$ canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz

conda5.log

I then tried to update canu:

(canu) shaun@shaun-HP-Z6-G4-Workstation:~$ conda update canu

conda6.log

I then tried with -corConcurreny=10

conda7.log

Installed a new Java into the canu environment (https://anaconda.org/bioconda/java-jdk)

conda8.log

ran the command:

conda9.log

So then I edited Execution.pm ($ gedit /home/shaun/miniconda3/envs/canu/bin/../lib/site_perl/canu/Execution.pm) and removed -d64, so it looks like this: my $javaOpt = "" if (defined(getGlobal("javaUse64Bit")) && getGlobal("javaUse64Bit") == <this is line 144>

ran it again with the changes above, same errors:

conda10.log

This is what the failed precompute file looks like:

conda11.log

this is what a successful precompute file looks like:

conda12.log

Which Mhap precompute jobs fail seems to be random.

Any help with this would be much appreciated.

skoren commented 3 years ago

I'm not going to comment on the conda install logs; I'm not sure why it would need python3, as Canu doesn't have any python code or python requirements. I expect the dependency tree from conda is not quite correct, but we also do not recommend conda as an installation method for Canu.

I don't see any JVM/java related errors in the logs you posted, so I don't think there's any reason to install a different JVM or to change code. The error occurs when Canu tries to write some reads as fasta sequences to disk. Is it possible you're out of quota? There was a bug in the random downsampling of data to a 200x maximum before the 2.1.1 release, but I don't trust conda installations to get the version right. I'd suggest downloading the linux binaries from the release page and trying with that version to see if the error re-occurs. You could also try the conda version with maxInputCoverage=10000 added; if that works, it would indicate you've got an old version from conda and not 2.1.1.

NAQS-DRW commented 3 years ago

Thanks for getting back to me, Skoren. I wasn't sure if canu needed python, so I loaded the environment with it anyway. I have about 1TB of free space on the workstation, so I don't think this was the problem. I have followed your suggestion and downloaded the binary and installed canu 2.1.1. Everything seems to have installed correctly: canu1.log canu2.log

and I have run canu using:

$ /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000

I had a few errors:

canu3.log

I deleted the output files and started again. A new error:

canu6.log

First I tried shaun@shaun-HP-Z6-G4-Workstation:~$ /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000 -maxMemory=63

Which produced the same error as in canu6.log

without deleting the outputs from the previous attempt, I tried shaun@shaun-HP-Z6-G4-Workstation:~$ /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000 -minMemory=63

which worked:

canu7.log

So just because I'm curious...I deleted the outputs and did it again with /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000 -minMemory=63

It failed: canu8.log

So I went back to this command:

shaun@shaun-HP-Z6-G4-Workstation:~$ /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000

My intention was to confirm that running it first, and then re-running with -minMemory=63 added, would complete the assembly:

shaun@shaun-HP-Z6-G4-Workstation:~$ /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000 -minMemory=63

To my surprise I didn't need to add "-minMemory=63", it worked without that option.

canu9.log

As you can see in the log, it seems to end at the trimming stage, without producing .contigs.fasta, .unitigs.fasta, and .unassembled.fasta

I then re-ran shaun@shaun-HP-Z6-G4-Workstation:~$ /home/shaun/canu-2.1.1/build/bin/canu -d ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ -p RID7524 -genomeSize=2.3k -nanopore-raw ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz -minReadLength=500 -maxInputCoverage=10000

without deleting the previous outputs

canu10.log

Would I be right in assuming that the stringency of the overlap is too high?

There seems to be no error to fix?

Is there something I need to add to the command to generate these output files?

brianwalenz commented 3 years ago

There's a lot going on here, phew.

Technically, the options with equals signs do not have dashes in front of them: minMemory=63 instead of -minMemory=63 is correct. That the 'dash' form works is probably a bug (that won't be fixed).

The bogart failures were due to canu configuring it to use a smaller memory size than your (very) deep reads require. Setting minMemory=63 increases memory allowed for everything to 63 GB. batMemory=63 would increase just the memory allowed for bogart.
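To make the two points above concrete, here is a minimal sketch that only prints a canu command line and does not run anything. The helper name build_canu_cmd is hypothetical; the paths and values in the usage comment are copied from the commands earlier in this thread. It keeps the dash on -d, -p, and -nanopore-raw, and strips a leading dash from any key=value option passed in.

```shell
#!/usr/bin/env bash
# Sketch only: prints a canu command line, it does not execute canu.
# build_canu_cmd is a hypothetical helper, not part of Canu.

build_canu_cmd() {
  local canu_bin="$1"   # e.g. /home/shaun/canu-2.1.1/build/bin/canu
  local asm_dir="$2"    # value for -d
  local prefix="$3"     # value for -p
  local reads="$4"      # nanopore reads file
  shift 4
  # Flags keep their leading dash: -d, -p, -nanopore-raw.
  # key=value parameters (genomeSize, batMemory, ...) take no dash.
  printf '%s' "$canu_bin -d $asm_dir -p $prefix genomeSize=2.3k"
  local opt
  for opt in "$@"; do
    printf ' %s' "${opt#-}"   # strip one leading dash if present
  done
  printf ' %s\n' "-nanopore-raw $reads"
}

# Example (prints the command; copy and run it yourself if it looks right):
# build_canu_cmd /home/shaun/canu-2.1.1/build/bin/canu \
#   ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/ RID7524 \
#   ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/RID7524.gz \
#   minReadLength=500 maxInputCoverage=10000 batMemory=63
```

Passing batMemory=63 rather than minMemory=63 raises the limit for bogart only, which is the narrower of the two changes described above.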

I can't explain why the one run stopped after trimming. I've never seen that before. Then again, I've never seen the precompute failure before either. I have seen strange errors like this when two runs of canu are invoked at the same time using the same -d directory -- they end up stepping on each other's toes.
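One way to rule out two runs sharing a -d directory is a small lock around the launch. This is a hypothetical wrapper, not a Canu feature: the function names and the lock location (a directory next to the assembly directory) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Canu: refuse to start a second run
# against an assembly directory that another run is already using.

lock_path() {
  # Put the lock next to the assembly directory, not inside it.
  printf '%s.lock' "${1%/}"
}

acquire_run_lock() {
  local lock
  lock="$(lock_path "$1")"
  # mkdir is atomic: it fails if the lock directory already exists.
  if mkdir "$lock" 2>/dev/null; then
    return 0
  fi
  echo "another run appears to be using $1 (lock: $lock)" >&2
  return 1
}

release_run_lock() {
  rmdir "$(lock_path "$1")" 2>/dev/null
}

# Usage sketch:
# asm_dir=~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/
# if acquire_run_lock "$asm_dir"; then
#   /home/shaun/canu-2.1.1/build/bin/canu -d "$asm_dir" ...   # rest of the command as above
#   release_run_lock "$asm_dir"
# fi
```

If a run is killed before release_run_lock is called, the lock directory is left behind and has to be removed by hand.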

After one of these failures, what files are in the 1-overlapper/blocks/ directory? Can you run the sqStoreDumpFASTQ command that is run by the precompute.sh script successfully by hand? Can you run the precompute.sh script by hand successfully (./precompute.sh <jobnumber>)?
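The checks asked for above could be gathered with something like the following sketch. The function name inspect_overlapper is hypothetical, and the location of 1-overlapper below the -d directory is an assumption, so the sketch searches for it rather than hard-coding a path. It lists blocks/ if present and prints the sqStoreDumpFASTQ line from precompute.sh so the exact command can be copied and run by hand.

```shell
#!/usr/bin/env bash
# Sketch: find 1-overlapper directories under an assembly directory and
# report what is in blocks/. Only the names 1-overlapper/blocks,
# precompute.sh and sqStoreDumpFASTQ come from this thread; the layout
# above them is not assumed.

inspect_overlapper() {
  local asm_dir="$1" dir
  find "$asm_dir" -type d -name 1-overlapper | sort | while read -r dir; do
    echo "== $dir"
    if [ -d "$dir/blocks" ]; then
      ls -l "$dir/blocks"
    else
      echo "no blocks/ directory here"
    fi
    if [ -f "$dir/precompute.sh" ]; then
      # Show the exact dump command the script would run, if any.
      grep -n 'sqStoreDumpFASTQ' "$dir/precompute.sh" || true
      echo "to retry one job by hand: (cd $dir && ./precompute.sh <jobnumber>)"
    fi
  done
}

# Usage sketch:
# inspect_overlapper ~/Cairns_NSDM_data/RID7524/fastq_pass_RID7524/
```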

NAQS-DRW commented 3 years ago

Thanks Brian. First I deleted the "-" from the options. This didn't work, so I added them back. As long as it works, I'm happy. I tried minMemory=63, but the run failed at bogart again. So I added batMemory=63 and had some success.

canu2.log

Now that I have this working (I think?), I will start to play with the options to refine the alignments.

I removed batMemory=63 to re-create the error so I could have a look in 1-overlapper/blocks/ dir.

canu3.log

As you can see from canu3.log I couldn't find dir /blocks. I'll look out for this next time I have errors.

Any other tips or tricks that you can offer would be much appreciated!

skoren commented 3 years ago

Did the run stop because it was missing the blocks folder or were you looking after the assembly got to bogart? Those get cleaned up as the run proceeds since they're intermediate output that's not needed later. It seems like installing the binary version from the release page along with batMemory fixed your run and you were able to run Canu.