marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Canu at 'cormhap' step for > 10 days for a 25M genome #942

Closed Wen-Juan closed 6 years ago

Wen-Juan commented 6 years ago

Hi!

I aim to assemble a small fungal genome (~25 Mb) using MinION long reads. The total input is ~7 Gbp of data. I launched the assembly 12 days ago, and it has stayed at the 'cormhap' step for 10 days. I do see that results are being generated the whole time.

I have been using up to 64 CPUs and occupying the whole cluster server for the past 12 days. Is it normal for this step to take so long for such a small genome, and is there any way I can speed it up?

Here is the script I used for the assembly (Canu version 1.6):

genomeSize=25m \
-nanopore-raw reads/Msalviae_BM25-2B.fastq \
useGrid=false \
gridOptions=false \
minReadLength=400 \
minOverlapLength=400

Here is the most recent log file output:

(screenshot of the latest log output, 2018-06-05)

Many thanks, and looking forward to your answer.

Best.

skoren commented 6 years ago

You've got a lot of coverage (about 300x), and you're setting the read length and overlap length lower than the defaults. You probably want to increase the minimum overlap/read length, not decrease it; you could also take just the longest reads down to 100-fold coverage.

If you really want to speed it up, let Canu use the cluster, which will give you more than 64 cores to run on.
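
For instance, a restart along these lines (a sketch only: the output directory, prefix, and exact thresholds are placeholders, not values prescribed in this thread) would raise the cutoffs and let Canu submit jobs to the grid:

# sketch only: directory, prefix, and thresholds below are placeholders
canu -d canu_out -p asm \
genomeSize=25m \
-nanopore-raw reads/Msalviae_BM25-2B.fastq \
useGrid=true \
minReadLength=5000 \
minOverlapLength=1000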

Wen-Juan commented 6 years ago

Thanks!

I did try increasing minReadLength; when using

minReadLength=3000 \
minOverlapLength=500

This took about the same time, or was even slower than the run I posted; judging from the current results, perhaps the coverage is still too high.
However, when I tried

minReadLength=30000 \
minOverlapLength=1000

This gave me errors and Canu stopped running after 10 hours. I think the intermediate files were not generated, but I am not sure of the cause. I did not keep the files, though, as my folder quota is reaching its limit.

OK! I will try using the longest reads down to 100-fold coverage. 1) What overlap length would you recommend for the longest 100-fold reads? 2) Is there a parameter in Canu I can set for this threshold, or should I use another script to get these reads?

Thanks.

skoren commented 6 years ago

While you're restarting, update to Canu 1.7. Canu ships a fastqSample program, which you can run without parameters to see the usage, or you can use another program to get the reads. The defaults should be fine for 100-fold coverage.
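
As one example of the "another program" route (a sketch only, assuming seqtk is installed; file names are placeholders), you could keep the longest reads up to roughly 100x of a 25 Mb genome, i.e. about 2.5 Gbp:

# sketch, not from this thread: list read names and lengths, sort longest-first,
# keep names until ~2.5 Gbp (100x of 25 Mb), then extract those reads with seqtk
awk 'NR%4==1{name=substr($1,2)} NR%4==2{print name"\t"length($0)}' reads.fastq \
  | sort -k2,2nr \
  | awk 'sum < 2500000000 {print $1; sum += $2}' > longest100x.ids
seqtk subseq reads.fastq longest100x.ids > longest100x.fastq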

It's possible you have a high-coverage contaminant or some other simple repeat sequence in your data. In that case the parameters corMhapFilterThreshold=0.0000000002 corMhapOptions="--threshold 0.80 --num-hashes 256 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" mhapMemory=60g mhapBlockSize=500 may help, but before you try that I'd suggest useGrid=true instead, to use your cluster rather than a single node.

If you're able to share the data I can take a quick look and run it locally.

Wen-Juan commented 6 years ago

OK. Thanks! I will try them out!

Wen-Juan commented 6 years ago

If I stop the job and relaunch the script, 1) would it be a problem if I changed the minimum read length and overlap length, e.g. to minReadLength=30000 minOverlapLength=1000? 2) Or should I keep the same script but add the parameters you suggested above?

Thanks.

Wen-Juan commented 6 years ago

I re-launched a Canu assembly with 124x coverage, but as I described before, it failed again at the correction ('cor') step.

The script I used:

canu -d canu_output4 -p inter_files4 \
genomeSize=25m \
-nanopore-raw /scratch/beegfs/weekly/wjma/microbotryum/input/reads/Msalviae_BM25-2B.fastq \
useGrid=false \
corConcurrency=4 \
stopOnReadQuality=false \
minReadLength=35000 \
minOverlapLength=2000 \
corMhapFilterThreshold=0.0000000002 \
corMhapOptions="--threshold 0.80 --num-hashes 256 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" \
mhapMemory=60g \
mhapBlockSize=500

I cannot find the 1-overlapper/results/ folder, so I suspect this step went wrong. Here is the latest log file info:

(screenshot of the latest log output, 2018-06-06)

Thanks.

skoren commented 6 years ago

The 1-overlapper/results folder is removed once that step completes, so if it is not there that doesn't necessarily mean the step didn't run. There should be a log from each of the failed jobs in correction/2-correction/*.out; can you post one of those? Can you also post the full Canu log written to stderr/stdout?

Wen-Juan commented 6 years ago

Here is one of the /2-correction/*.out files:

Running job 1 based on command line options.
./correctReads.sh: line 150: 29434 Done(141)               ( $bin/generateCorrectionLayouts -b $bgn -e $end -rl ./inter_files4.readsToCorrect -G $gkpStore -O ../inter_files4.ovlStore -S ./inter_files4.globalScores -C 80 -F && touch ./correction_outputs/$jobid.dump.success )
     29435 Segmentation fault      | $bin/falcon_sense --min_idt 0.5 --min_len 35000 --max_read_len 743494 --min_ovl_len 2000 --min_cov 4 --n_core 2 > ./correction_outputs/$jobid.fasta.WORKING 2> ./correction_outputs/$jobid.err
Read layout generation failed.
mv: cannot stat './correction_outputs/0001.fasta': No such file or directory

Here is the full Canu log file: log_canu.txt

Thanks.

skoren commented 6 years ago

I don't see any errors upstream of the correction errors, so I think overlapping ran and you have overlaps. From the summary log:

--  OVERLAPS:
--  --------
--  
--  IGNORED:
--  
--             0 (< 0.0000 fraction error)
--             0 (> 0.4095 fraction error)
--             0 (< 0 bases long)
--             0 (> 2097151 bases long)
--  
--  FILTERED:
--  
--      30913619 (too many overlaps, discard these shortest ones)
--  
--  EVIDENCE:
--  
--       2333547 (longest overlaps)
--  
--  TOTAL:
--  
--      33247166 (all overlaps)
--  
--  READS:
--  -----
--  
--             0 (no overlaps)
--           910 (no overlaps filtered)
--          3022 (<  50% overlaps filtered)
--         29516 (<  80% overlaps filtered)
--         45236 (<  95% overlaps filtered)
--         57699 (< 100% overlaps filtered)
--  

So most of the reads have overlaps. I would guess you're running out of memory, so lowering corConcurrency to 1 might help. This code has changed significantly from 1.6 to 1.7 and I don't want to debug 1.6, so I'd suggest restarting from scratch with version 1.7 as I suggested earlier, with the same parameters, and see if it runs.
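
In other words (a sketch only; the output directory and prefix are placeholders for a fresh 1.7 run, and everything except corConcurrency is kept from the command above):

# sketch: same command as before, run under Canu 1.7, with corConcurrency lowered to 1
canu -d canu_output_v17 -p inter_files \
genomeSize=25m \
-nanopore-raw /scratch/beegfs/weekly/wjma/microbotryum/input/reads/Msalviae_BM25-2B.fastq \
useGrid=false \
corConcurrency=1 \
stopOnReadQuality=false \
minReadLength=35000 \
minOverlapLength=2000 \
corMhapFilterThreshold=0.0000000002 \
corMhapOptions="--threshold 0.80 --num-hashes 256 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" \
mhapMemory=60g \
mhapBlockSize=500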

Wen-Juan commented 6 years ago

OK, i see. Thanks!

I will ask our cluster/server staff to update the Canu version, and re-launch it.

I will let you know if it successfully runs :)

Wen-Juan commented 6 years ago

Hi again. As we discussed, I updated to the newest Canu version, v1.7, and re-launched the same script. It aborted at almost the same place, the correction step.

Here are the last few lines of the log file:
-- Finished on Fri Jun  8 17:54:06 2018 (41 seconds) with 4152.238 GB free disk space
----------------------------------------
--
-- Read correction jobs failed, tried 2 times, giving up.
--   job 2-correction/results/0001.cns FAILED.
--   job 2-correction/results/0002.cns FAILED.
--   job 2-correction/results/0003.cns FAILED.
--   job 2-correction/results/0004.cns FAILED.
--   job 2-correction/results/0005.cns FAILED.
--   job 2-correction/results/0006.cns FAILED.
--   job 2-correction/results/0007.cns FAILED.
--   job 2-correction/results/0008.cns FAILED.
--   job 2-correction/results/0009.cns FAILED.
--   job 2-correction/results/0010.cns FAILED.
--   job 2-correction/results/0011.cns FAILED.
--   job 2-correction/results/0012.cns FAILED.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

Here is canu_output5/correction/2-correction/correctReads.000001.out:

Running job 1 based on command line options.
./correctReads.sh: line 131:  8400 Segmentation fault      (core dumped) $bin/falconsense -G $gkpStore -C ../inter_files5.corStore -b $bgn -e $end -r ./inter_files5.readsToCorrect -t 4 -cc 4 -cl 35000 -oi 0.5 -ol 2000 -p ./results/$jobid.WORKING > ./results/$jobid.err 2>&1

Thanks!

Wen-Juan commented 6 years ago

It seems there is a segmentation fault in ./correctReads.sh. Is there some way we can solve this?

Wen-Juan commented 6 years ago

Any further thoughts? Looking forward to your suggestions!

Wen-Juan commented 6 years ago

Hi again, here is the error message from /canu_output5/correction/2-correction/results/0001.err (the last line of this error appears in all *.err files).

read38980 #4 location 26 to template 0-23831 length 23831 diff 0.416474
read38980 #4 location 27 to template 0-23832 length 23832 diff 0.416457
mapped 38980     0-28708 to template      0- 23806 trimmed by      0-    25 CCACGGGGAG CCACG---AG
ALIGN to 0-24138 length 50503
read43520 #14 location 0 to template 260-21854 length 21594 diff 0.260026
read43520 #14 location 1 to template 260-21855 length 21595 diff 0.260014
falconsense: overlapInCore/libedlib/edlib.C:347: void edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.

Failed with 'Aborted'; backtrace (libbacktrace):

Failed with 'Segmentation fault'; backtrace (libbacktrace):

Thanks for getting back to me!

skoren commented 6 years ago

Do you know if you have Ns in your input data or non-ACGT characters (Canu will convert those to Ns)? This was a bug in the 1.7 release (see issues #880 and #881). The same suggestions as there apply: you could remove this read from your list of reads to correct, but if a lot of your data has Ns, you'd need to remove a lot of your input. You could also try the tip of the repository, but again you'd have to run that from scratch.
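
A quick way to check for such characters (a sketch only; the file name is a placeholder) is to scan the FASTQ sequence lines for anything outside A/C/G/T:

# sketch: print any sequence lines (every 4th line, starting at line 2) containing non-ACGT characters
awk 'NR%4==2 && /[^ACGTacgt]/' reads.fastq | head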

Wen-Juan commented 6 years ago

I just checked my reads, and there do not seem to be any Ns in the input data. Are there any other possible causes of this error?

skoren commented 6 years ago

It doesn't have to be an N; any non-ACGT character would cause the issue. Otherwise this code has been relatively stable. The same suggested fixes still apply: either remove the offending read (38980) from the list to correct (the file correction/2-correction/asm.readsToCorrect) or try the latest code. If you are able to, you can also send the input file to us so we can try to reproduce the error locally (see http://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you).
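
If that readsToCorrect file turns out to be a plain-text list of read IDs (an assumption, not something confirmed in this thread; check its format with head first, and note the prefix in this run is inter_files5 rather than asm), dropping the offending read could look like:

# assumption: the file is plain text with one read per line; inspect it and keep a backup before editing
cp correction/2-correction/asm.readsToCorrect correction/2-correction/asm.readsToCorrect.bak
grep -vw 38980 correction/2-correction/asm.readsToCorrect.bak > correction/2-correction/asm.readsToCorrect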

Wen-Juan commented 6 years ago

OK, I will first remove the offending read (38980) from the list to correct, as you suggested, and see how it goes.

thanks.

Wen-Juan commented 6 years ago

I just re-ran the code after removing the read 38980, and it failed again.

skoren commented 6 years ago

What is the error it reports?

Wen-Juan commented 6 years ago
--
-- Read correction jobs failed, tried 2 times, giving up.
--   job 2-correction/results/0001.cns FAILED.
--   job 2-correction/results/0002.cns FAILED.
--   job 2-correction/results/0003.cns FAILED.
--   job 2-correction/results/0004.cns FAILED.
--   job 2-correction/results/0005.cns FAILED.
--   job 2-correction/results/0006.cns FAILED.
--   job 2-correction/results/0007.cns FAILED.
--   job 2-correction/results/0008.cns FAILED.
--   job 2-correction/results/0009.cns FAILED.
--   job 2-correction/results/0010.cns FAILED.
--   job 2-correction/results/0011.cns FAILED.
--   job 2-correction/results/0012.cns FAILED.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
skoren commented 6 years ago

No, I mean the error in the actual error log file, like you posted earlier (e.g. falconsense: overlapInCore/libedlib/edlib.C:347: void edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.).

Wen-Juan commented 6 years ago

The /canu_output5/correction/2-correction/results/0001.err file has a similar error to the one reported previously:

ALIGN to 0-18957 length 45361
read35573 #22 location 0 to template 338-21318 length 20980 diff 0.295091
falconsense: overlapInCore/libedlib/edlib.C:347: void edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.

Failed with 'Aborted'; backtrace (libbacktrace):

Failed with 'Segmentation fault'; backtrace (libbacktrace):
skoren commented 6 years ago

So you have multiple reads that are failing in the same way; that makes me think it is the same issue as #880 and #881, with special characters or some other bad input in the fastq file. Your options then are to try the latest code or share the data with us.

Wen-Juan commented 6 years ago

What do you mean by 'try the latest code'?

skoren commented 6 years ago

Rather than using a release, check out the latest from GitHub, compile, and run (http://canu.readthedocs.io/en/latest/index.html#Install). You will have to start this from scratch.
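
A minimal sketch of that checkout-and-build, following the linked install instructions (the -j value is just an example):

# build the development tip from GitHub; binaries land under canu/<platform>/bin, e.g. canu/Linux-amd64/bin/canu
git clone https://github.com/marbl/canu.git
cd canu/src
make -j 8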

Wen-Juan commented 6 years ago

So this recompile from GitHub solves the issue with the strange characters? It will take at least a day for the department IT to change this. Can I send you my data so you can try to solve this today?

skoren commented 6 years ago

Yes, the recompile solves the bad-characters issue. I included the link to the instructions for sending us data, so you're welcome to do that, but either way it's not going to get fixed today, because I'm not sure when we'll have a chance to run and debug it.

Wen-Juan commented 6 years ago

OK, thanks for letting me know. I will try the latest code tomorrow and give you an update.

Wen-Juan commented 6 years ago

Hey, just want to update you that the newest GitHub build, Canu 1.7b, worked for me!

In fact, I just finished the whole assembly this morning; we were able to assemble the whole genome at (or very nearly at) chromosome level (a fungus with a 23 Mb genome) 👍

Very happy with the assembly result!

You can close this issue without doubt.

Thanks a lot!

skoren commented 6 years ago

OK, it sounds like there were some special characters in the input causing the error.