Persistent failure of a subset of Mhap jobs.

zephyris commented 2 months ago

I'm running canu on Windows Subsystem for Linux (WSL) and getting stochastic Mhap failures. Repeated restarts narrow it down to one or two Mhap jobs that fail persistently...

$ canu -p LmexC9T7 -d canu_haplosmash correctedErrorRate=0.15 genomeSize=30m maxInputCoverage=100 -nanopore nanopore.fq
-- canu 2.2

...

-- Finished on Sun Sep 01 12:35:52 2024 (202 seconds) with 243.75 GB free disk space
----------------------------------------
--
-- Mhap overlap jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/results/000002.ovb FAILED.
--   job correction/1-overlapper/results/000004.ovb FAILED.
--

ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

The input nanopore reads come from MinKNOW super-accurate basecalling, then go through Porechop, then are filtered to a minimum length of 10 kb. This pipeline has worked for other genome assemblies.

I've tried clean restarts, i.e. completely removing the output directory and starting again, but I still get some persistent failures. Any tips for troubleshooting would be much appreciated.

skoren commented 2 months ago

What do the logs for the failed jobs say (the err and out files in correction/1-overlapper/, something like *00000[24]*)?

zephyris commented 2 months ago

Thanks @skoren. I'm not getting any err files, but I am getting mhap.*.out and precompute.*.out files.

This is mhap.000004.out:

Found perl:
   /usr/bin/perl
   This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-gnu-thread-multi

Found java:
   /usr/bin/java
   openjdk version "11.0.23" 2024-04-16

Found canu:
   /home/richard/canu/bin/canu
   canu 2.2

Running job 4 based on command line options.
Fetch blocks/000003.dat
Fetch blocks/000004.dat

Running block 000002 in query 000004

mkfifo: cannot create fifo '000004-pipe': Operation not supported
ERROR:  invalid arg '000004-pipe'
usage: /home/richard/canu/bin/mhapConvert -S seqStore -o output.ovb input.mhap[.gz]
  Converts mhap native output to ovb
    -minlength X    discards overlaps below X bp long.
ERROR:  no overlap files supplied
Running with these settings:
--filter-threshold = 1.0E-7
--help = false
--max-shift = 0.2
--min-olap-length = 500
--min-store-length = 0
--no-rc = false
--no-self = false
--no-tf = false
--num-hashes = 256
--num-min-matches = 3
--num-threads = 12
--ordered-kmer-size = 14
--ordered-sketch-size = 1000
--repeat-idf-scale = 10.0
--repeat-weight = 0.9
--settings = 0
--store-full-id = true
--supress-noise = 0
--threshold = 0.8
--version = false
-f = 
-h = false
-k = 16
-p = 
-q = queries/000004
-s = ./blocks/000002.dat

Processing files for storage in reverse index...
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...
Current # sequences stored: 5000...
Current # sequences stored: 10000...
Current # sequences stored: 15000...
Current # sequences stored: 20000...
Current # sequences stored: 25000...
Current # sequences stored: 30000...
Stored 32400 sequences in the index.
Processed 32400 unique sequences (fwd and rev).
Time (s) to read and hash from file: 3.1387024410000004
Time (s) to score and output to self: 8.650575920000001
Opened fasta file /mnt/e/Dropbox/23.08.2024_Lmajor/2024.04.26_Build/canu_haplosmash/correction/1-overlapper/blocks/000003.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Processed 16200 to sequences.
Time (s) to score, hash to-file, and output: 10.517318940000001
Opened fasta file /mnt/e/Dropbox/23.08.2024_Lmajor/2024.04.26_Build/canu_haplosmash/correction/1-overlapper/blocks/000004.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Processed 16200 to sequences.
Time (s) to score, hash to-file, and output: 9.851086698000001
Total scoring time (s): 29.039751962
Total time (s): 32.179275803
MinHash search time (s): 51.373670914
Total matches found: 1696106
Average number of matches per lookup: 34.89930041152263
Average number of table elements processed per lookup: 4476.517427983539
Average number of table elements processed per match: 128.26954624298244
Average % of hashed sequences hit per lookup: 7.636986803332825
Average % of hashed sequences hit that are matches: 1.4104239398713314
Average % of hashed sequences fully compared that are matches: 87.51882104654823

Both failed and successful jobs have lines like this:

mkfifo: cannot create fifo '000004-pipe': Operation not supported

But only failed jobs have:

usage: /home/richard/canu/bin/mhapConvert -S seqStore -o output.ovb input.mhap[.gz]
  Converts mhap native output to ovb
    -minlength X    discards overlaps below X bp long.
ERROR:  no overlap files supplied
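Since only the failed jobs contain the mhapConvert error, one way to find them is to search the job logs for that message. This is a minimal sketch, assuming it is run from the directory that holds canu_haplosmash/; find_failed_mhap_logs is a hypothetical helper, not part of canu.

```shell
#!/usr/bin/env bash
# Sketch: list mhap job logs containing the mhapConvert error that, per this
# thread, appears only in failed jobs. find_failed_mhap_logs is a hypothetical
# helper, not part of canu.
find_failed_mhap_logs() {
  local dir="$1"
  # -l prints only the names of matching files; -F matches the text literally.
  # Errors (e.g. no mhap.*.out files exist yet) are discarded.
  grep -lF 'no overlap files supplied' "$dir"/mhap.*.out 2>/dev/null
}

# Example, run from the directory that holds canu_haplosmash/:
find_failed_mhap_logs canu_haplosmash/correction/1-overlapper
```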

Edit: From some reading, it seems mkfifo does not work on DrvFs-mounted drives, and I'm running this on a drive mounted that way. So why do some jobs still complete successfully even though mkfifo fails?
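Whether a given directory supports named pipes can be checked directly before starting an assembly there. This is a sketch, not anything canu itself runs: fifo_supported is a hypothetical helper that simply tries mkfifo in the target directory and cleans up afterwards. On a DrvFs-mounted drive such as /mnt/e, the mkfifo call would be expected to fail as it does in the log above.

```shell
#!/usr/bin/env bash
# Sketch: check whether named pipes (FIFOs) can be created in a directory.
# fifo_supported is a hypothetical helper, not part of canu.
fifo_supported() {
  local dir="$1" probe
  probe="$dir/.fifo-probe.$$"   # $$ is the shell's PID, to avoid name clashes
  if mkfifo "$probe" 2>/dev/null; then
    rm -f "$probe"              # remove the test FIFO again
    return 0
  fi
  return 1                      # mkfifo failed, e.g. "Operation not supported"
}

# Example: test the current directory before launching canu from it.
if fifo_supported "."; then
  echo "mkfifo works in this directory"
else
  echo "mkfifo fails in this directory; consider mhapPipe=false"
fi
```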

skoren commented 2 months ago

Yes, canu uses mkfifo by default; this looks similar to #2333. I suggest setting mhapPipe=false to avoid using pipes. You should completely remove the 1-overlapper folder and restart the assembly. I also committed a fix so that the job fails if mkfifo returns an error.
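The restart suggested above can be sketched as follows, reusing the options from the original command and adding mhapPipe=false. It assumes the script is run from the directory that holds canu_haplosmash/ and nanopore.fq; the command -v check is only there so the script reports a missing canu instead of erroring.

```shell
#!/usr/bin/env bash
# Sketch: restart with mhapPipe=false after wiping the overlapper stage.
# Assumes it is run from the directory holding canu_haplosmash/ and nanopore.fq.
asm_dir="canu_haplosmash"

# Remove the whole 1-overlapper folder so no partial job outputs are reused.
rm -rf -- "$asm_dir/correction/1-overlapper"

if command -v canu >/dev/null 2>&1; then
  # Same options as the original run, plus mhapPipe=false to avoid pipes.
  canu -p LmexC9T7 -d "$asm_dir" \
    correctedErrorRate=0.15 genomeSize=30m maxInputCoverage=100 \
    mhapPipe=false \
    -nanopore nanopore.fq
else
  echo "canu not found on PATH" >&2
fi
```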

zephyris commented 2 months ago

I've also confirmed that running the assembly on the WSL system drive, where mkfifo works, prevents the error too, as you'd expect.

marbl / canu

Persistent failure of a subset of Mhap jobs. #2338