I'm running canu on Windows Subsystem for Linux, and I'm getting stochastic Mhap failures. Multiple restarts tend to narrow it down to one or two persistent Mhap failures...
...
The input nanopore reads come from MinKNOW super-accurate basecalling, through Porechop, then filtered for a minimum length of 10 kb, using a pipeline that has worked for other genome assemblies.
I've tried doing clean restarts, i.e. completely removing the output directory and restarting, but persistently get some failures. Any tips for troubleshooting would be much appreciated.
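For context, the read prep amounts to something like this (a rough sketch; file names and thread count are placeholders, not the exact pipeline):

```bash
# Adapter trimming with Porechop, then a 10 kb minimum-length filter.
# File names and thread count are placeholders.
porechop -i reads_raw.fastq.gz -o reads_trimmed.fastq.gz --threads 8

# Keep only reads of at least 10 kb (4-line FASTQ records via awk).
zcat reads_trimmed.fastq.gz \
  | awk '{h=$0; getline s; getline p; getline q;
          if (length(s) >= 10000) printf "%s\n%s\n%s\n%s\n", h, s, p, q}' \
  | gzip > reads_min10kb.fastq.gz
```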
What are the logs for the failed jobs (something like the correction/1-overlapper/*00[48]* err and out files)?
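For example, from the assembly run directory (a sketch; the *00[48]* glob matches jobs 4 and 8):

```bash
# Show the end of the logs for the failing overlap jobs
# (jobs 4 and 8 here, matching the glob above).
tail -n 40 correction/1-overlapper/*00[48]*err correction/1-overlapper/*00[48]*out
```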
Thanks @skoren. I'm not getting any err files... I am getting mhap.*.out and precompute.*.out files, though.
This is mhap.000004.out:
Found perl:
/usr/bin/perl
This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-gnu-thread-multi
Found java:
/usr/bin/java
openjdk version "11.0.23" 2024-04-16
Found canu:
/home/richard/canu/bin/canu
canu 2.2
Running job 4 based on command line options.
Fetch blocks/000003.dat
Fetch blocks/000004.dat
Running block 000002 in query 000004
mkfifo: cannot create fifo '000004-pipe': Operation not supported
ERROR: invalid arg '000004-pipe'
usage: /home/richard/canu/bin/mhapConvert -S seqStore -o output.ovb input.mhap[.gz]
Converts mhap native output to ovb
-minlength X discards overlaps below X bp long.
ERROR: no overlap files supplied
Running with these settings:
--filter-threshold = 1.0E-7
--help = false
--max-shift = 0.2
--min-olap-length = 500
--min-store-length = 0
--no-rc = false
--no-self = false
--no-tf = false
--num-hashes = 256
--num-min-matches = 3
--num-threads = 12
--ordered-kmer-size = 14
--ordered-sketch-size = 1000
--repeat-idf-scale = 10.0
--repeat-weight = 0.9
--settings = 0
--store-full-id = true
--supress-noise = 0
--threshold = 0.8
--version = false
-f =
-h = false
-k = 16
-p =
-q = queries/000004
-s = ./blocks/000002.dat
Processing files for storage in reverse index...
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...
Current # sequences stored: 5000...
Current # sequences stored: 10000...
Current # sequences stored: 15000...
Current # sequences stored: 20000...
Current # sequences stored: 25000...
Current # sequences stored: 30000...
Stored 32400 sequences in the index.
Processed 32400 unique sequences (fwd and rev).
Time (s) to read and hash from file: 3.1387024410000004
Time (s) to score and output to self: 8.650575920000001
Opened fasta file /mnt/e/Dropbox/23.08.2024_Lmajor/2024.04.26_Build/canu_haplosmash/correction/1-overlapper/blocks/000003.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Processed 16200 to sequences.
Time (s) to score, hash to-file, and output: 10.517318940000001
Opened fasta file /mnt/e/Dropbox/23.08.2024_Lmajor/2024.04.26_Build/canu_haplosmash/correction/1-overlapper/blocks/000004.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Processed 16200 to sequences.
Time (s) to score, hash to-file, and output: 9.851086698000001
Total scoring time (s): 29.039751962
Total time (s): 32.179275803
MinHash search time (s): 51.373670914
Total matches found: 1696106
Average number of matches per lookup: 34.89930041152263
Average number of table elements processed per lookup: 4476.517427983539
Average number of table elements processed per match: 128.26954624298244
Average % of hashed sequences hit per lookup: 7.636986803332825
Average % of hashed sequences hit that are matches: 1.4104239398713314
Average % of hashed sequences fully compared that are matches: 87.51882104654823
Both failed and successful jobs have lines like this:
mkfifo: cannot create fifo '000004-pipe': Operation not supported
But only failed jobs have:
usage: /home/richard/canu/bin/mhapConvert -S seqStore -o output.ovb input.mhap[.gz]
Converts mhap native output to ovb
-minlength X discards overlaps below X bp long.
ERROR: no overlap files supplied
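To list which jobs hit this, something like the following works (using the error string quoted above):

```bash
# List the mhap jobs whose log contains the mhapConvert failure,
# i.e. the jobs that never produced an overlap output file.
grep -l "ERROR: no overlap files supplied" correction/1-overlapper/mhap.*.out
```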
Edit: I've been doing some reading; it seems mkfifo does not work on drvfs-mounted drives, and I'm running this on a drive mounted that way. However, some jobs successfully complete despite mkfifo not working?
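Both points are easy to check from the run directory (a minimal sketch):

```bash
# Check what filesystem the assembly directory is on: Windows drives
# under WSL show up as 9p/drvfs, the Linux system drive as ext4.
findmnt -T . -o TARGET,FSTYPE

# Check whether named pipes can be created here at all.
mkfifo test-pipe && echo "fifo OK" && rm test-pipe
```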
Yes, canu uses mkfifo by default; this seems similar to #2333. I suggest using mhapPipe=false to avoid using pipes. You should completely remove the 1-overlapper folder and restart the assembly. I've also committed a fix for this, so it will fail if mkfifo returns an error.
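Concretely, something like this (a sketch; the genome size, read file, and -p prefix are placeholders, so reuse the options from your original command):

```bash
# Drop the partially completed overlapper state first.
rm -rf canu_haplosmash/correction/1-overlapper

# Resume with FIFOs disabled so mhap results go through regular files.
# genomeSize and the read file are placeholders; reuse your originals.
canu -p haplosmash -d canu_haplosmash \
  genomeSize=33m mhapPipe=false \
  -nanopore reads.fastq.gz
```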
I've also confirmed that running the assembly on the WSL system drive, where mkfifo works, also prevents the error, as you'd expect.