Open sdjebali opened 4 years ago
Thanks Sarah, do you have an average read length? It's likely, but unfortunate, that some of your 2nd batch reads are very long. Thanks, Fritz
Indeed there seems to be a big read length difference between the two batches.
I ran NanoPlot on them and here are the results:
First 1 Million reads:
General summary:
Mean read length: 4,722.5
Mean read quality: 4.4
Median read length: 906.0
Median read quality: 4.2
Number of reads: 1,000,000.0
Read length N50: 14,404.0
Total bases: 4,722,479,679.0
Number, percentage and megabases of reads above quality cutoffs
Q5: 367454 (36.7%) 3015.3Mb
Q7: 8 (0.0%) 0.1Mb
Q10: 0 (0.0%) 0.0Mb
Q12: 0 (0.0%) 0.0Mb
Q15: 0 (0.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 7.0 (17272)
2: 7.0 (9848)
3: 7.0 (25242)
4: 7.0 (12091)
5: 7.0 (25093)
Top 5 longest reads and their mean basecall quality score
1: 2210466 (3.6)
2: 1850945 (3.8)
3: 1772717 (3.6)
4: 1685671 (3.9)
5: 1563326 (3.9)
Second 1 Million reads:
General summary:
Mean read length: 13,668.0
Mean read quality: 11.1
Median read length: 13,451.0
Median read quality: 11.8
Number of reads: 1,000,000.0
Read length N50: 16,657.0
Total bases: 13,668,019,254.0
Number, percentage and megabases of reads above quality cutoffs
Q5: 963153 (96.3%) 13574.0Mb
Q7: 937982 (93.8%) 13387.4Mb
Q10: 781757 (78.2%) 10950.3Mb
Q12: 446035 (44.6%) 6333.8Mb
Q15: 165 (0.0%) 1.6Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 16.3 (2090)
2: 16.2 (243)
3: 16.1 (362)
4: 16.1 (570)
5: 16.1 (1509)
Top 5 longest reads and their mean basecall quality score
1: 884004 (3.7)
2: 274368 (5.2)
3: 187850 (4.8)
4: 150969 (3.8)
5: 124444 (9.8)
So roughly 13.7 kb vs 4.7 kb mean read length.
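For context, the N50 values reported above mean that half of all sequenced bases sit in reads at least that long. It can be recomputed from a FASTQ with standard tools; this is a sketch on a toy three-read file standing in for the real data:

```shell
# Toy FASTQ with read lengths 10, 5 and 2 (total 17 bases, half = 8.5).
printf '@a\nAAAAAAAAAA\n+\n!!!!!!!!!!\n@b\nCCCCC\n+\n!!!!!\n@c\nGG\n+\n!!\n' > demo.fastq

# Sequence lines are every 4th line starting at line 2; sort lengths
# descending, then report the length at which the cumulative sum of
# bases first reaches half the total (the N50 definition).
awk 'NR % 4 == 2 { print length($0) }' demo.fastq \
  | sort -rn \
  | awk '{ l[NR] = $1; t += $1 }
         END { h = t / 2; c = 0
               for (i = 1; i <= NR; i++) { c += l[i]; if (c >= h) { print l[i]; exit } } }'
# prints 10: the single 10 bp read already covers half the 17 total bases
```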
If we still want to use NGMLR on these data, is there any option that can speed the process up?
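If the slowdown really tracks the ultra-long reads (the first batch's longest reads exceed 1.5 Mb at mean quality below 4), one generic pre-processing step is to length-filter the FASTQ before alignment. This is only a sketch on a toy two-read file; the cutoff is an assumption to tune for real ONT data, not an NGMLR option:

```shell
# Toy FASTQ: r1 is 4 bp, r2 is 10 bp.
printf '@r1\nACGT\n+\n!!!!\n@r2\nACGTACGTAC\n+\n!!!!!!!!!!\n' > demo.fastq

# paste folds each 4-line FASTQ record onto one tab-separated line,
# awk keeps records whose sequence (field 2) is at most MAXLEN bases,
# tr unfolds them back to FASTQ. MAXLEN=5 suits the toy data; on real
# reads a cutoff such as 100000 bp would be an assumption to adjust.
MAXLEN=5
paste - - - - < demo.fastq \
  | awk -F '\t' -v max="$MAXLEN" 'length($2) <= max' \
  | tr '\t' '\n' > filtered.fastq

grep -c '^@r' filtered.fastq   # 1: only r1 survives the filter
```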
Best, Sarah
Dear all,
First of all, thanks for this very nice development.
I just wanted to report that on some quite large ONT runs from bovine samples, NGMLR followed by samtools sort was very slow (about 4 days for 4 million reads).
And I was wondering whether I was using the tool correctly (with the right parameters).
I tried with the first 1 million reads like this:

zcat $fastq | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 6 -o $output

and it took 5h23 to complete.
I then tried with the second 1 million reads like this:

zcat $fastq | tail -n+4000000 | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 4 -o $output

and it took 24h10 to complete.
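One aside on the extraction itself: with four lines per FASTQ record, reads 1 to 1,000,000 occupy lines 1 to 4,000,000, so the second million reads start at line 4,000,001 and tail -n+4000000 begins one line early, on the last quality line of read 1,000,000. A sketch of the arithmetic on a toy three-record file:

```shell
# FASTQ records span 4 lines, so record k starts at line 4*(k-1)+1;
# the second record of this toy file therefore starts at line 5,
# just as the second million reads would start at line 4000001.
printf '@r1\nAAAA\n+\n!!!!\n@r2\nCCCC\n+\n!!!!\n@r3\nGGGG\n+\n!!!!\n' > demo.fastq

tail -n +5 demo.fastq | head -n 4   # prints exactly the @r2 record
```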
I am using NGMLR version 0.2.8 and samtools version 1.9. Here are the details about my machine:

Linux tatum 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
24 processors
model name: Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz (vendor_id: GenuineIntel, cpu family: 6, model: 45)
Any advice would be warmly welcome.
Best, Sarah