Closed by lmolokin 3 years ago
The only other time we've seen that same error was issue #1732, and that was due to a bug with POSIX file naming (which was removed in Canu 2.1). So the first question is: how was Canu built/installed on your system? Are you able to share the input data so we can reproduce the error locally?
Not sure how they installed canu on the cluster that I'm using. Is there a quick way to find out? I could reach out to the admins if needed.
The input file is ~150 MB. What's the best way to share it?
Hold off on contacting admins for now.
gzip -9 it and upload via instructions at https://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you.
Just uploaded (s168.fastq.gz)
Thanks for the reads. Unfortunately, I haven't been able to reproduce the crash. I'm a little hesitant to 'request' that you upload your entire assembly directory s168_blasto as mine is almost 300 GB. If you want to upload it over the weekend, go ahead (gzip -1 will work well).
To increase the parallelization of the mhap step, I decreased parameter mhapBlockSize from the default of 3000 to 50. This gave me about 800 (!) mhap jobs, which ran in about 30 minutes on our grid. With the default value, mhap looks like it will take about a day and a half; it's still running.
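For reference, mhapBlockSize is an ordinary Canu command-line parameter. A sketch of such an invocation (the paths, genome size, and read-type option are illustrative placeholders, not taken from this thread; only mhapBlockSize=50 is the change discussed above):

```shell
# Hypothetical Canu correction run with a smaller mhap block size.
# -p/-d names, genomeSize, and -nanopore are placeholders for this sketch.
canu -correct \
  -p s168 -d s168_blasto \
  genomeSize=2k \
  mhapBlockSize=50 \
  -nanopore s168.fastq
```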
I have a guess at what is going wrong. Limiting input coverage to 50,000 or so should prevent the problem. Splitting this input into two pieces and running each separately should work, hopefully without hurting sensitivity too much.
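Splitting a FASTQ into two pieces by alternating records can be done with a short awk one-liner. A sketch, assuming an uncompressed file with strict 4-line records (the file names are illustrative):

```shell
# Send alternating 4-line FASTQ records to two output files.
# Record k (0-based) goes to part1 when k is even, part2 when odd.
awk '{ out = (int((NR - 1) / 4) % 2 == 0) ? "s168.part1.fastq" : "s168.part2.fastq"
       print > out }' s168.fastq
```

Each half then gets roughly half the coverage, which is the point of the workaround.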
Drop me an email. The corrected reads I got are only 1.4 MB when gzip -9'd -- about 39 Mbp -- and I can email them back.
Awesome, I can reproduce it. I don't need your assembly directory.
I will keep the mhapBlockSize parameter in mind for speed up and give the read splitting strategy a try. Glad to hear you reproduced it! Any more insight on the cause of the crash?
Thanks!
A little bit. Read 14575 is using around 68,000 other reads as evidence which is causing a per-base quality score to overflow. As to why it has so many evidence reads, I don't know. Most other reads have tens to hundreds of evidence reads; it's the third column in the log you shared.
Ah! There is a workaround!
File readsToCorrect is a list of the reads that we want to correct. It's just an ASCII list of read IDs. File .readsToCorrect.log has some details on the expected outcome of each read. In particular, it tells how many evidence reads will be used ("numOlaps").
    readID      numOlaps    origLength    corrLength   memory used
---------- ------------- ------------- ------------- ------------- -----
1 0 1804 0 540480704 e - -
2 2 1796 0 540473084 e - -
3 104 1800 1799 543781856 e c -
4 0 1784 0 540301184 e - -
...
14573 8 1853 1833 541183886 e c -
14574 0 1793 0 540381968 e - - <- 'e' means it is used as evidence for correcting some other read
14575 68203 1795 1794 2723950022 e c - <- 'c' means it is corrected
14576 0 1789 0 540346064 e - -
14577 0 1791 0 540364016 e - -
...
The workaround is to sort this file on the second column (sort -k2nr *.readsToCorrect.log | less) and remove any read that has more than 60,000 overlaps from the list in readsToCorrect.
There are quite a few 'deep' reads in this set:
> sort -k2nr test.readsToCorrect.log | head -n 20
14575 68203 1795 1794 2723950022 e c -
22039 61696 1865 1831 2525740766 e c -
77550 53235 1798 1797 2247845918 e c -
47565 48698 1801 1800 2105025098 e c -
8965 45274 1815 1814 2006596496 e c -
13378 44039 1798 1797 1952919230 e c -
71966 43641 1805 1804 1946143178 e c -
63099 40035 1848 1847 1834859900 e c -
23709 39484 1798 1797 1806636200 e c -
53823 38715 1805 1804 1787273234 e c -
85274 36669 1797 1796 1715976956 e c -
84097 35358 1796 1795 1673154062 e c -
79025 35133 1845 1832 1670254526 e c -
67161 30966 1794 1793 1531552286 e c -
1079 30396 1800 1799 1516679714 e c -
7213 29968 1797 1796 1501051322 e c -
89926 29780 1844 1831 1502931266 e c -
8270 29357 1800 1799 1483548266 e c -
39577 28860 1789 1788 1461747542 e c -
25417 26031 1862 1851 1379117726 e c -
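The sort-and-remove step can be scripted in one awk pass. A sketch, assuming the log layout shown above (numOlaps in column 2) and the 60,000 cutoff from the workaround; the file names are illustrative:

```shell
# First pass over the log: remember IDs of reads with > 60,000 evidence
# overlaps. Second pass over readsToCorrect: keep only the other reads.
awk 'NR == FNR { if ($2 > 60000) deep[$1] = 1; next }
     !($1 in deep)' test.readsToCorrect.log readsToCorrect > readsToCorrect.filtered
```

The filtered list can then replace readsToCorrect before restarting the job.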
Can confirm that splitting the reads in two results in successful runs.
Fixed in tip, workaround is to split very deep data into multiple partitions.
Hi, I'm attempting to correct very-high-coverage amplicon reads generated on a MinION, and even after scouring the logs I can't figure out why the run keeps failing. I need to make sure all reads are corrected because we're looking for low-abundance variants in our reads, hence the high coverage.
The command I'm using is:
Based on what I've read in other threads here, it might be related to running out of memory, or to how changing certain settings affects the partitioning of reads. Similar correction runs on other deep-coverage amplicon datasets did finish successfully after some combination of restarting failed runs with more corMemory or ovsMemory, deleting the 2-correction folder, or deleting the entire run folder. So I'm trying to pin down exactly which settings I need when working with deep-coverage data like this.
I would greatly appreciate some help figuring out which parameters to probe. Thanks!
Relevant logs and outputs below.
Latest canu.out
Abbreviated contents of /2-correction/results/0001.err showing the falconsense error
Latest job file 2-correction/correctReads.5412718_1.out:
Aleksey