marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Unitigger failed #154

relyanow closed this issue 8 years ago

relyanow commented 8 years ago

I completed an assembly using errorRate=0.025 with no issue, but when I ran it with errorRate=0.035, I encountered the following error (in the unitigger.err file):

Graph error threshold = 0.105 (10.500%)
Max error threshold = 0.105 (10.500%)

Minimum overlap length = 500 bases

number of threads = 16 (command line)

FragmentInfo()-- Loading fragment information for 1545070 fragments and 1 libraries from cache '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.fragmentInfo'

OverlapCache()-- limited to 258048MB memory (user supplied).
PHYS_PAGES = 132354173
PAGE_SIZE = 4096
MEMORY = 542122692608
OverlapCache()-- 17MB for fragment data.
OverlapCache()-- 23MB for best edges.
OverlapCache()-- 41MB for unitig layouts.
OverlapCache()-- 0MB for unitigs.
OverlapCache()-- 11MB for id maps.
OverlapCache()-- 17MB for overlap cache pointers.
OverlapCache()-- 56MB for overlap cache initial bucket.
OverlapCache()-- 512MB for overlap cache thread data.
OverlapCache()-- 5MB for number of overlaps per read.
OverlapCache()-- 0MB for other processes.
OverlapCache()-- ---------
OverlapCache()-- 686MB for data structures (sum of above).
OverlapCache()-- 257361MB available for overlaps.

OverlapCache()-- Loading number of overlaps per fragment.
OverlapCache()-- Initial guess at _maxPer=10916 (max of 14113) from (memLimit=269863356573 - memUsed=0) / (numFrags=1545070 * sizeof(OVL)=16)
OverlapCache()-- _maxPer= 10916 (numBelow=1545002 numEqual=0 numAbove=68 totalLoad=369561878 -- 0 + 369561878 = 5912990048 <? 269863356573
OverlapCache()-- (251722 MB free, adjust by 242601439)
OverlapCache()-- _maxPer= 14113 (numBelow=1545069 numEqual=1 numAbove=0 totalLoad=369601346 -- 0 + 369601346 = 5913621536 <? 269863356573

OverlapCache()-- blockSize = 67108864 (1024MB)

OverlapCache()-- _maxPer = 14113 overlaps/reads
OverlapCache()-- numBelow = 1545069 reads (all overlaps loaded)
OverlapCache()-- numEqual = 1 reads (all overlaps loaded)
OverlapCache()-- numAbove = 0 reads (some overlaps loaded)
OverlapCache()-- totalLoad = 369601346 overlaps (100.00%)

OverlapCache()-- availForOverlaps = 257361MB
OverlapCache()-- totalMemory = 0MB for organization
OverlapCache()-- totalMemory = 5639MB for overlaps
OverlapCache()-- totalMemory = 5639MB used

OverlapCache()-- Loading overlap information
OverlapCache()-- Loading overlap information: overlaps processed 1 (000.00%) loaded 1 (000.00%) (at read iid 1)
OverlapCache()-- Loading overlap information: overlaps processed 239068051 (064.68%) loaded 238262607 (064.46%) (at read iid 1003335)
OverlapCache()-- Loading overlap information: overlaps processed 369601346 (100.00%) loaded 368349490 (099.66%)
setLogFile()-- Now logging to '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.002.bestOverlapGraph'
setLogFile()-- Now logging to '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.004.ChunkGraph'
setLogFile()-- Now logging to '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.005.buildUnitigs'
setLogFile()-- Now logging to '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.006.placeContains'
Computing arrival rates for 152440 unitigs using 16 threads.
Computing error profiles for 152440 unitigs using 16 threads.
Computing error profiles - FINISHED.
setLogFile()-- Now logging to '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.007.popBubbles'
Computing error profiles for 152440 unitigs using 16 threads.
Computing error profiles - FINISHED.
setLogFile()-- Now logging to '/ltmp/software/canu/Linux-amd64/bin/sbal-erate-.035/unitigging/4-unitigger/sbal.008.markRepeatReads'
Computing error profiles for 152440 unitigs using 16 threads.
Computing error profiles - FINISHED.
Failed to place read 149220 at 34171-42194
Breakpoints 0 0- 1590 repeat 0
Breakpoints 1 1590- 42131 repeat 1
Breakpoints 2 41561- 52753 repeat 1
bogart: bogart/AS_BATMarkRepeatReads.C:239: uint32 splitUnitigs(UnitigVector&, Unitig, std::vector&, Unitig_, int32, uint32, uint32, bool): Assertion `rid != (4294967295U)' failed.

Failed with 'Aborted'

Backtrace (mangled):

/ltmp/software/canu/Linux-amd64/bin/bogart[0x5080dd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f1fe763a8d0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1fe72b5067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f1fe72b6448]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f1fe72ae266]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f1fe72ae312]
/ltmp/software/canu/Linux-amd64/bin/bogart[0x44ca8e]
/ltmp/software/canu/Linux-amd64/bin/bogart[0x4512fb]
/ltmp/software/canu/Linux-amd64/bin/bogart[0x403890]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f1fe72a1b45]
/ltmp/software/canu/Linux-amd64/bin/bogart[0x404589]

Backtrace (demangled):

[0] /ltmp/software/canu/Linux-amd64/bin/bogart() [0x5080dd]
[1] /lib/x86_64-linux-gnu/libpthread.so.0::(null) + 0xf8d0 [0x7f1fe763a8d0]
[2] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x37 [0x7f1fe72b5067]
[3] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x148 [0x7f1fe72b6448]
[4] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2e266 [0x7f1fe72ae266]
[5] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2e312 [0x7f1fe72ae312]
[6] /ltmp/software/canu/Linux-amd64/bin/bogart() [0x44ca8e]
[7] /ltmp/software/canu/Linux-amd64/bin/bogart() [0x4512fb]
[8] /ltmp/software/canu/Linux-amd64/bin/bogart() [0x403890]
[9] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0xf5 [0x7f1fe72a1b45]
[10] /ltmp/software/canu/Linux-amd64/bin/bogart() [0x404589]
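
Editor's note: the OverlapCache() figures in this log can be re-derived from the logged numbers alone, which helps confirm that memory was not the problem here (about 5.6 GB of overlaps against roughly 257 GB available). The sketch below assumes integer division and 1 MB = 1048576 bytes; both are guesses that happen to reproduce the logged values. The function names are invented for illustration and are not bogart's.

```cpp
#include <cstdint>

// Hypothetical helper: first guess at the per-read overlap cap, following the
// formula printed in the log:
//   (memLimit - memUsed) / (numFrags * sizeof(OVL))
// Integer division is an assumption.
constexpr uint64_t initialMaxPer(uint64_t memLimit, uint64_t memUsed,
                                 uint64_t numReads, uint64_t bytesPerOverlap) {
  return (memLimit - memUsed) / (numReads * bytesPerOverlap);
}

// Hypothetical helper: bytes needed to hold a given number of overlaps.
constexpr uint64_t overlapBytes(uint64_t numOverlaps, uint64_t bytesPerOverlap) {
  return numOverlaps * bytesPerOverlap;
}

// Hypothetical helper: bytes to whole megabytes, rounding down.
// The 1048576 divisor is an assumption.
constexpr uint64_t toMB(uint64_t bytes) {
  return bytes / 1048576ULL;
}

// All inputs below are copied from the log; the checks run at compile time.
static_assert(initialMaxPer(269863356573ULL, 0, 1545070ULL, 16ULL) == 10916ULL,
              "matches 'Initial guess at _maxPer=10916'");
static_assert(overlapBytes(369601346ULL, 16ULL) == 5913621536ULL,
              "matches '0 + 369601346 = 5913621536'");
static_assert(toMB(5913621536ULL) == 5639ULL,
              "matches 'totalMemory = 5639MB for overlaps'");
```

Compiling the file is enough to run the checks, since they are `static_assert`s.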

brianwalenz commented 8 years ago

Interesting. If possible, add flushLog(); after line 237 in src/bogart/AS_BAT_markRepeatReads.C. It should look like this when done:

    if (rid == UINT32_MAX) {
      fprintf(stderr, "Failed to place read %u at %u-%u\n", frg.ident, frgbgn, frgend);
      for (uint32 ii=0; ii<BP.size(); ii++)
        fprintf(stderr, "Breakpoints %2u %8u-%8u repeat %u\n", ii, BP[ii]._bgn, BP[ii]._end, BP[ii]._isRepeat);
      flushLog();  //  ADD THIS LINE
    }
    assert(rid != UINT32_MAX);  //  We searched all the BP's, the read had better be placed!

Then recompile, rerun, and attach the markRepeatReads log file from the 4-unitigger directory.

(I'd rather have you edit than pull a new version from the repository, just in case we changed something that impacts this.)
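
Editor's note: the point of the added flushLog() call is that a failed assert ends in abort(), and abort() is not required to flush buffered output streams, so diagnostics written just before the assert can be lost. The sketch below shows the same flush-then-assert pattern in a self-contained form. Everything in it is hypothetical: `BreakPoint`, `findRegion`, `reportUnplaced` and `placeOrAbort` are invented names, `fflush` stands in for bogart's flushLog(), and the rule that a read must lie entirely inside one region is only an assumption about what splitUnitigs() checks. Under that assumption, the coordinates from unitigger.err do reproduce the failure: read 34171-42194 runs past the end of region 1 (42131) and starts before region 2 (41561).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for bogart's breakpoint record; only the fields
// printed by the diagnostic in this thread are modelled.
struct BreakPoint {
  uint32_t bgn;
  uint32_t end;
  bool     isRepeat;
};

// ASSUMPTION: a read is placed in the first region that fully contains it.
// Returns UINT32_MAX when no region qualifies.
uint32_t findRegion(const std::vector<BreakPoint> &bp, uint32_t readBgn, uint32_t readEnd) {
  for (uint32_t ii = 0; ii < bp.size(); ii++)
    if (bp[ii].bgn <= readBgn && readEnd <= bp[ii].end)
      return ii;
  return UINT32_MAX;
}

// Write the diagnostics, then flush so they reach the file even if the
// caller aborts immediately afterwards.
void reportUnplaced(std::FILE *log, uint32_t readId, uint32_t readBgn, uint32_t readEnd,
                    const std::vector<BreakPoint> &bp) {
  std::fprintf(log, "Failed to place read %u at %u-%u\n",
               static_cast<unsigned>(readId),
               static_cast<unsigned>(readBgn),
               static_cast<unsigned>(readEnd));
  for (uint32_t ii = 0; ii < bp.size(); ii++)
    std::fprintf(log, "Breakpoints %2u %8u-%8u repeat %u\n",
                 static_cast<unsigned>(ii),
                 static_cast<unsigned>(bp[ii].bgn),
                 static_cast<unsigned>(bp[ii].end),
                 static_cast<unsigned>(bp[ii].isRepeat));
  std::fflush(log);  // stand-in for flushLog()
}

// Mirrors the shape of the snippet above: report and flush first, assert second.
uint32_t placeOrAbort(std::FILE *log, uint32_t readId, uint32_t readBgn, uint32_t readEnd,
                      const std::vector<BreakPoint> &bp) {
  uint32_t rid = findRegion(bp, readBgn, readEnd);
  if (rid == UINT32_MAX)
    reportUnplaced(log, readId, readBgn, readEnd, bp);
  assert(rid != UINT32_MAX);
  return rid;
}

// The three breakpoint regions reported in unitigger.err.
std::vector<BreakPoint> breakpointsFromLog() {
  return { {0, 1590, false}, {1590, 42131, true}, {41561, 52753, true} };
}
```

Calling `placeOrAbort(stderr, 149220, 34171, 42194, breakpointsFromLog())` would print the same three breakpoint lines and then abort, while a read such as 2000-40000 is placed in region 1.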

relyanow commented 8 years ago

I recompiled and attached the log file. I just restarted the run from where it failed. Should I rerun the entire assembly?

thanks


brianwalenz commented 8 years ago

Nope, no need to restart from scratch.

Unfortunately, email attachments get lost. You need to go to the issue on the web and attach the file there (supposedly by drag-and-drop, but I've never tried that).

relyanow commented 8 years ago

sbal.008.markRepeatReads.thr000.num000.log.zip

brianwalenz commented 8 years ago

Thanks for the log.

The last commit should resolve the problem, but it will produce more small contigs than before. From what I saw in the log, it might split a fair number of good-sized contigs in your assembly.

If you want to rerun the 0.025 assembly (to make an apples-to-apples comparison), remove:

and restart canu. It'll run just 'bogart' and consensus.

Is this low coverage? The contigs are quite short, and there are a bunch of 1x coverage regions.

relyanow commented 8 years ago

Thanks, I'll try that. Yes, it's pretty low coverage, about 10X.

relyanow commented 8 years ago

The assembly completed successfully. Thanks for your help.