mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

Contig failing on ctx63 when path colors above 0 used #2

Closed er432 closed 10 years ago

er432 commented 10 years ago

I try running the following:

$MCCORTEX contigs -m 490G -n 12G --colour 1 -p 0:Coelorachis.clean.ctp -p 1:Vossia.k63.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx > Vossia.clean.k63.fa

And I get this:

[16 Jun 2014 13:01:26-cEm][cmd] /programs/mccortex_5_30_14/bin/ctx63 contigs -m 490G -n 12G --colour 1 -p 0:Coelorachis.clean.ctp -p 1:Vossia.k63.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx
[16 Jun 2014 13:01:26-cEm][cwd] /local/workdir/er432/andropogonae/mccortex_out
[16 Jun 2014 13:01:26-cEm][version] ctx=v0.0.3 zlib=1.2.3 htslib=0.2.0-rc8-6-gd49dfa6-dirty ASSERTS=ON CHECKS=ON k=33..63
[16 Jun 2014 13:01:26-cEm][memory] graph: 305GB
[16 Jun 2014 13:01:26-cEm][memory] paths: 49.6GB
[16 Jun 2014 13:01:26-cEm][memory] total: 354.6GB of 504.8GB RAM
[16 Jun 2014 13:01:26-cEm][hashtable] Allocating table with 12,884,901,888 entries, using 192.5GB
[16 Jun 2014 13:01:26-cEm][hashtable]  number of buckets: 268,435,456, bucket size: 48
[16 Jun 2014 13:02:50-cEm][graph] kmer-size: 63; colours: 3; capacity: 12,884,901,888
[16 Jun 2014 13:04:27-cEm][paths] Setting up path store to use 49.6GB main
[16 Jun 2014 13:04:27-cEm] Loading file refAndSamples.basalAndropogonae.inferredEdges.clean.ctx [3 colours] into colours 0-2
[16 Jun 2014 13:04:27-cEm]  2,223,283,362 kmers, 64.2GB filesize
[16 Jun 2014 13:04:27-cEm][CtxLoad] First col 0, into cols 0..2, file has 3 cols: refAndSamples.basalAndropogonae.inferredEdges.clean.ctx
[16 Jun 2014 13:14:42-cEm] Loaded 2,223,283,362 / 2,223,283,362 (100.00%) of kmers parsed
[16 Jun 2014 13:14:42-cEm][hash] buckets: 268,435,456 [2^28]; bucket size: 48; memory: 192.5GB; occupancy: 2,223,283,362 / 12,884,901,888 (17.25%)
[16 Jun 2014 13:14:42-cEm]  collisions  0: 2223283362
[16 Jun 2014 13:14:42-cEm][PathFormat] With 2 files, require 11859397612 tmp memory [0 extra bytes]
[16 Jun 2014 13:14:42-cEm] Loading file Coelorachis.clean.ctp [1 colour] into colour 0
[16 Jun 2014 13:14:42-cEm]  2,039,725,230 paths, 38.6GB path-bytes, 27,492,743 kmers, 39.2GB filesize
[16 Jun 2014 13:16:45-cEm][paths] Setup tmp path memory to use 11GB [remaining 38.6GB]
[16 Jun 2014 13:16:45-cEm] Loading file Vossia.k63.clean.ctp [1 colour] with colour filter: 0 into colour 1
[16 Jun 2014 13:16:45-cEm]  633,841,256 paths, 11GB path-bytes, 25,553,986 kmers, 11.6GB filesize
[src/kmer/path_store.c:186] Error path_store_add_packed(): Out of memory for paths
[16 Jun 2014 13:18:45-cEm] Fatal Error
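The memory figures in this log can be checked with a little arithmetic. The sketch below is only one reading of the numbers as printed (rounded to 0.1GB) and is not a confirmed diagnosis: it assumes the tmp path memory is taken out of the 49.6GB path store and that both .ctp files then have to fit in what remains. The variable names are made up for illustration.

```shell
# Sizes in tenths of a GB, copied from the rounded figures in the log above.
path_store=496    # "[paths] Setting up path store to use 49.6GB main"
tmp=110           # "Setup tmp path memory to use 11GB"
coelorachis=386   # "38.6GB path-bytes" for Coelorachis.clean.ctp
vossia=110        # "11GB path-bytes" for Vossia.k63.clean.ctp

# Space left once the tmp memory is set aside (the log reports 38.6GB).
remaining=$(( path_store - tmp ))
# Space left after the first .ctp file has been loaded (assumed to use its full path-bytes).
left=$(( remaining - coelorachis ))

# Does the second .ctp file still fit?
if [ "$vossia" -le "$left" ]; then fits=yes; else fits=no; fi
echo "remaining after tmp: $remaining, left after Coelorachis: $left, Vossia fits: $fits"
```

Under these assumptions the path store is sized to the sum of the two files' path-bytes, so setting tmp memory aside leaves no room for the second file, which would be consistent with the "Out of memory for paths" error.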

Running only with the path for Vossia as follows:

$MCCORTEX contigs -m 490G -n 12G --ncontigs 1000000 --print --colour 1 -p 1:Vossia.k63.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx > Vossia.clean.k63.fa

Gives this:

[16 Jun 2014 12:43:19-cEm][cmd] /programs/mccortex_5_30_14/bin/ctx63 contigs -m 490G -n 12G --ncontigs 1000000 --colour 1 -p 1:Vossia.k63.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx
[16 Jun 2014 12:43:19-cEm][cwd] /local/workdir/er432/andropogonae/mccortex_out
[16 Jun 2014 12:43:19-cEm][version] ctx=v0.0.3 zlib=1.2.3 htslib=0.2.0-rc8-6-gd49dfa6-dirty ASSERTS=ON CHECKS=ON k=33..63
[16 Jun 2014 12:43:19-cEm][memory] graph: 305GB
[16 Jun 2014 12:43:19-cEm][memory] paths: 11GB
[16 Jun 2014 12:43:19-cEm][memory] total: 316GB of 504.8GB RAM
[16 Jun 2014 12:43:19-cEm][hashtable] Allocating table with 12,884,901,888 entries, using 192.5GB
[16 Jun 2014 12:43:19-cEm][hashtable]  number of buckets: 268,435,456, bucket size: 48
[16 Jun 2014 12:44:45-cEm][graph] kmer-size: 63; colours: 3; capacity: 12,884,901,888
[16 Jun 2014 12:46:28-cEm][paths] Setting up path store to use 11GB main
[16 Jun 2014 12:46:28-cEm] Loading file refAndSamples.basalAndropogonae.inferredEdges.clean.ctx [3 colours] into colours 0-2
[16 Jun 2014 12:46:28-cEm]  2,223,283,362 kmers, 64.2GB filesize
[16 Jun 2014 12:46:28-cEm][CtxLoad] First col 0, into cols 0..2, file has 3 cols: refAndSamples.basalAndropogonae.inferredEdges.clean.ctx
[16 Jun 2014 12:57:16-cEm] Loaded 2,223,283,362 / 2,223,283,362 (100.00%) of kmers parsed
[16 Jun 2014 12:57:16-cEm][hash] buckets: 268,435,456 [2^28]; bucket size: 48; memory: 192.5GB; occupancy: 2,223,283,362 / 12,884,901,888 (17.25%)
[16 Jun 2014 12:57:16-cEm]  collisions  0: 2223283362
[16 Jun 2014 12:57:16-cEm][PathFormat] With 1 files, require 0 tmp memory [0 extra bytes]
[16 Jun 2014 12:57:16-cEm] Loading file Vossia.k63.clean.ctp [1 colour] with colour filter: 0 into colour 1
[16 Jun 2014 12:57:16-cEm]  633,841,256 paths, 11GB path-bytes, 25,553,986 kmers, 11.6GB filesize
[src/kmer/path_format.c:476] Assert Failed paths_format_merge(): hdr->num_path_bytes == 0 || pstore->tmpstore != ((void *)0)
[16 Jun 2014 12:57:16-cEm] Assert Error

However, I can successfully run when I only try to get contigs for color 0, as follows:

$MCCORTEX contigs -m 490G -n 12G --ncontigs 1000000 --print --colour 0 -p 0:Coelorachis.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx > Coelorachis.clean.k63.fa

noporpoise commented 10 years ago

Sorry it took so long to get to this. You should be able to run each sample one at a time:

$MCCORTEX contigs -m 490G -n 12G --ncontigs 1000000 --print --colour 0 -p Coelorachis.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx > Coelorachis.clean.k63.fa

$MCCORTEX contigs -m 490G -n 12G --ncontigs 1000000 --print --colour 1 -p 1:Vossia.k63.clean.ctp refAndSamples.basalAndropogonae.inferredEdges.clean.ctx > Vossia.clean.k63.fa
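For more than a couple of samples, the one-at-a-time commands above could be generated rather than typed by hand. The following is a minimal dry-run sketch: it only prints the command lines and does not run McCortex. The helper name `contigs_cmd` is hypothetical, the flags and file names are copied from the commands in this thread, and it assumes the `colour:file` form of `-p` (as in `-p 0:Coelorachis.clean.ctp`) is wanted for every sample.

```shell
# Graph file shared by all samples (name taken from the commands above).
graph=refAndSamples.basalAndropogonae.inferredEdges.clean.ctx

# Hypothetical helper: print the contigs command for one sample.
#   $1 = colour index, $2 = .ctp paths file, $3 = output FASTA file
contigs_cmd() {
  printf '%s\n' "\$MCCORTEX contigs -m 490G -n 12G --ncontigs 1000000 --print --colour $1 -p $1:$2 $graph > $3"
}

# One call per sample; review the printed lines before running them.
contigs_cmd 0 Coelorachis.clean.ctp Coelorachis.clean.k63.fa
contigs_cmd 1 Vossia.k63.clean.ctp Vossia.clean.k63.fa
```

Printing instead of executing keeps the sketch safe to try on a machine without the graph loaded; the printed lines can be pasted into a shell where `$MCCORTEX` is set.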

But it looks like you're hitting a bug. We have had several similar issues, so we have been rewriting the way we store and represent paths (.ctp files). This work should allow for more flexibility and better interoperability of graph annotation files between programs. The develop branch includes this work, and we'll make a release from it soon. However, this means that the file format for .ctp files has changed, and you will need to regenerate them. I'm sorry this has been a hassle for you. McCortex is still under development and your feedback is helpful.

Isaac