mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

Cleaning failing with Assert Failed db_graph_alloc(): num_of_cols > 0 #47

Closed Phelimb closed 7 years ago

Phelimb commented 7 years ago

Hi @noporpoise,

I'm running the mccortex build and clean steps on a batch of samples from the ENA, and a small percentage fail at the cleaning step without much information as to why.

Here's a dump of the log. Building works fine, but then cleaning dies with: Assert Failed db_graph_alloc(): num_of_cols > 0

Any ideas?

[29 Dec 2016 11:12:04-QaP][cmd] /users/iqbal/phelimb/apps/mccortex/bin/mccortex31 build -t 1 -m 7G -k 31 -s ERR233401 --seq /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_1.fastq.gz --seq /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_2.fastq.gz /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/uncleaned/ERR233401.ctx
[29 Dec 2016 11:12:04-QaP][cwd] /gpfs2/well/iqbal/people/phelim
[29 Dec 2016 11:12:04-QaP][version] mccortex=v0.0.3-539-g22e27b7 zlib=1.2.8 htslib=1.3.2-134-g1bc5c56 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[29 Dec 2016 11:12:04-QaP] Saving graph to: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/uncleaned/ERR233401.ctx
[29 Dec 2016 11:12:04-QaP][sample] 0: ERR233401
[29 Dec 2016 11:12:04-QaP][task] /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_1.fastq.gz; FASTQ offset: auto-detect, threshold: off; cut homopolymers: off; remove PCR duplicates: no; colour: 0
[29 Dec 2016 11:12:04-QaP][task] /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_2.fastq.gz; FASTQ offset: auto-detect, threshold: off; cut homopolymers: off; remove PCR duplicates: no; colour: 0
[29 Dec 2016 11:12:04-QaP][memory] 104 bits per kmer
[29 Dec 2016 11:12:04-QaP][memory] graph: 6.9GB
[29 Dec 2016 11:12:04-QaP][memory] total: 6.9GB of 94.6GB RAM
[29 Dec 2016 11:12:04-QaP] Writing 1 colour graph to /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/uncleaned/ERR233401.ctx
[29 Dec 2016 11:12:04-QaP][hasht] Allocating table with 570,425,344 entries, using 4.3GB
[29 Dec 2016 11:12:04-QaP][hasht]  number of buckets: 16,777,216, bucket size: 34
[29 Dec 2016 11:12:04-QaP][graph] kmer-size: 31; colours: 1; capacity: 570,425,344
[29 Dec 2016 11:12:04-QaP][hasht] buckets: 16,777,216 [2^24]; bucket size: 34;
[29 Dec 2016 11:12:04-QaP][hasht] memory: 4.3GB; filled: 0 / 570,425,344 (0.00%)
[29 Dec 2016 11:12:04-QaP][asyncio] Inputs: 2; Threads: 1
[29 Dec 2016 11:12:04-QaP][seq] Parsing sequence file /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_2.fastq.gz
[29 Dec 2016 11:12:04-QaP][seq] Parsing sequence file /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_1.fastq.gz
[29 Dec 2016 11:12:04-QaP] /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_2.fastq.gz: Qual scores: Sanger (Phred+33) [offset: 33, range: [33,73], sample: [33,69]]
[29 Dec 2016 11:12:04-QaP] /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_1.fastq.gz: Qual scores: Sanger (Phred+33) [offset: 33, range: [33,73], sample: [33,73]]
[29 Dec 2016 11:17:05-QaP][BuildGraph] Read 5,000,000 entries (reads / read pairs)
[29 Dec 2016 11:20:57-QaP][BuildGraph] Read 10,000,000 entries (reads / read pairs)
[29 Dec 2016 11:21:02-QaP][seq] Loaded 5,082,194 reads and 0 reads pairs (file: /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_1.fastq.gz)
[29 Dec 2016 11:21:04-QaP][seq] Loaded 5,082,194 reads and 0 reads pairs (file: /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_2.fastq.gz)
[29 Dec 2016 11:21:04-QaP][hasht] buckets: 16,777,216 [2^24]; bucket size: 34;
[29 Dec 2016 11:21:04-QaP][hasht] memory: 4.3GB; filled: 488,170,572 / 570,425,344 (85.58%)
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  0: 477616581
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  1: 9132135
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  2: 1181354
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  3: 195799
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  4: 35793
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  5: 7104
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  6: 1436
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  7: 297
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  8: 60
[29 Dec 2016 11:21:04-QaP][hasht]  collisions  9: 11
[29 Dec 2016 11:21:04-QaP][hasht]  collisions 11: 2
[29 Dec 2016 11:21:04-QaP][task] input: /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_1.fastq.gz colour: 0
[29 Dec 2016 11:21:04-QaP]  SE reads: 10,164,388  PE reads: 0
[29 Dec 2016 11:21:04-QaP]  good reads: 10,163,375  bad reads: 1,013
[29 Dec 2016 11:21:04-QaP]  dup SE reads: 0  dup PE pairs: 0
[29 Dec 2016 11:21:04-QaP]  bases read: 1,524,658,200  bases loaded: 1,520,553,440
[29 Dec 2016 11:21:04-QaP]  num contigs: 10,186,918  num kmers: 1,214,945,900 novel kmers: 488,170,572
[29 Dec 2016 11:21:04-QaP][task] input: /well/iqbal/projects/atlas/fastq/ERR233/ERR233401/ERR233401_2.fastq.gz colour: 0
[29 Dec 2016 11:21:04-QaP]  SE reads: 0  PE reads: 0
[29 Dec 2016 11:21:04-QaP]  good reads: 0  bad reads: 0
[29 Dec 2016 11:21:04-QaP]  dup SE reads: 0  dup PE pairs: 0
[29 Dec 2016 11:21:04-QaP]  bases read: 0  bases loaded: 0
[29 Dec 2016 11:21:04-QaP]  num contigs: 0  num kmers: 0 novel kmers: 0
[29 Dec 2016 11:21:04-QaP] Dumping graph...
[29 Dec 2016 11:21:04-QaP][graphwriter] Saving file to: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/uncleaned/ERR233401.ctx
[29 Dec 2016 11:21:04-QaP][FileFilter] Writing graph  [1 src colour]
[29 Dec 2016 11:21:47-QaP][graphwriter] Dumped 488,170,572 kmers in 1 colour into: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/uncleaned/ERR233401.ctx (ver: 6)
[29 Dec 2016 11:21:47-QaP][memory] We made 19 allocs
[29 Dec 2016 11:21:47-QaP] Done.
[29 Dec 2016 11:21:47-QaP][time] 583.00 seconds (9 mins 43 secs)
[29 Dec 2016 11:21:47-mIT][cmd] /users/iqbal/phelimb/apps/mccortex/bin/mccortex31 clean -m 6GB -B 2 -U -T -o /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/ERR233401.ctx -c /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_before.csv -C /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_after.csv -l /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_lbefore.csv -L /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_lafter.csv /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/uncleaned/ERR233401.ctx
[29 Dec 2016 11:21:47-mIT][cwd] /gpfs2/well/iqbal/people/phelim
[29 Dec 2016 11:21:47-mIT][version] mccortex=v0.0.3-539-g22e27b7 zlib=1.2.8 htslib=1.3.2-134-g1bc5c56 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[29 Dec 2016 11:21:47-mIT] Actions:
[29 Dec 2016 11:21:47-mIT] 0. Saving kmer coverage distribution to: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_before.csv
[29 Dec 2016 11:21:47-mIT] 1. Saving unitig length distribution to: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_lbefore.csv
[29 Dec 2016 11:21:47-mIT] 2. Cleaning tips shorter than 62 nodes
[29 Dec 2016 11:21:47-mIT] 3. Cleaning unitigs with auto-detected threshold
[29 Dec 2016 11:21:47-mIT] 4. Saving kmer coverage distribution to: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_after.csv
[29 Dec 2016 11:21:47-mIT] 5. Saving unitig length distribution to: /well/iqbal/people/phelim/bins/k31/ERR233/ERR233401/cleaned/stats/ERR233401_lafter.csv
[29 Dec 2016 11:21:47-mIT][memory] 72 bits per kmer
[29 Dec 2016 11:21:47-mIT][memory] graph: 5.5GB
[29 Dec 2016 11:21:47-mIT][cleaning] 1 input graph, max kmers: 488,170,572, using 0 colours
[29 Dec 2016 11:21:47-mIT][memory] total: 5.5GB of 94.6GB RAM
[src/graph/db_graph.c:39] Assert Failed db_graph_alloc(): num_of_cols > 0
[29 Dec 2016 11:21:47-mIT] Assert Error

noporpoise commented 7 years ago

Hi @Phelimb,

The McCortex clean command works out how many colours it can handle at once, given the memory limit and an estimate of the number of kmers. This means it can clean, for example, a 1,000 colour graph by pooling the samples, cleaning the resulting single colour population graph, and then cleaning n colours at a time against the population graph. The effect is the same as loading all colours at once. I think the issue is that it's guessing that n=0 is optimal, which is obviously crazy. Try setting the number of colours to use with -N 1. I think that will fix it - let me know if it doesn't.

I'm putting together a patch to fix this behaviour.
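
To make the failure mode concrete, here is a rough sketch of the kind of estimate described above. It is not McCortex's actual code: the function names, the 72/40 bit split between fixed and per-colour cost, and the table capacity are all assumptions chosen for illustration. It shows how a naive "how many colours fit in memory" calculation can come out as 0 when the budget is tight, and how colours would be grouped into passes once a usable n is known.

```python
def colours_that_fit(mem_bytes, capacity, fixed_bits, per_colour_bits):
    """Naive estimate of how many colours fit in memory at once.

    Hypothetical model: each hash table entry costs `fixed_bits`
    regardless of colours, plus `per_colour_bits` for every colour loaded.
    """
    spare_bits = mem_bytes * 8 - capacity * fixed_bits
    if spare_bits <= 0:
        return 0
    # Floor division: a budget that covers less than one colour yields 0.
    return spare_bits // (capacity * per_colour_bits)


def plan_passes(num_colours, per_pass):
    """Split colour indices into groups of at most `per_pass` colours."""
    if per_pass < 1:
        raise ValueError("need at least one colour per pass")
    return [list(range(start, min(start + per_pass, num_colours)))
            for start in range(0, num_colours, per_pass)]


# Illustrative numbers only (not taken from McCortex internals).
CAPACITY = 650_000_000  # assumed hash table capacity
tight = colours_that_fit(6 * 2**30, CAPACITY, fixed_bits=72, per_colour_bits=40)
roomy = colours_that_fit(16 * 2**30, CAPACITY, fixed_bits=72, per_colour_bits=40)
passes = plan_passes(5, 2)
```

With these assumed numbers, a 6 GiB budget leaves room for the fixed cost but less than one full colour, so the estimate floors to 0. Passing that value straight into an allocator that requires at least one colour would trip a check like the one in the log, which is why forcing n with -N 1 sidesteps the problem.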

iqbal-lab commented 7 years ago

@phelimb is cleaning a single colour graph, so it seems like this is a degenerate case where your calculation/estimate is unnecessary.

noporpoise commented 7 years ago

Yep - should be fixed on develop branch now.

iqbal-lab commented 7 years ago

In principle, though, your fix allows the code to choose to clean with 2 colours even on a 1 colour graph? Anyway, sounds good.
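
One way to cover both ends of this, sketched under the assumption that the colour count is a plain integer chosen before allocation, is to clamp it between 1 and the number of colours actually in the graph. This is not the fix that went into the develop branch; `colours_per_pass` and its parameters are hypothetical names used only for illustration.

```python
def colours_per_pass(estimate, graph_colours, requested=None):
    """Pick how many colours to load per pass.

    estimate:      memory-based guess, which may be 0 or exceed the graph
    graph_colours: number of colours actually present in the input graph
    requested:     optional user override (analogous to a -N style flag)
    """
    if graph_colours < 1:
        raise ValueError("graph must contain at least one colour")
    n = requested if requested is not None else estimate
    # Never fewer than 1 colour, never more than the graph holds.
    return max(1, min(n, graph_colours))


single_from_zero = colours_per_pass(0, 1)    # the failing case in this issue
single_from_two = colours_per_pass(2, 1)     # the case raised above
many = colours_per_pass(3, 1000)
forced = colours_per_pass(0, 1000, requested=1)
```

Raising a zero estimate to 1 means the run may use more memory than the budget allows, so a real tool might prefer to stop with an "insufficient memory" error in that situation rather than clamp silently.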

Phelimb commented 7 years ago

Thanks @noporpoise. Will give the develop version a try and let you know if it fixes the issue.