mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

Hash table is full #89

Open chatcrawler opened 4 years ago

chatcrawler commented 4 years ago

Hey there.

I have 220 bacterial genomes with around 12Mb genome size on average. I am running the following command:

/cluster/apps/gdc/mccortex/1.0.1/bin/mccortex63 vcfcov -m 280GB -n 10G --low-mem --ref ....

After k=31 and k=63 had completed, I got the following error message during the VCF creation step: "Fatal Error: Hash table is full"

I am aware of the closed report further down this list with the same title ("Hash table is full").

Following the advice given there initially solved the problem for me during an earlier run, in which I had relied only on the default hash table size (1M) and 70GB of RAM.

After adjusting these to -m 280GB and -n 10G respectively, the pipeline got stuck again. Increasing the memory or hash table size doesn't help.

Here's the full output of the log file:

[26 Jun 2020 15:46:48-toj][cmd] /cluster/apps/gdc/mccortex/1.0.1/bin/mccortex63 vcfcov -m 280GB -n 10G --low-mem --ref /cluster/scratch/swielgos/SEB-IND$
[26 Jun 2020 15:46:48-toj][cwd] /cluster/scratch/swielgos/SEB-INDIANA-TRIMMED
[26 Jun 2020 15:46:48-toj][version] mccortex=tags/mccortex-1.0.1 zlib=1.2.7 htslib=1.9-66-gbcf9bff-dirty ASSERTS=ON hash=Lookup3 CHECKS=ON k=33..63
[26 Jun 2020 15:46:48-toj][vcfcov] max allele length: 100; max number of variants: 8
[26 Jun 2020 15:46:48-toj][memory] 160 bits per kmer
[26 Jun 2020 15:46:48-toj][memory] graph: 802MB
[26 Jun 2020 15:46:48-toj][memory] total: 802MB of 755GB RAM
[26 Jun 2020 15:46:48-toj][vcfcov] Output format: compressed VCF
[26 Jun 2020 15:46:48-toj][hasht] Allocating table with 41,943,040 entries, using 642MB
[26 Jun 2020 15:46:48-toj][hasht] number of buckets: 1,048,576, bucket size: 40
[26 Jun 2020 15:46:48-toj][graph] kmer-size: 63; colours: 1; capacity: 41,943,040
[26 Jun 2020 15:46:48-toj][vcfcov] Loading kmers from VCF+ref
[26 Jun 2020 15:48:09-toj][hasht] buckets: 1,048,576 [2^20]; bucket size: 40;
[26 Jun 2020 15:48:09-toj][hasht] memory: 642MB; filled: 39,346,704 / 41,943,040 (93.81%)
[26 Jun 2020 15:48:09-toj][hasht] collisions 0: 37511945
[26 Jun 2020 15:48:09-toj][hasht] collisions 1: 1305068
[26 Jun 2020 15:48:09-toj][hasht] collisions 2: 337947
[26 Jun 2020 15:48:09-toj][hasht] collisions 3: 114617
[26 Jun 2020 15:48:09-toj][hasht] collisions 4: 44132
[26 Jun 2020 15:48:09-toj][hasht] collisions 5: 18388
[26 Jun 2020 15:48:09-toj][hasht] collisions 6: 8017
[26 Jun 2020 15:48:09-toj][hasht] collisions 7: 3539
[26 Jun 2020 15:48:09-toj][hasht] collisions 8: 1606
[26 Jun 2020 15:48:09-toj][hasht] collisions 9: 757
[26 Jun 2020 15:48:09-toj][hasht] collisions 10: 357
[26 Jun 2020 15:48:09-toj][hasht] collisions 11: 170
[26 Jun 2020 15:48:09-toj][hasht] collisions 12: 90
[26 Jun 2020 15:48:09-toj][hasht] collisions 13: 34
[26 Jun 2020 15:48:09-toj][hasht] collisions 14: 18
[26 Jun 2020 15:48:09-toj][hasht] collisions 15: 6
[26 Jun 2020 15:48:09-toj][hasht] collisions 16: 6
[26 Jun 2020 15:48:09-toj][hasht] collisions 17: 7
[26 Jun 2020 15:48:09-toj][hash_table.c:247] Fatal Error: Hash table is full
Failed to open -: unknown file type
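
The figures in the log can be cross-checked with a short script. This is only a minimal sketch: all numbers are copied from the [hasht] lines above, the function names (table_capacity, fill_percent) are made up for illustration and are not part of mccortex, and it says nothing about how mccortex sizes its table internally.

```python
# Sanity check of the hash table figures reported in the log above.
# All constants are copied from the [hasht] log lines; the helper
# names are hypothetical and not taken from mccortex.

NUM_BUCKETS = 1_048_576        # "number of buckets: 1,048,576" (2^20)
BUCKET_SIZE = 40               # "bucket size: 40"
REPORTED_CAPACITY = 41_943_040 # "Allocating table with 41,943,040 entries"
REPORTED_FILLED = 39_346_704   # "filled: 39,346,704 / 41,943,040"

# "collisions N: count" values from the log, for N = 0..17.
COLLISION_COUNTS = [
    37511945, 1305068, 337947, 114617, 44132, 18388,
    8017, 3539, 1606, 757, 357, 170,
    90, 34, 18, 6, 6, 7,
]


def table_capacity(num_buckets: int, bucket_size: int) -> int:
    """Total number of entries the table can hold."""
    return num_buckets * bucket_size


def fill_percent(filled: int, capacity: int) -> float:
    """Percentage of table entries in use."""
    return 100.0 * filled / capacity


if __name__ == "__main__":
    capacity = table_capacity(NUM_BUCKETS, BUCKET_SIZE)
    print("capacity matches log:", capacity == REPORTED_CAPACITY)
    print("fill: %.2f%%" % fill_percent(REPORTED_FILLED, capacity))
    print("collision counts sum to filled:",
          sum(COLLISION_COUNTS) == REPORTED_FILLED)
```

The three checks agree with the log: the capacity is buckets times bucket size, the table is about 93.81% full when the error is raised, and the per-collision counts add up to the number of filled entries. The allocated capacity of roughly 42 million entries, with 2^20 buckets, corresponds to the 1M figure mentioned above as the default and is far below the 10G passed with -n. The log alone does not show why.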

Thanks for your advice.