mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

Fatal Error: Hash table is full #40

Closed peterdfields closed 8 years ago

peterdfields commented 8 years ago

I ran into the following error on a very small sample dataset:

[05 Sep 2016 02:26:45-HUD][version] mccortex=v0.0.3-503-gf025dbf zlib=1.2.7 htslib=1.3.1-61-ge87ae87 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[05 Sep 2016 02:26:45-HUD][memory] 179 bits per kmer
[05 Sep 2016 02:26:45-HUD][memory] graph: 3.6GB
[05 Sep 2016 02:26:45-HUD][memory] (of which threads: 20 x 42991616 = 820MB)
[05 Sep 2016 02:26:45-HUD][memory] paths: 1.4GB
[05 Sep 2016 02:26:45-HUD][memory] total: 3.7GB of 377.6GB RAM
[05 Sep 2016 02:26:45-HUD][hasht] Allocating table with 171,966,464 entries, using 1.3GB
[05 Sep 2016 02:26:45-HUD][hasht] number of buckets: 4,194,304, bucket size: 41
[05 Sep 2016 02:26:45-HUD][graph] kmer-size: 31; colours: 3; capacity: 171,966,464
[05 Sep 2016 02:26:45-HUD][GPathReader] need 3589273 paths 21033608 bytes
[05 Sep 2016 02:26:45-HUD][GPathSet] Allocating for 3,589,273 paths, 3.4MB colset, 17.5MB seq => 82.5MB total
[05 Sep 2016 02:26:45-HUD][FileFilter] Reading file ./inb1_pl/k31/graphs/MA-ES-3.clean.ctx [1 src colour]
[05 Sep 2016 02:26:45-HUD][GReader] 121,137,470 kmers, 1.5GB filesize
[05 Sep 2016 02:27:15-HUD][GReader] Loaded 121,137,470 / 121,137,470 (100.00%) of kmers parsed
[05 Sep 2016 02:27:15-HUD][FileFilter] Reading file ./inb1_pl/k31/graphs/MN-DM-1.clean.ctx [1 src colour] with filter: 0->1
[05 Sep 2016 02:27:15-HUD][GReader] 127,420,164 kmers, 1.5GB filesize
[05 Sep 2016 02:27:40-HUD][hasht] buckets: 4,194,304 [2^22]; bucket size: 41; memory: 1.3GB; occupancy: 161,890,833 / 171,966,464 (94.14%)
[05 Sep 2016 02:27:40-HUD][hasht] collisions 0: 154224356
[05 Sep 2016 02:27:40-HUD][hasht] collisions 1: 5397571
[05 Sep 2016 02:27:40-HUD][hasht] collisions 2: 1426637
[05 Sep 2016 02:27:40-HUD][hasht] collisions 3: 495114
[05 Sep 2016 02:27:40-HUD][hasht] collisions 4: 195947
[05 Sep 2016 02:27:40-HUD][hasht] collisions 5: 82824
[05 Sep 2016 02:27:40-HUD][hasht] collisions 6: 36706
[05 Sep 2016 02:27:40-HUD][hasht] collisions 7: 16913
[05 Sep 2016 02:27:40-HUD][hasht] collisions 8: 7662
[05 Sep 2016 02:27:40-HUD][hasht] collisions 9: 3639
[05 Sep 2016 02:27:40-HUD][hasht] collisions 10: 1772
[05 Sep 2016 02:27:40-HUD][hasht] collisions 11: 834
[05 Sep 2016 02:27:40-HUD][hasht] collisions 12: 426
[05 Sep 2016 02:27:40-HUD][hasht] collisions 13: 219
[05 Sep 2016 02:27:40-HUD][hasht] collisions 14: 113
[05 Sep 2016 02:27:40-HUD][hasht] collisions 15: 52
[05 Sep 2016 02:27:40-HUD][hasht] collisions 16: 24
[05 Sep 2016 02:27:40-HUD][hasht] collisions 17: 12
[05 Sep 2016 02:27:40-HUD][hasht] collisions 18: 6
[05 Sep 2016 02:27:40-HUD][hasht] collisions 19: 6
[05 Sep 2016 02:27:40-HUD][hash_table.c:293] Fatal Error: Hash table is full

The error arises during the mccortex bubbles step of the pipeline.

noporpoise commented 8 years ago

I can't see which command you ran, but you can fix this error by specifying the hash table size with -n: e.g. -n 1M if your dataset has approximately a million kmers, or -n 3.2G if it contains 3.2 billion. You can estimate this from the genome size.

McCortex estimates its memory requirements when it starts and commits to using only that much RAM; the hash table is never resized afterwards. Its guess at the maximum number of kmers can be wrong for small examples and for highly diverse samples. When using the McCortex pipeline, the number of kmers can also be passed with make NKMERS=...

Max memory usage can also be specified with e.g. -m 1G for one gigabyte (2^30 bytes).
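As a rough sketch of how this advice fits together (the genome size, sample count, memory cap, and file names below are hypothetical placeholders, not values from this thread), you could estimate -n from genome size times number of colours and print the resulting command:

```shell
# Back-of-envelope hash table sizing (all numbers are assumptions).
GENOME_BP=150000000              # ~150 Mb genome, assumed
N_SAMPLES=3                      # three colours, as in the log above
NKMERS=$((GENOME_BP * N_SAMPLES))   # crude upper bound on distinct kmers

# Print the command rather than run it; file names are placeholders.
echo "mccortex31 bubbles -n ${NKMERS} -m 8G -o bubbles.txt.gz joint.ctx"
```

Both -n and -m can be given together; McCortex then allocates a table of (at least) the requested size while staying under the memory cap.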

Best, Isaac


noporpoise commented 8 years ago

I'm closing this issue; if the problem persists, please re-open it.

markcharder commented 4 years ago

Just a comment on this issue for the interested reader: the NKMERS (or -n) setting should be at least the total number of kmers expected across all genomes in the data set. So, with a genome of 40 Mb in 5 samples (as I had), NKMERS should be around 40 million × 5 = 200 million.

After running into the same issue as @peterdfields, I changed NKMERS to 300 million and mccortex ran perfectly.
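This rule of thumb can be written out explicitly. The numbers come from the comment above; the 1.5x head-room factor is an assumption that happens to match the jump from the 200 million minimum to the 300 million that worked (the log earlier shows the table failing at ~94% occupancy, so some slack is needed):

```shell
GENOME_BP=40000000                  # 40 Mb genome
SAMPLES=5
MIN_KMERS=$((GENOME_BP * SAMPLES))  # bare minimum: 200 million

# Add ~50% head-room so the table doesn't fill at high occupancy:
NKMERS=$((MIN_KMERS * 3 / 2))

echo "minimum: $MIN_KMERS  with head-room: $NKMERS"
```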