datdenkikniet closed this 12 months ago
Interesting strategy, and it looks like a good potential saving. It doesn't change anything in your testing, but just as an FYI: size16 only affects opti-bit-set, not point-list.
6:15 hours into N=15 it ran out of memory (96 GB). So this doesn't seem like a terrible approach, but the hash function does seem to have some problems: I calculated 1039496296 unique expansions for N = 14, which is one item too few :/
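For illustration, here is a minimal sketch of how deduplicating by hash alone can lose an item: if two distinct canonical expansions collide under the hash, the second one is silently dropped and the final count comes out one too low. The `bad_hash` function and the string "expansions" below are entirely made up for demonstration; they are not the real hash or data format.

```rust
use std::collections::HashSet;

// Hypothetical 8-bit "hash", deliberately weak so that two distinct
// inputs collide ("ab" and "ba" sum to the same byte value).
fn bad_hash(s: &str) -> u8 {
    s.bytes().fold(0u8, |acc, b| acc.wrapping_add(b))
}

fn main() {
    // Three distinct "canonical expansions"; "ab" and "ba" collide.
    let expansions = ["ab", "ba", "cd"];

    // Deduplicating by storing only hashes, not the full items.
    let mut seen_hashes: HashSet<u8> = HashSet::new();
    let mut count = 0u64;
    for e in &expansions {
        if seen_hashes.insert(bad_hash(e)) {
            count += 1;
        }
    }

    // 3 distinct expansions, but only 2 are counted: one item too few.
    println!("{count}");
}
```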
One option could be to use buckets similar to (or better than) what my patch does for parallelising: generate one pcube output file per bucket, then reload the now-smaller files individually to de-duplicate them. I'm not sure what ratio of duplicates we're filtering, and it could cause a potentially large increase in the amount of data we're storing on disk.
Or something like storing a pcube file per low byte of the hash, and not fully deduplicating it on the first pass.
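A rough sketch of that low-byte idea, using in-memory buckets in place of the 256 pcube files (all names here are hypothetical; the real code would append serialized cubes to per-bucket files and deduplicate each file in a second pass). The key property is that identical expansions always hash to the same low byte, so per-bucket deduplication is globally correct:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Route an item to one of 256 buckets by the low byte of its hash.
fn low_byte<T: Hash>(item: &T) -> u8 {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    (h.finish() & 0xff) as u8
}

fn main() {
    // Stand-in for the stream of expansions, duplicates included.
    let expansions = vec!["a", "b", "a", "c", "b"];

    // First pass: append every expansion to its bucket, no dedup yet.
    let mut buckets: Vec<Vec<&str>> = vec![Vec::new(); 256];
    for e in &expansions {
        buckets[low_byte(e) as usize].push(e);
    }

    // Second pass: each bucket is small enough to deduplicate on its own.
    let total: usize = buckets
        .iter()
        .map(|b| b.iter().collect::<HashSet<_>>().len())
        .sum();
    println!("{total}"); // 3 unique expansions
}
```

Since duplicates can never end up in different buckets, summing the per-bucket unique counts gives the global unique count, at the cost of temporarily storing the duplicate copies on disk.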
Yeah! Just storing a few extra copies of a specific canonical expansion and filtering them out later on sounds like a good alternative when it comes to memory usage (it will use a bunch more disk, though, but that should be fine).
@datdenkikniet can this be updated or closed?
I will close this until I get around to trying this out again :)
This is on top of #21.
Seeing whether getting the memory size down by never storing many full cubes, and instead streaming them to and from disk, is effective or not. I'm comparing to points-list with size16, since that seemed to have a pretty small footprint. It's important to see whether the tradeoff between memory usage and slowdown (I'm fairly sure this will be a tad slower) is worth it.
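The streaming idea boils down to something like the following minimal sketch: append each expansion to a file as it is produced, then stream the file back one record at a time, so only one record is ever in RAM. The file name and line-per-record format below are made up for illustration and are not the actual pcube format:

```rust
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader, BufWriter, Write};

fn main() -> std::io::Result<()> {
    let path = env::temp_dir().join("expansions.pcube");

    // Producer side: write expansions out as they are generated,
    // instead of accumulating them in an in-memory collection.
    let mut writer = BufWriter::new(File::create(&path)?);
    for e in ["cube-1", "cube-2", "cube-3"] {
        writeln!(writer, "{e}")?;
    }
    writer.flush()?;

    // Consumer side: stream records back; only one line is held
    // in memory at a time.
    let reader = BufReader::new(File::open(&path)?);
    let count = reader.lines().count();
    println!("{count}"); // 3
    Ok(())
}
```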
Results so far (definitely far less memory used :D):
TODO: