mikepound / opencubes

A community improved version of the polycubes project!
MIT License

Rust disk-only computation test #24

Closed datdenkikniet closed 12 months ago

datdenkikniet commented 1 year ago

This is on top of #21.

Seeing whether getting the memory footprint down by never storing many full cubes in memory, and instead streaming them to and from disk, is effective or not. I'm comparing against points-list with size16, since that seemed to have a pretty small footprint. The important question is whether the trade-off between memory usage and slowdown (I'm fairly sure this will be a tad slower) is worth it.
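
Roughly, the idea looks something like this (just a sketch, not the actual code in this branch; `CanonicalCube`, the 16-byte encoding, and the function names are all made up for illustration):

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};

/// Hypothetical fixed-size canonical encoding of a polycube.
/// The real on-disk format in this branch may well differ.
type CanonicalCube = [u8; 16];

/// Append a batch of canonical cubes to a file instead of keeping them in RAM.
fn write_batch(path: &str, cubes: &[CanonicalCube]) -> io::Result<()> {
    let mut out = BufWriter::new(File::options().create(true).append(true).open(path)?);
    for cube in cubes {
        out.write_all(cube)?;
    }
    out.flush()
}

/// Stream cubes back one at a time; only a single cube lives in memory at once.
fn for_each_cube(path: &str, mut f: impl FnMut(CanonicalCube)) -> io::Result<()> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = [0u8; 16];
    loop {
        match reader.read_exact(&mut buf) {
            Ok(()) => f(buf),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(()),
            Err(e) => return Err(e),
        }
    }
}
```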

Results so far (definitely far less memory used :D):

| Implementation | N = 12 | N = 13 |
| --- | --- | --- |
| Disk-only | 500 MB | 3.6 GB |
| points-list (size16) | 1.2 GB | 9.3 GB |

TODO:

NailLegProcessorDivide commented 1 year ago

Interesting strategy, and it looks like a good potential saving. It doesn't change anything in your testing, but just as an FYI: size16 only affects opti-bit-set, not point-list.

datdenkikniet commented 1 year ago

6:15 hours into N=15 it ran out of memory (96 GB). Seems like this is not a terrible approach, but it does seem like the hash function has some problems: I calculated 1039496296 unique expansions for N = 14, which is one item too few :/
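
My guess (not verified) is a hash collision: if deduplication keys on the 64-bit hash alone, two distinct canonical cubes that collide get counted as one, which would give exactly this kind of off-by-one. A minimal illustration of the difference, with made-up function names (`DefaultHasher` just stands in for whatever hash is actually used):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

fn hash_of(cube: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    cube.hash(&mut h);
    h.finish()
}

// Deduplicating on the 64-bit hash alone: a collision between two distinct
// canonical cubes silently merges them, so the count comes out too low.
fn count_by_hash<'a>(cubes: impl Iterator<Item = &'a [u8]>) -> usize {
    let mut seen: HashSet<u64> = HashSet::new();
    cubes.filter(|c| seen.insert(hash_of(c))).count()
}

// Deduplicating on the full canonical representation cannot undercount,
// at the cost of keeping (or re-reading) the full keys.
fn count_by_key<'a>(cubes: impl Iterator<Item = &'a [u8]>) -> usize {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    cubes.filter(|c| seen.insert(c.to_vec())).count()
}
```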

NailLegProcessorDivide commented 1 year ago

One option could be to use buckets similar to (or better than) what my patch does for parallelising: generate one output pcube file per bucket, then reload the now-smaller files individually to de-duplicate. I'm not sure what ratio of duplicates we're filtering, so it could cause a potentially large increase in the amount of data we're storing on disk.

Or something like storing a pcube file per low byte of the hash and not fully deduplicating on the first pass.
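
A rough sketch of what that two-pass scheme could look like (the 256-bucket layout, `CUBE_BYTES`, and the function names are assumptions for illustration, not the actual patch):

```rust
use std::collections::HashSet;
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};

const CUBE_BYTES: usize = 16; // assumed fixed-size canonical encoding

/// First pass: route each canonical cube to one of 256 bucket files based on
/// the low byte of its hash, without deduplicating yet.
fn write_to_bucket(
    writers: &mut [BufWriter<File>; 256],
    hash: u64,
    cube: &[u8; CUBE_BYTES],
) -> io::Result<()> {
    writers[(hash & 0xFF) as usize].write_all(cube)
}

/// Second pass: each bucket is small enough to deduplicate in memory on its own.
fn dedup_bucket(path: &str) -> io::Result<usize> {
    let mut data = Vec::new();
    File::open(path)?.read_to_end(&mut data)?;
    let unique: HashSet<&[u8]> = data.chunks_exact(CUBE_BYTES).collect();
    Ok(unique.len())
}
```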

datdenkikniet commented 1 year ago

Yeah! Just storing a few extra copies of a specific canonical expansion and filtering them out later sounds like a good alternative when it comes to memory usage (it will use a bunch more disk, but that should be fine).

bertie2 commented 12 months ago

@datdenkikniet can this be updated or closed?

datdenkikniet commented 12 months ago

I will close this until I get around to trying this out again :)