Closed — datdenkikniet closed this 1 year ago
@NailLegProcessorDivide if you could take a look at this I'd appreciate it :) It makes some changes that are important for calculating larger sizes of N.
is this mostly for getting ready to stream to files? I don't quite understand the LEB128 thing.
The leb128 change makes it so that we can write a header of a predetermined length. (By default, leb128 lets you write smaller numbers in fewer bytes. In this case we already know the theoretical limit (a u64), and saving 9 bytes on a 600 GB file is pointless.)
Now we can write a stream of cubes to the file while keeping a running count, and set the count to its correct value at the end. That way we don't have to rewrite an entire file once we know the length (there is, AFAIK, no way to "shift a file to the right" without rewriting all data following it). Not having to rewrite and/or re-read the entire file, when we know we will eventually have the count of cubes we've written, is nice.
To illustrate: Before:
Now:
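The stream-then-patch idea can be sketched roughly like this (a minimal sketch, not the actual `.pcube` writer; the function name, record format, and 10-byte count width are assumptions for illustration):

```rust
use std::io::{Result, Seek, SeekFrom, Write};

/// Hypothetical sketch: reserve a fixed-width count field up front,
/// stream records while counting, then seek back and patch the count.
fn write_with_running_count<W: Write + Seek>(
    w: &mut W,
    cubes: impl Iterator<Item = Vec<u8>>,
) -> Result<u64> {
    let header_pos = w.stream_position()?;
    // placeholder for the fixed-width LEB128 count (10 bytes covers any u64)
    w.write_all(&[0u8; 10])?;
    let mut count: u64 = 0;
    for cube in cubes {
        w.write_all(&cube)?;
        count += 1;
    }
    let end = w.stream_position()?;
    // seek back and patch the count, padded with zero-payload
    // continuation bytes so it always occupies exactly 10 bytes
    w.seek(SeekFrom::Start(header_pos))?;
    let mut buf = [0u8; 10];
    let mut v = count;
    for (i, b) in buf.iter_mut().enumerate() {
        *b = (v & 0x7f) as u8 | if i < 9 { 0x80 } else { 0 };
        v >>= 7;
    }
    w.write_all(&buf)?;
    w.seek(SeekFrom::Start(end))?;
    Ok(count)
}
```

The key point is that the count field's size never changes, so patching it in place is safe: no bytes after it move.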
I'm not sure why the format uses leb128 (just a u128 or u64 would have avoided having to do it this messily), but :shrug:
This could technically break implementations whose read function isn't implemented (correctly!) the way we have it here, but nothing in the leb128 encoding says you can't do this (i.e. encode a number with a fixed number of bytes).
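For reference, fixed-width LEB128 works by padding with continuation bytes whose payload bits are all zero; a conforming decoder reads them and adds nothing. A minimal sketch (the helper name is made up, not the actual crate API):

```rust
/// Hypothetical helper: unsigned LEB128, padded with zero-payload
/// continuation bytes so the output is always exactly `width` bytes.
fn leb128_fixed(mut value: u64, width: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(width);
    for i in 0..width {
        let mut byte = (value & 0x7f) as u8; // low 7 payload bits
        value >>= 7;
        if i < width - 1 {
            byte |= 0x80; // continuation bit: "more bytes follow"
        }
        out.push(byte);
    }
    assert_eq!(value, 0, "value does not fit in {} LEB128 bytes", width);
    out
}
```

Ten bytes is enough for any u64 (⌈64/7⌉ = 10), which is why the fixed header can hold the full theoretical limit.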
Main motivation is that I don't want to wait 2.5 hours just to see whether the file I wrote for N = 16 thinks it contains 59795121480 polycubes.
It's especially relevant when generating new cubesets, since the whole point is that you can't keep them all in memory, and there is no way to know in advance how many there will be.
Oh and the rest is for perf improvements (unbuffered writes to disk are slow) and bug fixes.
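The buffering part of that fix amounts to wrapping the output file in a `BufWriter`, so many small cube records become one syscall instead of one syscall each. A minimal sketch (function name and buffer size are illustrative, not the PR's actual code):

```rust
use std::fs::File;
use std::io::{BufWriter, Result, Write};

/// Hypothetical sketch: open an output file with a large write buffer.
/// Small `write_all` calls land in the buffer and are flushed in bulk.
fn open_buffered(path: &str) -> Result<BufWriter<File>> {
    // 1 MiB buffer; BufWriter's default (8 KiB) also helps, this is illustrative
    Ok(BufWriter::with_capacity(1 << 20, File::create(path)?))
}
```

Remember to `flush()` (or drop the writer) before patching the header, so the count isn't overwritten by still-buffered data.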
ok yep, I hadn't realised the length field was variable-size. That is a choice, given that a file with anywhere near a u64's worth of polycubes is unstorable anyway.
Rest of the changes look good to me.
@bertie2 I think this is ready for merge :) I have rebased the commits to clean up the history a bit (and would prefer a non-squash this time around).
I have also verified that the funky trick (ref) for getting a fixed-size header works fine with the leb128 python library, so I hope you're OK with it.
Apologies for the delay in merging this: partly life has gotten in the way, partly I have become fixated on methods for distributing the work. It all looks good, and runs a good bit faster on my machine. Also, yes, the files still seem to be compatible; I guess the python library handles extraneous blocks of zeros in case another implementation uses them. Hope to get back to you with the V2 .pcube format asap, as I know the current limitations are proving annoying.
That sounds cool! Would like to see that too; the way the DFS algorithm works seems to support distribution well :)
Also, no worries about delays, things happen. I will gladly take any response, especially one as quick as 3 days! Thanks for merging.
Add write buffering to the Rust version to significantly improve write speed, and an option to count the cubes in a file when converting stream-oriented files.
Also add a "prefill" option to the PCube header so that we can write a fixed-size header.