Closed datdenkikniet closed 1 year ago
So there is the proviso in the format that a cube_count of zero means an ongoing stream. I don't see how having the cube count at the end would help, as reading to the end to get the cube count would be slower than just reading each cube as it comes. Or is there a fast way to skip straight to the end of the file?
Yes, I'm aware of the streaming format.
There is no fast way to skip to the end of the file, but we could encode the length of a trailer and the type(s) of data that are in the trailer in the header.
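A minimal sketch of that idea, assuming a seekable file and an entirely hypothetical layout: the two header bytes (a trailer type and a trailer length), the `TRAILER_CUBE_COUNT` tag, and the fixed 8-byte little-endian count are all made up for illustration and are not part of the current format.

```python
import io
import os
from typing import BinaryIO, Optional

# Hypothetical trailer type tag; not part of the current format.
TRAILER_CUBE_COUNT = 0x01
# Fixed width, so the header can declare the trailer length
# before the final count is known.
TRAILER_LEN = 8


def write_file(body: bytes, cube_count: int) -> bytes:
    """Build header + body + trailer; the header only describes the trailer."""
    header = bytes([TRAILER_CUBE_COUNT, TRAILER_LEN])
    trailer = cube_count.to_bytes(TRAILER_LEN, "little")
    return header + body + trailer


def read_trailer_count(f: BinaryIO) -> Optional[int]:
    """Read the count from the trailer without parsing the body."""
    trailer_type, trailer_len = f.read(2)
    if trailer_type != TRAILER_CUBE_COUNT:
        return None
    # Jump to the trailer using the length declared in the header.
    f.seek(-trailer_len, os.SEEK_END)
    return int.from_bytes(f.read(trailer_len), "little")


data = write_file(b"cube-data-goes-here", 12345)
count = read_trailer_count(io.BytesIO(data))
```

The fixed-width count is a deliberate choice in this sketch: because the trailer length has to be written in the header first, the trailer cannot use a variable-length encoding whose size depends on the final count.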
The main purpose would be where the program is writing a very large file in a streaming fashion while keeping track of the count. Since the count is already known by the time writing finishes, it would save a bunch of effort if we could just tack it on to the end instead of requiring either an entire rewrite of the file, or requiring that whoever opens the file must be OK with not knowing the number of cubes in the file from the get-go. In my case, not knowing the number of cubes from the start makes my parallel implementation go wonky, and being able to skip reading the whole file just makes it easier.
I can understand if the complexity is not warranted though.
Still, I do think we should add a reserved header byte for future expansion opportunities! Even if we don't have it now, being able to retroactively add new features without breaking old files is a good idea, IMO.
Okay, I have realized that my specific use case of "putting the cube count at the end" may not be super useful. I have, however, come up with a different thing that definitely requires the same infrastructure: writing blocks of cubes of the same size. This would entail:
This would mean reducing the amount of data stored per cube by 3 bytes, and perhaps increasing the file density significantly.
This would also allow for far more efficient in-file deduplication.
Closing in favor of #8
For (eventual) implementations that may support reading & writing cubes directly from disk to avoid having to store them in RAM, it would be really beneficial if we can write the count of cubes at the end of the file as well.
If writing the count to the end of the file is supported, cubes can be written to it in a streaming fashion and the count can be added at the end. If it's not supported, you have to rewrite the entire file from the beginning in order to fit the LEB128 encoding into the header.
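To illustrate why the header cannot simply be patched in place, here is a small sketch of unsigned LEB128 encoding (the function name `encode_leb128` is just for illustration). The encoded width depends on the value, so a placeholder count written at the start generally takes fewer bytes than the real count known at the end.

```python
def encode_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits of what remains
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


placeholder = encode_leb128(0)        # 1 byte
final = encode_leb128(1_000_000)      # 3 bytes
```

Since `final` is longer than `placeholder`, everything after the header would have to shift, which is the full rewrite described above.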
Is there an easy way to support this?
I think we should definitely add a byte to the header that is just flags, with 1 bit explicitly reserved for increasing said header size (if we ever find more than 7 flags we need).
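One possible reading of this, as a rough sketch: the top bit of each flags byte says "another flags byte follows", leaving 7 usable flag bits per byte. The bit assignment and the names `EXTENSION_BIT`, `FLAG_MASK`, and `read_flags` are assumptions for illustration, not something the format defines.

```python
# Hypothetical bit layout; the format does not define any of this yet.
EXTENSION_BIT = 0x80  # set when another flags byte follows
FLAG_MASK = 0x7F      # the 7 usable flag bits in each byte


def read_flags(data: bytes):
    """Return (list of 7-bit flag groups, number of bytes consumed)."""
    groups = []
    for offset, byte in enumerate(data):
        groups.append(byte & FLAG_MASK)
        if not byte & EXTENSION_BIT:
            return groups, offset + 1
    raise ValueError("header ended while the extension bit was still set")


# A single flags byte, then unrelated header bytes.
single = read_flags(bytes([0x03, 0xFF]))
# Extension bit set on the first byte, so a second flags byte is read.
extended = read_flags(bytes([0x81, 0x02, 0x00]))
```

With this scheme an old reader that only knows the first byte can still detect that the header is longer than it expects, instead of misreading the extra bytes as cube data.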