`MergedPPFile` needs to have the filesize in order to know how much space to reserve for the file. I have set it to be stored as the very first thing in the file, but this still requires us to touch the drive, and if we need to check thousands of files this could slow it down significantly. We could have a header file containing all the sizes, but I fear feature-creep. I would rather have a proper PP archive format which dedupes everything nicely, and live with the slower initialisation for mod development, which probably wouldn't use compressed files anyway.
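That drive access is just a single small read per file. A minimal sketch of what the check could look like, assuming the size is stored as a 64-bit little-endian prefix (the actual field width isn't pinned down here, and the function name is only a suggestion):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: the uncompressed filesize is stored as the very
// first field of the compressed file, so reserving space only needs this.
uint64_t readStoredFilesize( const std::string& path ){
	std::ifstream file( path, std::ios::binary );
	uint64_t size = 0;
	// Raw byte read; assumes a little-endian host for simplicity
	file.read( reinterpret_cast<char*>( &size ), sizeof(size) );
	return file ? size : 0;
}
```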
LZ4 compression has very nice decompression speed, which would make sense even on SSDs. Its compression ratio is quite bad however, being slightly worse than deflate. Still, it might obsolete deflate, as it is 10x faster, and you would probably go with LZMA if you want better compression anyway.
Right now the seeking problem is addressed by just loading the entire file into memory and releasing it when the handle is destroyed. This might cause some latency however. Proper performance tests should be made, also in order to compare the different compression algorithms and better understand when to use what.
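A minimal sketch of that approach, with `decompress()` as a hypothetical stand-in for whichever codec (deflate/LZ4/LZMA) ends up being used:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the chosen compression algorithm
std::vector<uint8_t> decompress( const std::vector<uint8_t>& compressed );

// The handle decompresses the whole file up front, so "seeking" becomes plain
// pointer arithmetic, and the buffer is released when the handle is destroyed.
class FileHandle{
	private:
		std::vector<uint8_t> data;
		
	public:
		explicit FileHandle( const std::string& path ){
			std::ifstream file( path, std::ios::binary );
			std::vector<uint8_t> compressed(
					(std::istreambuf_iterator<char>( file ))
				,	 std::istreambuf_iterator<char>() );
			data = decompress( compressed );
		}
		
		const uint8_t* read( uint64_t offset ) const{ return data.data() + offset; }
		uint64_t size() const{ return data.size(); }
		// The buffer is freed automatically when the handle goes out of scope
};
```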
I have added an archive format for containing several compressed files. It just stores the data and compression settings; a separate header file needs to be created to add the wanted behavior to it. It should then be easier to create several different types of headers and reuse the compressed storage for all the tested deduplication schemes.
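The exact on-disk layout isn't described here, but conceptually each entry only needs its compression settings and sizes, roughly along these lines (every field and its width below is purely illustrative, not the actual format):

```cpp
#include <cstdint>
#include <vector>

enum class Compression : uint8_t { None, Deflate, LZ4, LZMA };

// Illustrative only: the archive knows about compressed blobs, while file
// names and deduplication mapping live in the separate header file.
struct ArchiveEntry{
	Compression method;         // how the blob was compressed
	uint64_t compressed_size;   // bytes stored in the archive
	uint64_t uncompressed_size; // bytes after decompression
	uint64_t offset;            // position of the blob in the archive file
};

struct Archive{
	std::vector<ArchiveEntry> entries; // metadata; payloads follow on disk
};
```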
A bit of (more or less) manual testing was done, with the results shown in: https://docs.google.com/spreadsheets/d/1rxB9iDwyzyyf4XaaBcUBWkV4wF_yIdm4lXQN9d0M4GA/edit?usp=sharing

Just using LZMA compression on each file had a much more impressive compression ratio on the internal formats than I expected from earlier experiments:

- `.xx` are reduced by about 85%
- `.xa` are reduced by about 95%
- `.sviex` only by about 55%

Internal file deduplication should still give quite a bit more, up to half the file size of the current results (optimally, which probably won't happen). `.sviex` files have a lot of redundancy, which could be reduced by about 80% more, but I have yet to look into how they work.
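None of the deduplication schemes are specified here, but the basic idea could be as simple as this content-hash sketch (all names, including `hashOf`, are hypothetical):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical hash function; the actual choice of hash is not decided here
std::string hashOf( const std::vector<uint8_t>& data );

// Identical sub-files are stored once and referenced by their content hash
struct DedupedStore{
	std::map<std::string, std::vector<uint8_t>> blobs; // hash -> unique contents
	std::map<std::string, std::string> files;          // file name -> hash
	
	void add( const std::string& name, const std::vector<uint8_t>& data ){
		auto hash = hashOf( data );
		blobs.emplace( hash, data ); // only stored once per unique contents
		files[name] = hash;
	}
};
```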
The more troublesome part is audio and images:

- `.wav` only compresses by about 25%. Since there is about 6.5 GB of audio, improving this would give big improvements. Using FLAC in some manner should (estimated) improve the ratio to 45%, reducing it to 3.75 GB from 5 GB.
- `.bmp` and `.tga` compress depending on the complexity; the `.pp` file with the big backgrounds only gets down to ~50%, which is what you would expect from just compressing the raw pixel data. PNG might do slightly better, and more modern formats are likely better still, but it might not be worth it for now.

What should be done now is to generalize `MergedPPFile` such that several PP implementations could use it (see the sketch after the list below). We should be able to support merging:

- `.pp` files directly, without splitting them up as is needed now
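One possible shape for that generalization; every name here is a hypothetical suggestion rather than the existing API:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch of an interface MergedPPFile could be built on, so plain .pp files
// and the compressed archive can both be merged through the same code path.
class AbstractPPFile{
	public:
		virtual ~AbstractPPFile() = default;
		
		// Names of the sub-files this PP source contains
		virtual std::vector<std::string> fileNames() const = 0;
		
		// Uncompressed size, needed up front to reserve space when merging
		virtual uint64_t filesize( const std::string& name ) const = 0;
		
		// Read the full contents of one sub-file
		virtual std::vector<uint8_t> read( const std::string& name ) const = 0;
};
```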
To drastically reduce filesize we would like to be able to compress the files.
Things to consider: