spillerrec / VirtualAA2


Support compressed files #5

Open spillerrec opened 8 years ago

spillerrec commented 8 years ago

To drastically reduce file size, we would like to be able to compress the files.

Things to consider:

spillerrec commented 8 years ago

MergedPPFile needs to know the file size in order to reserve space for the file. I have set it to be stored as the very first thing in the file, but this still requires us to touch the drive, and if we need to check thousands of files, this could slow things down significantly. We could have a header file containing all the sizes, but I fear feature creep. I would rather have a proper PP archive format which dedupes everything nicely, and live with the slower initialisation for mod development, which probably wouldn't use compressed files anyway.
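A minimal sketch of the size-first layout, assuming the size is an 8-byte little-endian `uint64` placed before the compressed payload (the exact on-disk layout is an assumption, and `writeUncompressedSize`/`readUncompressedSize` are hypothetical names, not VirtualAA2 functions). It shows why only a tiny read is needed per file, even though the drive is still touched once per file.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical layout (assumption for illustration):
//   [8 bytes: uncompressed size, little-endian uint64][compressed payload...]
// Reading just the prefix is enough to reserve space in the merged file.

void writeUncompressedSize(std::ostream& out, std::uint64_t size) {
    char bytes[8];
    for (int i = 0; i < 8; ++i)
        bytes[i] = static_cast<char>((size >> (8 * i)) & 0xFF);
    out.write(bytes, 8);
}

// Reads only the 8-byte prefix; returns std::nullopt if the stream is too short.
std::optional<std::uint64_t> readUncompressedSize(std::istream& in) {
    unsigned char bytes[8];
    if (!in.read(reinterpret_cast<char*>(bytes), 8))
        return std::nullopt;
    std::uint64_t size = 0;
    for (int i = 0; i < 8; ++i)
        size |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    return size;
}
```

With a real file this would be a `std::ifstream` opened in `std::ios::binary` mode; the payload after the prefix is left unread until the file is actually opened for reading.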

LZ4 compression has very fast decompression, which would make it worthwhile even on SSDs. Its compression ratio is quite poor, however, being slightly worse than deflate. Still, it might make deflate obsolete, as it is 10x faster, and if you want better compression you would probably go with LZMA anyway.

Right now the seeking problem has been addressed by simply loading the entire file into memory and releasing it when the handle is destroyed. This might cause some latency, however. Some proper performance tests should be made, also to compare the different compression algorithms and better understand when to use which.
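The load-everything approach can be sketched as an RAII handle: the whole file is decompressed once when the handle is created, seeks become plain memory copies, and the buffer is freed in the destructor. This is a sketch under assumptions: `DecompressedHandle` and the `Decompressor` hook are hypothetical names, and the actual codec (LZ4, deflate, LZMA, ...) is left as an injected function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical decompressor hook; a real build would plug in LZ4, deflate, LZMA, ...
using Decompressor = std::function<std::vector<char>(const std::vector<char>&)>;

// Decompresses the whole file once when the handle is opened, so that
// random-access reads (seeks) are served from memory.
class DecompressedHandle {
public:
    DecompressedHandle(const std::vector<char>& compressed, const Decompressor& decompress)
        : data_(decompress(compressed)) {}

    std::uint64_t size() const { return data_.size(); }

    // Copies up to `length` bytes starting at `offset` into `dest` and
    // returns the number of bytes copied (0 at or past end of file).
    std::size_t read(std::uint64_t offset, char* dest, std::size_t length) const {
        if (offset >= data_.size())
            return 0;
        std::size_t available = static_cast<std::size_t>(data_.size() - offset);
        std::size_t count = std::min(length, available);
        std::copy_n(data_.begin() + static_cast<std::ptrdiff_t>(offset), count, dest);
        return count;
    }

private:
    std::vector<char> data_; // freed automatically when the handle is destroyed
};
```

The latency mentioned above comes from the constructor: the first read cannot start until the entire file has been decompressed, which is one of the things the performance tests would need to measure.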

spillerrec commented 8 years ago

I have added an archive format for containing several compressed files. It only stores the data and the compression settings; a separate header file needs to be created to add the wanted behavior on top of it. This should make it easier to create several different types of headers and to reuse the compressed storage for all the deduplication schemes being tested.
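The split between storage and header can be illustrated roughly as follows. This is a hypothetical sketch, not the archive format actually added: `Entry`, `Method`, `CompressedStorage` and `Header` are illustrative names. The storage layer only holds compressed blobs and their settings, while the header maps paths to blob ids, so a deduplication scheme can point several paths at the same stored blob without touching the storage code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compression methods; the real archive may differ.
enum class Method : std::uint8_t { Store, Lz4, Deflate, Lzma };

// One stored blob: compressed bytes plus what is needed to decode them.
struct Entry {
    Method method;
    std::uint64_t uncompressedSize;
    std::vector<char> data;
};

// Storage layer: knows nothing about paths or deduplication.
class CompressedStorage {
public:
    std::size_t add(Entry entry) {
        entries_.push_back(std::move(entry));
        return entries_.size() - 1;
    }
    const Entry& get(std::size_t id) const { return entries_.at(id); }
    std::size_t count() const { return entries_.size(); }
private:
    std::vector<Entry> entries_;
};

// Header layer: maps virtual paths to storage ids. Several paths may share
// one id, which is where a deduplication scheme plugs in.
class Header {
public:
    void map(const std::string& path, std::size_t id) { paths_[path] = id; }
    std::optional<std::size_t> lookup(const std::string& path) const {
        auto it = paths_.find(path);
        if (it == paths_.end())
            return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::size_t> paths_;
};
```

Swapping in a different header type (for example one that dedupes at a finer granularity) would then only require a new header class, while the same `CompressedStorage` is reused.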

A bit of (more or less) manual testing was done, with the results shown here: https://docs.google.com/spreadsheets/d/1rxB9iDwyzyyf4XaaBcUBWkV4wF_yIdm4lXQN9d0M4GA/edit?usp=sharing. Just using LZMA compression on each file gave a much more impressive compression ratio on the internal formats than I expected from earlier experiments:

Internal file deduplication should still give quite a bit more, shrinking the current results by up to half (in the optimal case, which probably wouldn't happen). .sviex files have a lot of redundancy, which could be reduced by about a further 80%, but I have yet to look into how they work. The more troublesome cases are audio and images:

What should be done now is to generalize MergedPPFile so that several PP implementations can use it. We should be able to support merging: