nicolas-comerci / precomp-cpp

Precomp, C++ version - further compress already compressed files
http://schnaader.info/precomp.php
Apache License 2.0

Precomp Neo

Why does this fork exist?

It started in mid-2022 (around 9 months ago at the time of writing), when I attempted to add stdin/stdout support to Precomp so we could use it without needing to write to or read from massive files. The on-the-fly (OTF) compression supported by Precomp was not an ideal solution, as it prevented us from using other tools (specialized data deduplicators such as SREP, for example).

In any case, I got fairly far with the project (there is even an MR on Precomp's GitHub: https://github.com/schnaader/precomp-cpp/pull/140) but pretty quickly ran into problems. The code was hard to get into because it wasn't organized in any modular fashion, with most of the code in a single file. The more I looked at the code, the more things I saw that I could improve.

So I decided to just do it! The reason I tackled it as a fork instead of contributing to the main project is that it would have been too much work. It would have meant MASSIVE MRs, which would have looked like they were doing unnecessary things unless I spent inordinate amounts of time explaining how those things made sense in light of what I was planning next, and I figured I would probably not even do it if I had to do that. Sorry if this makes it inconvenient to contribute pieces from my fork back to the main Precomp project, but again, I don't think this work would have been feasible otherwise.

Okay, but in the end what are the changes in this fork?

Despite the large number of changes, Precomp Neo should be mostly compatible and able to recompress Precomp v0.4.8 PCF files. The previously mentioned exceptions apply: OTF compressed files or files using Brotli won't work. If your PCF file has Brotli compressed JPGs, you are out of luck and will need to recompress using mainline Precomp and precompress again using Precomp Neo. For OTF compressed PCF files, you can use mainline Precomp's convert feature to get an uncompressed PCF file, which Precomp Neo should be able to recompress.

Great, but what happens now?

While I have tested this with a lot of files, I need to continue testing, especially against mainline Precomp, to ensure I fix any new bugs I have introduced during this whole refactoring project. Of course, if I run into any already existing bug in Precomp while testing and can fix it, I will probably do so.

As for the relationship between this fork and mainline Precomp, or how to get improvements from this fork into Precomp, I have no idea. I would probably have to ask Schneider whether he is interested, which of the improvements he is most interested in, how we could tackle it, etc.

For now, I will continue working on this in my spare time, my main focus being fixing bugs and improving reliability. Adding extra formats and making Precomp even more powerful is something I would like to do, but for now I would love to get Precomp into a more 'production ready' state with the format support it already has. For example, I would like to not need to immediately recompress and hash-check the output of Precomp against the original file to make sure I am actually recovering it. The goal is to get it to a level where other products can confidently use libprecomp without worrying that their data might get corrupted.

If you want to contribute to this fork, feel free to do so! I do think it should be much easier to get into than mainline Precomp, so take a look around the code.

Contact

You can reach me at nicolas.comerci@fing.edu.uy.

However, please do not contact me by email if another channel would be more appropriate. In particular, don't email me feature requests, improvement suggestions, bug reports or format support requests. The GitHub issues page on the repo is the appropriate channel for those subjects; you WILL be flagged as spam and ignored if you email me about these things.

ORIGINAL PRECOMP README BELOW

Precomp

Join the chat at https://gitter.im/schnaader/precomp-cpp

What is Precomp?

Precomp is a command line precompressor that can be used to further compress files that are already compressed. It improves compression for some file and stream types: it works on files and streams that are compressed with zLib or the Deflate compression method (like PDF, PNG, ZIP and many more), bZip2, GIF, JPG and MP3. Precomp tries to decompress the streams, and if they can be decompressed and recompressed so that the result is bit-for-bit identical to the original stream, the decompressed stream can be used instead of the compressed one.
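
The decompress/recompress check described above can be illustrated with a minimal Python sketch. This is not Precomp's actual implementation: the function name `find_recompression_level` is hypothetical, and the sketch only tries the nine standard zlib compression levels on a single zlib stream, whereas a real precompressor has to handle many more parameters and formats.

```python
import zlib


def find_recompression_level(stream: bytes):
    """Return (level, raw) if some zlib level reproduces `stream` exactly.

    Returns None when the data is not a valid zlib stream or when no
    tried level yields a bit-for-bit identical result.
    """
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return None  # not a zlib stream, nothing to precompress
    for level in range(1, 10):
        # Only an exact match lets us store `raw` plus the level
        # instead of the original compressed bytes.
        if zlib.compress(raw, level) == stream:
            return level, raw
    return None


if __name__ == "__main__":
    original = zlib.compress(b"hello precomp " * 200, 6)
    print(find_recompression_level(original) is not None)
```

If a level is found, storing the decompressed data together with that level is enough to rebuild the original stream later; if not, the stream has to be left untouched.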

The result of Precomp is either a smaller, LZMA2 compressed file with extension .pcf (PCF = PreCompressedFile) or, when using -cn, a file containing decompressed data from the original file together with reconstruction data. In this case, the file is larger than the original file, but can be compressed with any compression algorithm stronger than Deflate to get better compression.
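
Why the larger, decompressed form can end up smaller overall can be seen in a small experiment. The sketch below is only an illustration under stated assumptions: it uses Python's `zlib` and `lzma` modules and made-up sample data, and it ignores the reconstruction data that a real PCF file also has to store.

```python
import lzma
import zlib

# Deterministic, text-like sample data (purely illustrative).
raw = "".join(
    f"record {i}: value={i * i % 997}\n" for i in range(20000)
).encode()

# Stands in for an "already compressed" file using Deflate.
deflated = zlib.compress(raw, 9)

# Applying LZMA on top of Deflate output gains very little,
# because the Deflate bitstream looks almost random.
direct = lzma.compress(deflated)

# Applying LZMA to the decompressed data lets the stronger
# algorithm see the original redundancy.
precompressed = lzma.compress(raw)

print("raw:", len(raw))
print("deflate:", len(deflated))
print("lzma(deflate):", len(direct))
print("lzma(raw):", len(precompressed))
```

The same effect shows up in the usage example numbers in this README, where 7-Zip barely shrinks silesia.zip while Precomp reduces it noticeably.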

Since version 0.4.3, Precomp is also available for Linux/*nix/macOS. The different versions are completely compatible; PCF files are exchangeable between Windows/Linux/*nix/macOS systems.

Usage example

| Command | Comment |
| --- | --- |
| `wget http://mattmahoney.net/dc/silesia.zip` (or download from here) | We want to compress this file (the Silesia compression corpus). Size: 67,633,896 bytes (100,0%) |
| `7z a -mx=9 silesia.7z silesia.zip` | Compressing with 7-Zip LZMA2, setting "Ultra". Size: 67,405,052 bytes (99,7%) |
| `precomp silesia.zip` | Compressing with Precomp results in silesia.pcf. Size: 47,122,779 bytes (69,7%) |
| `precomp -r -osilesia.zip_ silesia.pcf` | This restores the original file to a new file named silesia.zip_. Without the -o parameter, Precomp would decompress to silesia.zip. |
| `diff -s silesia.zip silesia.zip_` | Compares the original file to the result file, they're identical |
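
The roundtrip from the usage example above could be automated with a small script along these lines. It is a sketch, not part of Precomp: the helper names `sha256_of` and `roundtrip_ok` are hypothetical, and it assumes a `precomp` binary on the PATH that behaves as shown in the example (output named after the input with a .pcf extension, `-r` and `-o` for restoring) and exits with a non-zero status on failure.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def roundtrip_ok(original, precomp="precomp"):
    """Precompress `original`, restore it, and compare SHA-256 hashes.

    Assumption: `precomp FILE` writes FILE with its extension replaced
    by .pcf into the working directory, and `precomp -r -oOUT FILE.pcf`
    restores the data to OUT, as in the usage example.
    """
    original = Path(original)
    workdir = original.parent
    pcf_name = original.with_suffix(".pcf").name
    restored_name = original.name + "_"
    # Run inside the file's directory so relative output names are predictable.
    subprocess.run([precomp, original.name], cwd=workdir, check=True)
    subprocess.run(
        [precomp, "-r", f"-o{restored_name}", pcf_name],
        cwd=workdir,
        check=True,
    )
    return sha256_of(original) == sha256_of(workdir / restored_name)
```

Calling `roundtrip_ok("silesia.zip")` would then play the role of the final `diff -s` step, with a hash comparison instead of a byte-by-byte diff.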

How can I contribute?

Releases/Binaries

Official GitHub releases for both Windows and Linux.

Alternative binary download of the latest official release for both Windows and Linux.

Binaries for older versions can be found in this Google Drive folder.

Contact

Christian Schneider

schnaader@gmx.de

http://schnaader.info

Donations

Donate

To donate, you can either use the donate button here, the one at the top of the page ("Sponsor") or the one on my homepage. You can also send bitcoins to

1KvQxn6KHp4tv92Z5Fy8dTPLz4XdosQpbz

Credits

Thanks for support, help and comments:

Legal stuff

License

Copyright 2006-2021 Christian Schneider

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.