schnaader / fairytale

encode.ru community archiver
GNU Lesser General Public License v3.0

Recompression: Inexpensive expansion for all methods in squashfs #29

Open M-Gonzalo opened 6 years ago

M-Gonzalo commented 6 years ago

Squashfs is a read-only filesystem that is frequently used to transparently compress whole operating systems on live portable media, to distribute software in Snap and AppImage formats, and to efficiently store large multimedia archives. It divides the data into rather small blocks and then compresses them with one of 6 algorithms: gzip, lzma, lzo, xz, lz4, or zstd.

If Fairytale were able to recompress them, it could mean several GB of savings on a sysadmin's drive. But doing so by brute-force guessing the precise method that was used to create the file ranges from extremely impractical to virtually impossible.

Luckily, there's no need to do that. Squashfs stores the options used to compress its blocks, so recompressing a SQS file is just a matter of decompressing the streams and copying the flags. Finding the right parameters is O(1) instead of a search. Just to make sure, I had a private conversation with the author, Philip Lougher:

As I understand the docs, squashfs stores now the options used to compress every block. [...] My doubt is: Does squashfs really store all compression options inside the final archive?


Yes it does.

If a user has specified non-default compression options, these are stored in the final archive.

If the user has used the default compression options, these are not stored in the archive, because no stored compression options indicate defaults were used. If the defaults were used then storing them is unnecessary, because the software knows what the defaults are.

The presence of compression options is indicated by setting the SQUASHFS_COMP_OPT bit (bit 10) in the Squashfs flags field in the Squashfs superblock.

If that is set, then the compression options are stored immediately after the superblock. The size and structure of the compression options vary depending on which compressor was used to compress the filesystem. The compressor used is stored in the "compression" field in the superblock.

If all the various compressors are enabled and compiled into Mksquashfs, then mksquashfs -info will list the compression options supported by each compressor, and the defaults (which, if used, mean no compression options will be stored).

Unsquashfs -stat will report the compressor used and any non-default compression options used by the filesystem. If no compression options are reported by -stat then the default options were used.
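
To make the answer above concrete, here is a minimal sketch of how a parser could read the compressor and the SQUASHFS_COMP_OPT bit from the superblock. It assumes a little-endian squashfs 4.x image with the 96-byte superblock layout from the kernel's squashfs_fs.h. The function name describe_superblock and the keys of the returned dict are my own, not part of any existing tool.

```python
import struct

SUPERBLOCK_SIZE = 96              # assumed size of the squashfs 4.x superblock
SQUASHFS_MAGIC = 0x73717368       # stored little-endian, reads "hsqs" on disk
SQUASHFS_COMP_OPT = 1 << 10       # bit 10 of the flags field

# Compressor ids as stored in the "compression" field.
COMPRESSORS = {1: "gzip", 2: "lzma", 3: "lzo", 4: "xz", 5: "lz4", 6: "zstd"}


def describe_superblock(raw: bytes) -> dict:
    """Report the compressor and whether compression options are stored."""
    if len(raw) < SUPERBLOCK_SIZE:
        raise ValueError("need at least 96 bytes of superblock")
    # 5 x u32, 6 x u16, 8 x u64, no padding
    fields = struct.unpack_from("<5I6H8Q", raw, 0)
    if fields[0] != SQUASHFS_MAGIC:
        raise ValueError("not a little-endian squashfs image")
    block_size = fields[3]
    compression = fields[5]
    flags = fields[7]
    has_options = bool(flags & SQUASHFS_COMP_OPT)
    return {
        "compressor": COMPRESSORS.get(compression, f"unknown ({compression})"),
        "block_size": block_size,
        "has_comp_opts": has_options,
        # Options, when present, sit immediately after the superblock.
        "comp_opts_offset": SUPERBLOCK_SIZE if has_options else None,
    }
```

Feeding it the first 96 bytes of an image (for example, read with open(path, "rb").read(96)) should give roughly the information that unsquashfs -stat prints about the compressor. If has_comp_opts is False, the defaults were used and nothing needs to be copied.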

M-Gonzalo commented 6 years ago

From https://www.kernel.org/doc/Documentation/filesystems/squashfs.txt

 ---------------
|  superblock   |
|---------------|
|  compression  |
|    options    |
|---------------|
|  datablocks   |
|  & fragments  |
|---------------|
|  inode table  |
|---------------|
|   directory   |
|     table     |
|---------------|
|   fragment    |
|    table      |
|---------------|
|    export     |
|    table      |
|---------------|
|    uid/gid    |
|  lookup table |
|---------------|
|     xattr     |
|     table     |
 ---------------
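
The order of the tables in this diagram can be cross-checked against a real image, because the superblock records a start offset for each of them. Below is a rough sketch, assuming the same 96-byte little-endian superblock in which six 64-bit offsets begin at byte 48, and assuming that an all-ones value marks a table that is absent. The helper name table_order is hypothetical. For some tables the recorded offset points at an index rather than at the first metadata block, so treat the result as an ordering hint only.

```python
import struct

INVALID = 0xFFFFFFFFFFFFFFFF  # assumed marker for "table not present"

# Order of the six offset fields as they appear in the superblock (assumed).
TABLE_NAMES = (
    "uid/gid lookup table",
    "xattr table",
    "inode table",
    "directory table",
    "fragment table",
    "export table",
)


def table_order(superblock: bytes) -> list:
    """Return the names of the tables that are present, sorted by file offset."""
    if len(superblock) < 96:
        raise ValueError("need at least 96 bytes of superblock")
    starts = struct.unpack_from("<6Q", superblock, 48)
    present = [(off, name) for off, name in zip(starts, TABLE_NAMES) if off != INVALID]
    return [name for _, name in sorted(present)]
```

On an image without xattrs, I would expect this to list the inode table first and the uid/gid lookup table last, matching the diagram with the xattr box left out.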