isotopp / tarstats

Generate some statistics about a tarfile.
GNU General Public License v3.0

Count compressed size #2

Open: BenBE opened this issue 2 years ago

BenBE commented 2 years ago

The size currently listed is the accumulated uncompressed size of each archive.

Further interesting stats are:

The latter is mostly of interest when more than one archive is processed at once.

isotopp commented 2 years ago

Compressed size (actually the raw file size as seen by the OS) is handled in ae1c202d3146de6cb6547e18d20af78d18271fda.
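For illustration, a minimal Python sketch of the two numbers discussed so far: the raw on-disk size (what the OS reports, i.e. the compressed size for a .tar.gz/.tar.xz) and the accumulated uncompressed payload from the member headers. This is not tarstats' actual code; the function name `archive_sizes` is hypothetical.

```python
import os
import tarfile


def archive_sizes(path: str) -> tuple[int, int]:
    """Return (raw_size, payload) for one archive (hypothetical helper).

    raw_size: file size as reported by the OS; for a compressed archive
              this is the compressed size.
    payload:  sum of the member sizes recorded in the tar headers, i.e.
              the file contents without any tar overhead.
    """
    raw_size = os.path.getsize(path)
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(path, mode="r:*") as tar:
        payload = sum(m.size for m in tar.getmembers() if m.isfile())
    return raw_size, payload
```

Dividing `raw_size` by `payload` gives a compression ratio; summing both over several archives gives the multi-archive totals mentioned above.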

BenBE commented 2 years ago

That commit LGTM.

For the headers, the size can be estimated as 512 × file count. For really long filenames* I'd have to check the spec, but even then, AFAIR, header allocations operate on whole blocks.

The per-file overhead is essentially the unused space in the last of its 512-byte blocks.

Overall, there are always at least two full zero-filled blocks as the end-of-archive marker (though the standard does not enforce its presence).
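The estimate above can be sketched in Python as follows, assuming a plain (uncompressed) tar with exactly one 512-byte header per member (no long-name or pax extension headers) and GNU tar's default record size of 20 × 512 = 10240 bytes. The function name `estimate_overhead` is hypothetical.

```python
BLOCK = 512
RECORD = 20 * BLOCK  # assumed: GNU tar default blocking factor 20 -> 10240 bytes


def blocks_for(nbytes: int) -> int:
    """Number of 512-byte blocks needed to hold nbytes (rounded up)."""
    return -(-nbytes // BLOCK)


def estimate_overhead(file_sizes: list[int]) -> dict[str, int]:
    """Estimate where the bytes of a plain tar go (hypothetical helper)."""
    payload = sum(file_sizes)
    headers = BLOCK * len(file_sizes)  # one header block per member
    # Unused space in the last data block of each member.
    slack = sum(blocks_for(s) * BLOCK - s for s in file_sizes)
    eof_marker = 2 * BLOCK  # two zero blocks mark the end of the archive
    used = headers + payload + slack + eof_marker
    record_padding = -used % RECORD  # pad the stream to a whole record
    return {
        "payload": payload,
        "headers": headers,
        "slack": slack,
        "eof_marker": eof_marker,
        "record_padding": record_padding,
        "total": used + record_padding,
    }
```

For a single 13-byte file this gives 512 bytes of header, 499 bytes of slack, 1024 bytes of end-of-archive marker and 8192 bytes of record padding, 10240 bytes in total.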

*For filenames longer than 99 characters, an additional "synthetic" header (named ././@LongLink, type L) is produced, describing a pseudo-file whose content is the actual full filename:

$ tar c 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef.hex | hexdump -C
00000000  2e 2f 2e 2f 40 4c 6f 6e  67 4c 69 6e 6b 00 00 00  |././@LongLink...|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 30 30 30 30  36 34 34 00 30 30 30 30  |....0000644.0000|
00000070  30 30 30 00 30 30 30 30  30 30 30 00 30 30 30 30  |000.0000000.0000|
00000080  30 30 30 30 32 30 35 00  30 30 30 30 30 30 30 30  |0000205.00000000|
00000090  30 30 30 00 30 31 31 36  30 30 00 20 4c 00 00 00  |000.011600. L...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 75 73 74 61 72 20 20  00 72 6f 6f 74 00 00 00  |.ustar  .root...|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 72 6f 6f 74 00 00 00  |.........root...|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 66  |0123456789abcdef|
*
00000280  2e 68 65 78 00 00 00 00  00 00 00 00 00 00 00 00  |.hex............|
00000290  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 66  |0123456789abcdef|
*
00000460  30 31 32 33 30 30 30 30  36 36 34 00 30 30 30 31  |01230000664.0001|
00000470  37 35 30 00 30 30 30 31  37 35 31 00 30 30 30 30  |750.0001751.0000|
00000480  30 30 30 30 30 31 35 00  31 34 31 37 37 33 30 37  |0000015.14177307|
00000490  36 32 32 00 30 32 36 30  32 36 00 20 30 00 00 00  |622.026026. 0...|
000004a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000500  00 75 73 74 61 72 20 20  00 62 65 6e 62 65 00 00  |.ustar  .benbe..|
00000510  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000520  00 00 00 00 00 00 00 00  00 62 65 6e 62 65 00 00  |.........benbe..|
00000530  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000600  3a 30 30 30 30 30 30 30  31 46 46 0d 0a 00 00 00  |:00000001FF.....|
00000610  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002800

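If the same file is named a.hex instead, only the regular file header is present, but the resulting archive is still 10 KiB (0x2800 = 10240 bytes): GNU tar pads its output to a full record, which by default is 20 blocks of 512 bytes.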
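To get exact numbers instead of estimates, the headers could be walked directly, as in the hexdump above. A minimal Python sketch, assuming classic 512-byte ustar/old-GNU headers (octal size at offset 124, typeflag at offset 156) and handling only the GNU `L` long-name header; base-256 sizes, pax headers and sparse files are not covered. The function names are hypothetical.

```python
BLOCK = 512


def parse_octal(field: bytes) -> int:
    """Decode a NUL/space-terminated octal number from a tar header field."""
    digits = field.split(b"\0", 1)[0].strip()
    return int(digits, 8) if digits else 0


def walk_blocks(data: bytes):
    """Yield (offset, typeflag, name, size, slack) for each header block.

    Stops at the first all-zero block (start of the end-of-archive marker).
    """
    offset = 0
    pending_name = None  # full name carried by a preceding GNU 'L' header
    while offset + BLOCK <= len(data):
        header = data[offset:offset + BLOCK]
        if header == bytes(BLOCK):
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size = parse_octal(header[124:136])
        typeflag = chr(header[156]) if header[156] else "0"
        data_blocks = -(-size // BLOCK)
        slack = data_blocks * BLOCK - size  # unused bytes in the last block
        if typeflag == "L":
            # The pseudo-file's data is the real name of the next member.
            body = data[offset + BLOCK:offset + BLOCK + size]
            pending_name = body.split(b"\0", 1)[0].decode("utf-8", "replace")
        elif pending_name is not None:
            name, pending_name = pending_name, None
        yield offset, typeflag, name, size, slack
        offset += BLOCK * (1 + data_blocks)
```

Summing 512 per yielded header plus the `slack` values, and comparing the last offset against the file length, splits an archive into headers, padding, end-of-archive marker and record padding.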