jpakkane / jpak

Jpak compression format
GNU General Public License v3.0

Sort by file extension #3

Closed: tobia closed this issue 7 years ago

tobia commented 7 years ago

This is an old trick used by many archivers of ages past: first expand the source directory into a huge array of files; then sort them using a heuristic that places similar files close together, improving compressor performance.

The heuristic I remember was sorting by extension and then by size (algorithmically: first sort by size, then stable-sort by extension).

The reason is that two small .h files are more likely to be similar to each other than a small .h file and a big one, let alone a .h file and a .so file.
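
For illustration, here is a minimal Python sketch of that ordering. It is not jpak code: the function names (`extension`, `order_for_archiving`, `list_files`) are made up for this example, and lowercasing the extension so that `.H` and `.h` group together is my own assumption rather than part of the heuristic as I remember it. It relies on Python's `sorted` being stable, so the size order survives within each extension group.

```python
import os


def extension(path):
    # os.path.splitext returns (root, ext); ext is '' for files without one.
    # Lowercasing is an assumption so that 'A.H' and 'a.h' land together.
    return os.path.splitext(path)[1].lower()


def order_for_archiving(entries):
    """Order (path, size_in_bytes) pairs by extension, then by size.

    Done exactly as described above: sort by size first, then apply a
    stable sort by extension, which keeps the size order inside each group.
    """
    by_size = sorted(entries, key=lambda entry: entry[1])
    return sorted(by_size, key=lambda entry: extension(entry[0]))


def list_files(root):
    # Hypothetical helper: expand a directory tree into (path, size) pairs,
    # skipping symlinks and anything that is not a regular file.
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                found.append((full, os.path.getsize(full)))
    return found
```

Calling `order_for_archiving(list_files("some/dir"))` would then give the order in which to feed the files to the compressor. A single sort with the key `(extension, size)` should give the same result; the two-pass version just mirrors the description.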

jpakkane commented 7 years ago

Thanks for the suggestions. I'll try to have some time to work on this in the future.