jermp / data_compression_course

A Crash Course on Data Compression.

recompressed lists.txt, modified run_all script so it won't delete archive #1

Closed tansy closed 2 years ago

tansy commented 2 years ago

As befits a decent repository about compression, I recompressed `lists.txt`, cutting its size in half, and modified the run scripts so they won't delete the archive. There is no point in decompressing and recompressing it every single time one runs a test.

It might be a good idea to make a script to generate this list if there is some logic behind these numbers.

jermp commented 2 years ago

Hi, first of all, thank you for your interest and the PR!

I'm going to accept the PR and you're very welcome to contribute more, although I think you missed the point of that script.

The purpose of the script is just to keep all the commands for compilation and running in a single place. It is not meant to be run, say, 1000 times. If you want to repeat a compression/decompression experiment many times, you should run, for example, `./compress gamma lists.txt out_gamma.bin; ./decompress gamma out_gamma.bin` as explained in the README, after a single gunzip command.

In general, there is not even a need to decompress the file before reading it as input: we could iterate over it directly (gzipped) using an external library. (But I did not want to add it to the repo uncompressed because of its size.) Another option is to store it in binary rather than textual format.
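To illustrate the idea of iterating over the gzipped file directly, here is a minimal sketch in Python using only the standard `gzip` module. It is not code from the repository. The file name `lists.txt.gz`, the helper name `iter_lists`, and the layout (whitespace-separated integers, one list per line) are all assumptions made for illustration; the actual format of `lists.txt` may differ.

```python
import gzip
from typing import Iterator, List


def iter_lists(path: str) -> Iterator[List[int]]:
    """Yield one integer list per non-empty line of a gzipped text file.

    Assumption: each line holds whitespace-separated integers.
    The real layout of lists.txt may be different.
    """
    # Mode "rt" decompresses on the fly and decodes the bytes as text,
    # so the uncompressed file never has to be written to disk.
    with gzip.open(path, "rt", encoding="ascii") as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                yield [int(x) for x in fields]


if __name__ == "__main__":
    # Hypothetical archive name; adjust to the file actually in the repo.
    for i, lst in enumerate(iter_lists("lists.txt.gz")):
        print(i, len(lst))
```

Because the generator reads line by line, memory use is bounded by the longest single list rather than by the whole file.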

The lists used are real integer lists from the Gov2 collection, so they cannot be generated on the fly.

tansy commented 2 years ago

I understand the point of the script, but the point of the PR was (as I correctly guessed, `lists.txt` is a test file):

PS. Is there any particular reason the .key files, which I guess are used to generate PDFs, are executable?

jermp commented 2 years ago

Thank you! The .key files are just my Keynote slides, used to generate the PDFs.