qsbase / qs

Quick serialization of R objects
Questions about the benchmark #26

Closed by mllg 4 years ago

mllg commented 4 years ago

Thanks for the package. I have a couple of questions regarding the benchmark in the README:

  1. Where can I find the benchmark script so that I can replicate it on my system?
  2. Which compression algorithm is used in saveRDS()? I assume "bzip2", but this should be documented.
  3. Why are gzip, xz and uncompressed (compress = FALSE) not included in the benchmark?
  4. AFAIK saveRDS() is single threaded for bzip2. How is it possible that it is so much faster with 4 threads?

traversc commented 4 years ago

Hi @mllg:

1. Please see here for the benchmark script.

https://gist.github.com/traversc/a3b42c2ca0af940df40ed954c40cb315

There is a variable called "reps" which I set to 1 for testing between versions. You can increase that number if you'd like results closer to the ones in the README.

Note also that results vary considerably depending on the hardware and OS you are using.

2. I am not using bzip2 for the comparison. I am using gzip and multithreaded gzip via a program called pigz.

As far as I understand, bzip2 will produce better compression but will be much slower than gzip.
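For readers who want to try this, below is a minimal sketch of one way to route saveRDS() through pigz using a pipe connection. It is not taken from the benchmark script: the thread count of 4, the sample data frame, and the fallback branch are assumptions made for illustration, and pigz must be installed and on the PATH for the multithreaded branch to run.

```r
# Sketch only: write an RDS file through an external pigz process.
# The object, the thread count (-p 4) and the fallback are illustrative assumptions.
x <- data.frame(id = seq_len(1e5), value = rnorm(1e5))
path <- tempfile(fileext = ".rds")

has_pigz <- nzchar(Sys.which("pigz"))
if (has_pigz) {
  # saveRDS() does not compress when given a connection,
  # so pigz performs the gzip compression in parallel.
  con <- pipe(sprintf("pigz -p 4 > %s", shQuote(path)), "wb")
  saveRDS(x, con)
  close(con)
} else {
  # Fallback: the built-in single-threaded gzip of saveRDS().
  saveRDS(x, path)
}

# pigz writes standard gzip output, so readRDS() can read the file directly.
y <- readRDS(path)
```

Because the output is ordinary gzip in both branches, the file can be read back the same way regardless of how it was written.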

3. Mostly because it would make the plot too busy and complex. I have tested save/read RDS with no compression, and I believe qs would be favorable, especially with low compression.

4. See answer 2 above: the multithreaded timings use pigz rather than the built-in compression of saveRDS().

mllg commented 4 years ago

> I am not using bzip2 for the comparison. I am using gzip and multithreaded gzip via a program called pigz.

Ah, that explains a lot.

> I have tested save/read RDS with no compression, and I believe qs would be favorable, especially with low compression.

I still think including timings for no compression would help; this is an important baseline.
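A rough sketch of how such a baseline could be measured with base R alone is shown here. It is not the README benchmark: the sample data frame and the single repetition are assumptions for illustration, and absolute timings will depend heavily on hardware and OS.

```r
# Sketch only: time saveRDS()/readRDS() for each built-in compression setting.
# The test object is an illustrative assumption, not the README's data.
x <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

settings <- list(none = FALSE, gzip = "gzip", bzip2 = "bzip2", xz = "xz")

results <- do.call(rbind, lapply(names(settings), function(name) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))  # remove the temporary file when this function exits

  write_s <- system.time(saveRDS(x, path, compress = settings[[name]]))[["elapsed"]]
  read_s <- system.time(y <- readRDS(path))[["elapsed"]]
  stopifnot(identical(x, y))  # the round trip must preserve the object

  data.frame(compress = name, write_s = write_s, read_s = read_s,
             size_bytes = file.size(path))
}))

print(results)
```

For more stable numbers, each setting could be repeated several times and the median taken, similar to raising "reps" in the benchmark script.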

Anyways, thanks for your comments and all the effort you put into this package!