traversc / qs

Quick serialization of R objects

Slow initial qread after period without reading #53

Closed alexvpickering closed 3 years ago

alexvpickering commented 3 years ago

Hello - and thank you for the amazing package. It has generally replaced my usage of readRDS and saveRDS.

I've noticed that after saving an object with qsave and then reading it a while later, the initial read is substantially slower than subsequent reads. Is this to be expected, and is there any way to avoid it? Thanks again.

traversc commented 3 years ago

Yes, it's expected. You can see the same thing with any method, not just qs.

The first (cold) read has to pull the data from disk, so its throughput is limited to roughly disk speed * compression ratio.

After the first read, the OS keeps the file data cached in RAM for a while, so subsequent reads are served from the RAM cache instead of the disk. How long the data stays cached is fully OS-dependent and not something you have much direct control over.
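To see the effect yourself, a minimal sketch along these lines times several consecutive reads of the same file. `time_reads` is a hypothetical helper, not part of qs, and it uses base R `readRDS` only so that it runs without extra packages; pass `qs::qread` as `reader` for a qs file. Note that a file you have just written is usually already in the OS cache, so the slow first read only shows up on a file that has not been touched for a while.

```r
# Hypothetical helper (not part of qs): time n consecutive reads of one file.
# Returns a numeric vector of elapsed seconds, one entry per read.
time_reads <- function(path, reader = readRDS, n = 3L) {
  vapply(seq_len(n), function(i) {
    system.time(reader(path))[["elapsed"]]
  }, numeric(1))
}

# Example with base R serialization; swap in qs::qread for a file written by qsave.
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = rnorm(1e5)), path)

timings <- time_reads(path, readRDS, n = 3L)
print(timings)
```

On a file that really is cold, the first element would be expected to be the largest, with later reads closer to each other.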

There are still things you can do to make the first read faster: use an SSD (or a faster SSD if you already have one). You can also preload the file to force it into cache before you need it. See here: https://serverfault.com/questions/43383/caching-preloading-files-on-linux-into-ram
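If you would rather preload from within R, one possible sketch is to read the file's raw bytes once and throw them away, which asks the OS to pull the file through its cache. `warm_file_cache` is a hypothetical helper, not part of qs, and whether the data actually stays cached afterwards depends on the OS and on how much free RAM there is, so treat this as an assumption rather than a guarantee.

```r
# Hypothetical helper (not part of qs): read a file once in chunks and discard
# the bytes, so the OS has a chance to keep the file in its RAM cache.
# Returns (invisibly) the number of bytes read.
warm_file_cache <- function(path, chunk_bytes = 16 * 1024^2) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total <- 0
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break  # end of file reached
    total <- total + length(chunk)
  }
  invisible(total)
}

# Example usage on a small temporary file of 10000 bytes.
path <- tempfile()
writeBin(as.raw(rep(1:10, 1000)), path)
bytes <- warm_file_cache(path)
```

You could call this shortly before the real `qread`, for example at the start of a script, so the slow disk read happens ahead of time.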

alexvpickering commented 3 years ago

Thanks for the quick reply ... makes sense! I'll explore preloading the file into cache