Open AshesITR opened 6 years ago
This was previously discussed in https://github.com/richfitz/storr/issues/11
There are a few ways of doing this and the trick is in the details. I would be very open to a PR that handles this gracefully.
I would encourage you to think about implementing this in the style of `multistorr`, which separates out key and data storage. This allows reuse of heaps of existing storr functionality without too much boilerplate.
The other similar bit of code (in fact it might basically be enough) is `driver_remote`. This is partly completed work - the other half is in https://github.com/ben-gready/storr.remote/pull/2/files - but there is some simple usage in the tests.
Apologies that neither of these features are too well documented!
I've spun up some concept code (still a lot to do for documentation and testing). Do you mind checking out the code in my fork? I'd be interested in your feedback on design, naming etc. Maybe we need more features? Also, kudos for the good testing harness - I could easily verify the basic stuff works :)
https://github.com/AshesITR/storr/commit/6c133f0e5767f6e8873cc280bc4c49cce87fa97d
Assume we have some data which lives on a slow network drive, in a `storr::storr_rds()`, and we need to read from it often, from multiple R sessions on the same computer. In this scenario I thought about pulling the network-backed storr once and storing it on a local disk.
Now each session has the default environment cache for speedup with multiple reads. Additionally, after calling import from one session, other sessions can read from a local SSD instead of a slow network share.
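To make the pull-once idea concrete, here is a minimal sketch in Python (plain dicts stand in for storr objects; `import_store`, `remote`, and `local` are hypothetical names, not storr's API): one session copies everything from the slow store into a fast local one, after which any session on the machine can read locally first.

```python
# Sketch of the pull-once pattern, assuming plain dicts as stand-ins
# for a slow network-backed storr ('remote') and a local SSD-backed
# storr ('local'). All names here are illustrative only.

def import_store(remote, local):
    """Copy every key/value pair from the slow remote store into the
    fast local store."""
    for key, value in remote.items():
        local[key] = value

remote = {"model": "fit-object", "data": [1, 2, 3]}
local = {}
import_store(remote, local)

# Subsequent reads prefer the local copy, falling back to the remote.
value = local.get("data", remote.get("data"))
```

The catch, as noted below, is that once copied, the local store can drift out of sync with the remote one.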
There are a few problems with this though, mainly staying in-sync with s_remote.
So what I thought could be useful would be some kind of multi-level caching, like `s <- storr_multi(master = storr_rds("/slow/network/share"), cache = storr_rds("/local/ssd"))`.
Now `s$set` would do `master$set` and `cache$set`. `s$get` would retrieve the hash from `master`, check if the object is in `cache`, and return from there if found. Otherwise the object would be fetched via `master$get` and then written to `cache` for future readers. This way, re-reads of a file can be done from SSD once any R session has requested a particular key.
I do notice that `cache` actually needs no keystore (just hash -> object), so maybe there is a better way to make the same feature available? If the idea is worth trying, I'd be happy to try and code a PR.
Another possible interface to this feature could be the `use_cache` parameter, which currently only enables or disables an environment cache. This could be expanded to include other caches. It must be possible to re-use a previously used cache from another R session - ideally even simultaneously.
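Generalising `use_cache` to a list of caches might look like the following sketch (Python stand-in; `ChainedCache` and its method names are hypothetical): lookups walk the chain fastest-first, and a hit is promoted into every faster level so repeat reads stay cheap.

```python
class ChainedCache:
    """Hypothetical multi-level cache: 'levels' is a list of
    hash -> object mappings ordered fastest first, e.g.
    [environment_cache, ssd_cache]."""

    def __init__(self, levels):
        self.levels = levels

    def get(self, h):
        for i, level in enumerate(self.levels):
            if h in level:
                value = level[h]
                for faster in self.levels[:i]:
                    faster[h] = value  # promote into faster levels
                return value
        raise KeyError(h)

    def set(self, h, value):
        # Write-through to every level.
        for level in self.levels:
            level[h] = value
```

Since each level is just hash -> object storage, a slower level could be any shared medium (an SSD directory, for instance), which would let multiple R sessions reuse the same cache.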