kendonB opened 6 years ago

Is it feasible to conveniently read from a remote storr through an ssh connection? The example I'm thinking of is reading from a drake cache on an HPC system to my local computer within a local R session.
+1 if the solution applies to all storr drivers and does not require a specialized new driver.
Oh wow, this would be a huge can of worms :)
I'm not sure how this could be done with reasonable performance tbh. At this point you'd be better off using something like the redis backend and having an actual client/server relationship - that's what we do here at work for this use case. Shared filesystems also work fairly well in practice with the rds driver because I designed storr to avoid some file locking issues - we have that working with hundreds of jobs running simultaneously (though some of this comes down to careful design of keys, farming out jobs, etc.).
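For concreteness, the redis route looks roughly like this - a minimal sketch, assuming a redis server reachable from both machines and the redux package installed; the hostname and key prefix below are placeholders:

```r
library(storr)

# Connect to the redis server; host and port are hypothetical
con <- redux::hiredis(host = "hpc.example.org", port = 6379)

# A storr backed by redis - every client talking to this server
# shares the same cache, so there's no filesystem to fight over
st <- storr_redis_api(prefix = "myproject", con = con)

st$set("fit", lm(mpg ~ wt, data = mtcars))
st$get("fit")
```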
A further option, if just reading from a remote drake session is desired - rsync it down over ssh?
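From the local session, that's a one-liner - a sketch assuming key-based ssh auth is already set up; the user, host, and paths are placeholders:

```r
# Pull the remote drake cache down over ssh, then read it locally as usual
system2("rsync", c("-az", "user@hpc.example.org:project/.drake/", ".drake/"))
```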
Does the redis driver avoid file locking issues as beautifully as the rds driver?
The redis server is single-threaded, so a lot of things are just not possible to do wrong (data comes in as a single stream and is dealt with one request at a time). The downside of redis for what you are doing is that it's in-memory only.
This is unrelated to ssh and HPC systems, but what about integration with googleCloudStorageR? It is already making its way to memoise.
Hi @kendonB, I'm assuming that some time in the span of two years you've figured out some workaround or solution to this, but for anyone with the same question:
> The example I'm thinking of is reading from a drake cache on an HPC system to my local computer within a local R session.
It IS fully possible to do this using sshfs, and, all things considered, it actually works pretty well and doesn't impose significantly greater overhead for reads.
Essentially, sshfs allows you to mount any remote volume over ssh, and once your cache is mounted to some local mount point, accessing it is identical to normal cache usage.
That said, unless your local machine is somehow very close to the remote fs, I wouldn't advise actually trying to run any drake::make() calls over the mount, but I'm assuming you're more interested in reducing friction when accessing the cache interactively.
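End to end it looks something like this - a minimal sketch, assuming the remote cache lives at ~/project/.drake on the HPC and a reasonably recent drake (one that has drake_cache()); the user, hostname, and mount point are placeholders:

```r
# In a shell first (hypothetical host and paths):
#   mkdir -p ~/mnt/hpc_project
#   sshfs user@hpc.example.org:project ~/mnt/hpc_project

library(drake)

# Open the mounted cache and read a target from it exactly as if it were local
cache <- drake_cache(path = "~/mnt/hpc_project/.drake")
readd(my_target, cache = cache)  # my_target is a placeholder target name
```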
Also, I assume that certain DBI backends such as Postgres support this implicitly, since a Postgres DBI connection requires the user to specify the DB server's address anyway.
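Roughly like this - a sketch, assuming the RPostgres package and a Postgres server the local machine can reach; the hostname, database, and table names below are placeholders:

```r
library(storr)

# The connection itself is network-aware, so no filesystem mount is needed
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  host   = "hpc.example.org",
  dbname = "cache",
  user   = "me"
)

# A storr that keeps its keys and data in two Postgres tables
st <- storr_dbi(tbl_data = "storr_data", tbl_keys = "storr_keys", con = con)

st$set("answer", 42)
st$get("answer")
```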
Hi @mstr3336 - I didn't get a real workaround for this.
I currently make use of X forwarding, the RStudio nested terminal, and rsync to move info from the HPC to local.
My use case for this issue was really just making graphs locally, as the iteration process is a bit faster.
> I currently make use of X forwarding, the RStudio nested terminal, and rsync to move info from the HPC to local.
That sounds complicated! Do you run RStudio Desktop on the remote and operate the GUI locally?
I'm curious about the UX of that; I considered doing the same, but found building RStudio and all of its GTK (or whatever GUI library it uses) dependencies too painful on our HPC.
If you're not happy with your current workflow, I actually do recommend you see if SSHFS works for you. It's supported on Linux and OSX (with a few small quirks on OSX) and works really quite well. I'm not too sure about Windows, though.
I could be totally off when I say this, but what about a REST/plumber API for storrs? Would that help?
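Something in this direction, maybe - a minimal sketch, assuming plumber >= 1.0 (for the rds serializer) and an rds storr on the serving machine; the cache path and port are placeholders:

```r
# plumber.R - run on the machine that holds the cache, e.g.:
#   plumber::pr_run(plumber::pr("plumber.R"), port = 8000)

st <- storr::storr_rds(".drake")  # hypothetical cache location

#* List the available keys
#* @get /keys
function() {
  st$list()
}

#* Return one object from the cache, serialized as rds
#* @serializer rds
#* @get /object/<key>
#* @param key The storr key to fetch
function(key) {
  st$get(key)
}
```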