richfitz / storr

:package: Object cacher for R
http://richfitz.github.io/storr

Add support for remote storrs? #61

Open kendonB opened 6 years ago

kendonB commented 6 years ago

Is it feasible to conveniently read from a remote storr through an ssh connection? The example I'm thinking of is reading from a drake cache on an HPC system to my local computer within a local R session.

wlandau-lilly commented 6 years ago

+1 if the solution applies to all storr drivers and does not require a specialized new driver.

richfitz commented 6 years ago

Oh wow, this would be a huge can of worms :)

I'm not sure how this could be done with reasonable performance tbh. At this point you'd be better off using something like the redis backend and having an actual client/server relationship - that's what we do here at work for this use case. Shared filesystems also work fairly well in practice with the rds driver, because I designed storr to avoid some file locking issues - we have that working with hundreds of jobs running simultaneously without problems (though some of this comes down to careful design of keys, farming out jobs, etc.).

A further option, if you just want to read from a remote drake cache: rsync it down over ssh?
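For concreteness, here is a minimal sketch of the two options above. The Redis host, key prefix, and shared path are all hypothetical, and the redis driver is the one provided by the redux package (storr_redis_api(), if I have the call right):

```r
library(storr)

## Option 1: client/server via Redis (the server holds the data in memory).
## "hpc-head" and the "myproject" prefix are made-up placeholders.
con <- redux::hiredis(host = "hpc-head", port = 6379)
st_redis <- redux::storr_redis_api("myproject", con)
st_redis$set("answer", 42)
st_redis$get("answer")

## Option 2: the rds driver on a shared filesystem path visible to every job.
st_rds <- storr::storr_rds("/shared/project/storr_cache")
st_rds$set("answer", 42)
st_rds$get("answer")
```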

wlandau-lilly commented 6 years ago

Does the redis driver avoid file locking issues as beautifully as the rds driver?

richfitz commented 6 years ago

The redis server is single threaded, so a lot of things are just not possible to do wrong (requests come in as a single stream and are dealt with one at a time). The downside of redis for what you are doing is that it's in-memory only.

wlandau-lilly commented 6 years ago

This is unrelated to ssh and HPC systems, but what about integration with googleCloudStorageR? It is already making its way to memoise.

strazto commented 4 years ago

Hi @kendonB, I'm assuming that some time in the span of two years you've figured out some workaround or solution to this, but for anyone with the same question:

> The example I'm thinking of is reading from a drake cache on an HPC system to my local computer within a local R session.

It IS fully possible to do this using sshfs, and all things considered it actually works pretty well and doesn't impose significantly greater overhead for reads.

Essentially, sshfs allows you to mount any remote volume over ssh, and once your cache is mounted to some local mount point, accessing it is identical to normal cache usage.

That said, unless your local machine is somehow very close to the remote filesystem, I wouldn't advise actually trying to run any drake::make() calls over it, but I'm assuming you're more interested in reducing friction when accessing the cache interactively.
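For anyone trying this, a rough sketch of what the sshfs route looks like, assuming the remote project has already been mounted with something like `sshfs user@hpc:/path/to/project /mnt/hpc_project` in a shell (the paths and target name below are hypothetical):

```r
library(drake)

## The drake cache lives inside the mounted project directory.
cache <- drake_cache(path = "/mnt/hpc_project/.drake")

## List what is in the cache and read a target into the local session,
## exactly as if the cache were on the local disk.
cached(cache = cache)
readd(my_target, cache = cache)
```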

strazto commented 4 years ago

Also, I assume that certain DBI backends, such as Postgres, support this implicitly, since a Postgres DBI connection requires the user to specify the DB server's address anyway.
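Roughly, that would look something like the following; the connection details and table names here are invented, and it assumes the RPostgres package:

```r
library(DBI)
library(storr)

## Connect to a Postgres server that both the HPC jobs and the local
## session can reach (hostname, database, and credentials are placeholders).
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  host     = "db.example.org",
  dbname   = "storr_cache",
  user     = "analyst",
  password = Sys.getenv("PGPASSWORD")
)

## storr_dbi() keeps objects and keys in two tables on the server, so any
## machine with database access shares the same cache.
st <- storr::storr_dbi("storr_data", "storr_keys", con)
st$set("answer", 42)
st$get("answer")
```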

kendonB commented 4 years ago

Hi @mstr3336 - I didn't get a real workaround for this.

I currently make use of Xforwarding, the RStudio nested terminal, and rsync to move info from the HPC to local.

kendonB commented 4 years ago

My use case for this issue was really just making graphs locally, as the iteration process is a bit faster.

strazto commented 4 years ago

> I currently make use of Xforwarding, the RStudio nested terminal, and rsync to move info from the HPC to local.

That sounds complicated! Do you run RStudio desktop on the remote, and operate the GUI locally?

I'm curious about the UX of that. I considered doing the same, but found building RStudio and all of its GTK (or whatever GUI library it uses) dependencies too painful on our HPC.

If you're not happy with your current workflow, I actually do recommend you see if SSHFS works for you. It's supported on Linux and OSX (it has a few small quirks on OSX) and works really quite well.

I'm not too sure about Windows.

wlandau commented 4 years ago

I could be totally off when I say this, but what about a REST/plumber API for storrs? Would that help?
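Sketching the idea only (nothing like this exists in storr today): a read-only plumber file serving an rds-backed storr might look roughly like this, with the cache path and routes made up for illustration. It assumes plumber's built-in rds serializer.

```r
## plumber.R - run on the machine that holds the cache, e.g. with
## plumber::plumb("plumber.R")$run(port = 8000)
library(plumber)
library(storr)

st <- storr::storr_rds("/path/to/storr_cache")

#* List the available keys
#* @get /keys
function() {
  st$list()
}

#* Return one object as an R serialization
#* @serializer rds
#* @get /get/<key>
function(key) {
  st$get(key)
}
```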