rocker-org / rocker-versioned

Run current & prior versions of R using docker
https://hub.docker.com/r/rocker/r-ver
GNU General Public License v2.0

What are recommended volumes to link for RStudio images? #122

Closed MarkEdmondson1234 closed 5 years ago

MarkEdmondson1234 commented 5 years ago

I often get asked why libraries and code installed in the RStudio images do not persist when running on a VM. For me this isn't an issue, as I treat the containers as temporary and use derived Docker images to add R packages and files.

But, in the spirit of giving people a choice, may I ask what volumes should be linked to a running RStudio container to write these files to the host computer?

I think the command below covers the user's home directory, but what else should be linked to keep the R packages persistent?

docker run -p 8787:8787 -e USER=me -e PASSWORD=me -v /home/me:/home/me --name rstudio

Similar Q to this issue https://github.com/rocker-org/rocker/issues/239 but with a question for the specific rocker directory that needs to be linked for writing package installs to the host VM.

cboettig commented 5 years ago

Good question. I've tried to put some notes on this at https://www.rocker-project.org/use/shared_volumes/.

I'll also add that I generally tell people to avoid linking their user's $HOME on the host directly to the HOME on the container -- the OS stores lots of system/software specific configuration there that really isn't meant to be portable! It is much better to link the home dir of the container to a subdirectory of the host. This isn't a big deal in a vanilla VM that is being used exclusively to run the docker container, but can be a huge headache if you are doing anything outside the container on the host VM.

Of course use cases will differ. I generally find having people not link anything to persist to the host is better -- data can be attached as separate database volumes, code can be moved around over git/GitHub, and this all teaches good data hygiene, like you say.

Also just a note that some folks prefer to first create a separate docker volume which binds the host disk, and link the rstudio container to that. I believe that may have advantages in portability and in performance, but possibly not, it's been a long time since I've looked at that stuff.
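One possible reading of the "separate docker volume" approach is a named volume created with the local driver's bind options and then mounted by name. The sketch below is a guess at that setup, not a confirmed recommendation from the thread; the helper name `create_bind_volume`, the volume name `rstudio_home`, and the host path are all hypothetical.

```shell
#!/bin/sh
# Sketch: create a named Docker volume backed by an existing host directory,
# using the local driver's bind options.
# The function name and its arguments are illustrative, not from the thread.
create_bind_volume() {
    vol_name="$1"
    host_dir="$2"
    docker volume create --driver local \
        --opt type=none \
        --opt o=bind \
        --opt device="$host_dir" \
        "$vol_name"
}

# Usage (the host directory must already exist):
#   create_bind_volume rstudio_home /home/me/rstudio-home
#   docker run -d -p 8787:8787 -e PASSWORD=me \
#       -v rstudio_home:/home/rstudio rocker/tidyverse
```

Whether this gives any portability or performance benefit over a plain `-v host:container` bind mount is left open here, as it is in the comment above.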

MarkEdmondson1234 commented 5 years ago

Thanks, that's good for linking home directories, but I have trouble when trying to link the package install directory. I tried:

                docker run -p 80:8787 \
                                  -e ROOT=TRUE \
                                  -e USER=%s -e PASSWORD=%s \
                                  -v /home/gcer:/home/rstudio \
                                  -v /home/gcer/packages:/usr/local/lib/R/site-library \
                                  --name rstudio \
                                  rocker/tidyverse

...but that ran into permission issues: it wouldn't let me install packages to the /usr/local/lib/R/site-library folder.

This is to cover one VM per RStudio instance, so hopefully it will avoid the more complicated setups.

cboettig commented 5 years ago

Did you make sure /home/gcer/packages exists before you ran the above command? Check the permissions of that dir (on the host). The user needs to have write permission there for this to work.

If you link a directory on the local file system that does not yet exist, it will be created by the docker run call, but as root, so you will run into a permission error. (This can happen when linking subdirs in the home directory too.)
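As a minimal sketch of that advice, assuming a POSIX shell on the host: create the directory as the ordinary user before running the container, and check that it is writable. The helper name `ensure_host_dir` is hypothetical. It only checks the host user's permission; whether the user inside the container can write also depends on how user IDs map between host and container.

```shell
#!/bin/sh
# Hypothetical helper: create a host directory before bind-mounting it,
# so it is owned by the current user instead of being created by Docker as root.
ensure_host_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1
    # Fail loudly if the current host user cannot write there.
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Usage, with the path from the thread:
#   ensure_host_dir /home/gcer/packages && docker run ... \
#       -v /home/gcer/packages:/usr/local/lib/R/site-library ...
```

If the directory already exists and is owned by root from an earlier run, the check fails and ownership has to be fixed on the host first (for example with `chown`).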

Also, while I get the desire to link the packages dir to save installation time, it makes me hesitant; it seems like that could be asking for trouble. Again, probably not an issue in this tightly controlled case, but for persistence, isn't it easier to just stop and restart your container rather than delete it and rely on the installed packages persisting on the host? I'm probably not understanding the context very well.

MarkEdmondson1234 commented 5 years ago

That looks like it, it was owned by root so I'll try making it beforehand.

But you say that if the container is just stopped and started, it will persist the packages? Ahhh. I didn't think that happened; I thought it started afresh each time. But now that I look, there is a docker rm rstudio, which would mean a fresh container is created from the image each time. OK, thanks! That looks like it will suit.
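To illustrate the distinction, here is a sketch assuming a container named rstudio already exists from an earlier docker run. `docker stop` followed by `docker start` keeps the container's writable layer, so packages installed inside it are still there afterwards; `docker rm` discards that layer. The helper name `restart_keep_state` is hypothetical.

```shell
#!/bin/sh
# Restart an existing container without removing it, so that anything
# installed inside it (e.g. R packages) is still present afterwards.
# Assumes the container was created earlier with: docker run --name rstudio ...
restart_keep_state() {
    name="${1:-rstudio}"
    docker stop "$name" && docker start "$name"
}

# By contrast, this discards everything written inside the container:
#   docker rm rstudio
```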

MarkEdmondson1234 commented 5 years ago

I see it was added because restarting the VM started a service, and that conflicted: https://github.com/cloudyr/googleComputeEngineR/commit/bc3d906d64fc12015299b6ddb672be9f7cf6e9c6#diff-9a84f8b5647fbd9a9bd0ee29bf20ce95