openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

Test and add docker installation instructions #416

j-wags opened this issue 4 years ago

j-wags commented 4 years ago

Thanks to @tyggna for the idea and explanation

Currently, Windows users don't have any way to use the Open Force Field Toolkit. Docker could provide a way for them and people in other circumstances to use the toolkit.

Todd Millecam 16:44

> Here's a quick rundown of how to use Docker. It starts with a file named `Dockerfile`, and you build the container by running the command `docker build . -t <name>`
> Here's the Dockerfile to build something based on CentOS 7, CUDA, and something that already exists in conda:


```dockerfile
FROM nvidia/cuda:10.0-devel-centos7

RUN yum update -y
RUN yum install -y epel-release wget cmake
RUN yum update -y
RUN yum install -y conda sudo
RUN conda create -y -n root
RUN conda install -y -c omnia -c conda-forge openmm
```


> but, once you have figured out the commands you need to get it working on your local machine, you can just put those in as a RUN line in the Dockerfile
> to use it, the most basic command is:

> `docker run <name>`

> to add a gpu to the container, you do

> `docker run --gpus 1 <name>`

> and to add an external file/database to it you type:

> `docker run --gpus 1 -v my_db_location:/data <name>`

> to debug it and go in and actually see the code and methods that are distributed, most docker images will support an interactive shell:

> `docker run -it <name> /bin/bash` 

> and then you will have a terminal inside your container, and it'll behave very similarly to a virtual machine that contains all your dependencies and libraries

Jeffrey Wagner 16:55
> Very cool -- Thanks, Todd! I'll try these out in the next few days.
> 
> In terms of best practices, if we wanted to distribute builds using Docker, would we want to distribute the dockerfile, or a zipped Docker image for each version?

Todd Millecam 16:57
> it's probably best to distribute using Docker, but both work fine. Docker has mechanisms for hosting your own repository, and for copying from one repo to another
> 
> so people behind a firewall can host a Docker repo in their DMZ (something they probably already have), type two commands and still get it, or they can download a .tar and run one command

Jeffrey Wagner 17:00
> > it's probably best to distribute using Docker

> By this, do you mean DockerHub?

Todd Millecam 17:02
> yeah, DockerHub is just the public repo hosted by the Docker company

> but anyone can make a repo, and the docker command can move containers between repos
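
For concreteness, a rough sketch of the two distribution routes described above. The image tag `openff-toolkit:latest` and the registry host `registry.example.com` are placeholders rather than anything the project actually hosts:

```shell
# Route 1: push the image to a registry (DockerHub or a self-hosted repo)
docker tag openff-toolkit:latest registry.example.com/openforcefield/openff-toolkit:latest
docker push registry.example.com/openforcefield/openff-toolkit:latest
# Consumers then pull it with:
docker pull registry.example.com/openforcefield/openff-toolkit:latest

# Route 2: ship a tarball to users without registry access
docker save -o openff-toolkit.tar openff-toolkit:latest
# The recipient loads it with a single command:
docker load -i openff-toolkit.tar
```
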
davidlmobley commented 4 years ago

Is this something Todd would want to take on/get working?

jchodera commented 4 years ago

@Lnaden has extensive experience with crafting Docker containers that minimize the size of the resulting image, and can likely provide useful input.

One thing he suggested early on is to not craft a Dockerfile with multiple RUN statements, since each one creates a new layer that must be downloaded, adding to size and slowing down docker image retrieval. Instead, we want to coalesce as many commands as we can into a single call so that only the diff to the final state is stored in the image layer.
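
For illustration, here is a sketch of what a coalesced build step could look like, reusing the package list from the Dockerfile above; the cache-cleanup commands at the end are an assumption about what can safely be removed, not something from the discussion:

```dockerfile
FROM nvidia/cuda:10.0-devel-centos7

# A single RUN keeps everything in one image layer; cleaning package caches
# in the same step keeps those files out of the final layer entirely
RUN yum update -y && \
    yum install -y epel-release wget cmake && \
    yum update -y && \
    yum install -y conda sudo && \
    conda install -y -c omnia -c conda-forge openmm && \
    conda clean --all --yes && \
    yum clean all && \
    rm -rf /var/cache/yum
```

tyggna's example at the bottom of this thread takes the same `&&`-chaining approach.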

It should be very easy to put this together, but some questions:

j-wags commented 4 years ago

I suspect the instructions above are actually sufficient to get things running (just replacing openmm with openforcefield). Some other important considerations are:

* Do we tell people how to install docker?
* Getting files in and out is a little bit complicated, so we should probably add instructions for that
* Do we expect this to have good performance relative to not-containerized runs? If we expect a 75% performance loss, we may want to rethink if we really want to encourage people to spend GPU time on this
* Where to distribute these assets
  * Could put whole images in the GH Release Assets section
  * Could host on DockerHub or other docker repo
  * Could just paste the above dockerfile, maybe with pinned OFFTK version depending on whether the instructions are attached to a release

tyggna commented 4 years ago
> Do we tell people how to install docker?

Yes, specifically version 19 or newer; that's where convenient support for GPUs was added. OpenMM can leverage the GPU and works inside the nvidia base container. It adds 200 MB to the download size for the OpenCL version, and 900 MB for the CUDA version (plus CUDA needs a license disclaimer on it).

> Getting files in and out is a little bit complicated, so we should probably add instructions for that

I'll write the documentation for that (see the sketch below).

> Do we expect this to have good performance relative to not-containerized runs? If we expect a 75% performance loss, we may want to rethink if we really want to encourage people to spend GPU time on this

Containers tend to run at near-native performance on Linux since they utilize the host OS kernel. I don't know about Windows.

> Where to distribute these assets
>
> * Could put whole images in the GH Release Assets section
> * Could host on DockerHub or other docker repo
> * Could just paste the above dockerfile, maybe with pinned OFFTK version depending on whether the instructions are attached to a release

It is often a good idea to post both the container and the Dockerfile to DockerHub, AND to have the Dockerfile contained in your git repo next to the code.
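
For the "getting files in and out" point, a minimal sketch of the usual mechanics; the image name `openff-toolkit`, the container name `offtk`, and the file names are placeholders:

```shell
# Bind-mount a host directory at /data so results written there persist on the host
docker run --rm -v /path/to/host/dir:/data openff-toolkit python /data/run_parameterization.py

# Alternatively, copy individual files into or out of a named container
docker run -d --name offtk openff-toolkit sleep infinity
docker cp input.sdf offtk:/tmp/input.sdf
docker cp offtk:/tmp/output.offxml .
```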

tyggna commented 4 years ago

Also, this sample Dockerfile is a bit of a hack, but it does build just fine:

```dockerfile
FROM nvidia/opencl:devel-centos7
RUN yum update -y && \
    yum install -y epel-release && \
    yum update -y && \
    yum install -y conda && \
    conda create -y -n root && \
    conda install -y -c omnia -c conda-forge openforcefield
```
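
To try this out, something along these lines should work; the image tag is arbitrary, and the import check is only illustrative (the toolkit's Python package was named `openforcefield` at the time):

```shell
# Build the image from the directory containing the Dockerfile
docker build . -t openff-toolkit

# Open an interactive shell inside the container
docker run -it openff-toolkit /bin/bash

# Then, inside the container, confirm the toolkit imports
# (how `python` resolves depends on how the EPEL conda package sets up its environment)
python -c "import openforcefield; print(openforcefield.__version__)"
```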