openforcefield / openff-bespokefit

Automated tools for the generation of bespoke SMIRNOFF format parameters for individual molecules.
https://docs.openforcefield.org/bespokefit
MIT License

Make docker container #340

Open hmacdope opened 2 months ago

hmacdope commented 2 months ago

It would be great if you could deploy bespokefit with a docker container.

hmacdope commented 2 months ago

We can also publish docker packages with ghcr.io

j-wags commented 2 months ago

What's the motivation for deploying on docker? We have to provide documentation and user support for anything we release so I want to make sure this is something that provides our partners value and that we have the time and expertise to maintain. This seems simpler than an entirely new deployment pathway (since it's still just conda under the hood), but every deployment pathway that we have to maintain is an added cost.

We have a dockerhub account that I could give you access to for hosting the images if we go ahead with this. Would the docker image have psi4 and/or xtb (and/or other engines)? And what would the automated testing for this look like?

hmacdope commented 2 months ago

Fair points @j-wags, I should have explained motivation a bit better.

A Dockerfile-based container is a very easy way to spin up a service on a host, with all the dependencies and configuration already taken care of, alongside logs and easy status tracking.

This is most useful in a non-HPC environment, where the server is not meant to be ephemeral but to run in an always-on mode. This is very common outside of academia and on cloud instances. Docker is pretty much the standard way for service-based architectures to package their work for easy deployment. Running a Python process in tmux is probably too fragile for a more heavyweight deployment.

There are two separate issues here that we should try not to conflate.

1) Having a Dockerfile. This provides people the ability to start bespokefit-server as a docker service by building it themselves.

e.g. start a working bespokefit-server instance in a few commands (you could do something similar for the executor service if you wanted):

git clone git@github.com:openforcefield/openff-bespokefit.git
cd openff-bespokefit
docker-compose up -d
# profit

Possible value adds

A lot of these are probably more relevant to industry than to an academic lab, which is why I wanted to pass this on as an industry-adjacent perspective.

2) Deploying a Docker-based container to dockerhub or ghcr.io, container registries that house pre-built containers. I prefer ghcr.io as it is already integrated with GitHub, but dockerhub is also very easy to integrate.

e.g.

# Hey, what is an easy way to start using bespokefit?

docker run -it ghcr.io/openforcefield/bespokefit
# now in a bespokefit ready environment with the code all ready to roll

Possible value adds

I defs don't want to burden you with any additional maintenance or things you are not comfortable with, so just let me know either way. I just thought it might be an avenue to explore together 😄. Happy to jump on a call as well.

We can always go ahead in a fork and/or modify the docker setup to pull in bespokefit from source (in a standalone repo, rather than being checked into this repo). That is to say, for our needs there are other options available.

> Would the docker image have psi4 and/or xtb (and/or other engines)?

My understanding is that this is not needed for the server, but if you wanted to do the same with the executor, or package them both in the same container, that is super easy.

> What would automated testing look like?

If you wanted to do 2. (also the best way to test 1.), then you can build a container on every commit to main in CI. Given that the container only packages up the functionality already in the repo, if the tests pass you can be pretty confident that the Dockerfile will work. You can also run the test suite inside the built container pretty easily if you want.
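The CI idea above could look roughly like the following. This is only a sketch under stated assumptions: the `bespokefit:ci` tag, the `ci_steps` helper, and the `EXECUTE` switch are hypothetical names, and it assumes the image has no entrypoint that would swallow the `python` command. By default the script only prints the two steps; it runs them when `EXECUTE=1` is set.

```shell
#!/usr/bin/env bash
# Sketch of a CI smoke test for the proposed image. Nothing here is taken
# from the repository: the tag and helper names are assumptions.
set -eu

IMAGE="${IMAGE:-bespokefit:ci}"   # hypothetical local tag

# Emit the CI steps, one command per line.
ci_steps() {
  # 1. Build the image from the Dockerfile in the current checkout.
  echo "docker build -t ${IMAGE} ."
  # 2. Smoke test: the package must import inside the container.
  echo "docker run --rm ${IMAGE} python -c 'import openff.bespokefit'"
}

# Print the plan by default; set EXECUTE=1 to actually run each step.
if [ "${EXECUTE:-0}" = "1" ]; then
  ci_steps | while IFS= read -r step; do
    sh -c "$step"
  done
else
  ci_steps
fi
```

Swapping the import check for the project's real test command would turn the smoke test into a run of the test suite inside the container.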

j-wags commented 2 months ago

Thanks for the great explanation, and apologies for something I forgot - Josh H is currently the "owner" for bespokefit so I shouldn't be stepping in here unless I strongly object to something.

> What would automated testing look like?
>
> If you wanted to do 2. (also the best way to test 1.) then you can build a container on commit main in CI... You can run the testsuite inside the built container pretty easily if you want also.

Ah, beyond just testing the stuff inside the container, I know very little about "having processes on different computers/containers talking to each other" - presumably there's some configuration of ports to expose and connect to, places to set addresses and tokens - and it's those settings/docs that I'd be most concerned with testing (also I anticipate a lot of the support requests would be about setting this up on different people's clusters).

I am a little nervous that putting this in our repo is a signal to folks that this is a deployment method that we support, and if external people start using it, it will be hard to take back later. Could we keep it in a standalone repo or a branch/fork while we see how it performs, document it, and make any behavior changes?

jthorton commented 2 months ago

> Ah, beyond just testing the stuff inside the container, I know very little about "having processes on different computers/containers talking to each other" - presumably there's some configuration of ports to expose and connect to, places to set addresses and tokens - and it's those settings/docs that I'd be most concerned with testing (also I anticipate a lot of the support requests would be about setting this up on different people's clusters).

Just to clarify, we are not exposing anything new here: the configuration of ports and addresses is already part of bespokefit, and users are free to change these settings to suit their needs. We advise them to do this when they use a distributed worker setup. The docker deployment should just make this a lot easier. We essentially want to replicate what alchemiscale does and offer a very easy way to deploy a server and remote workers on any infrastructure, which is a massive selling point for industry users, from my very biased view 😄.
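As a rough illustration of the port configuration discussed here, the sketch below assembles a `docker run` command that publishes the gateway port to the host. Everything in it is an assumption to check before use: the image name is the hypothetical one used earlier in this thread, `BEFLOW_GATEWAY_PORT` and the default of 8000 should be verified against bespokefit's settings, and the image is assumed to start the server as its default command. The script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch only: prints a docker run command, it does not start anything.
set -eu

IMAGE="${IMAGE:-ghcr.io/openforcefield/bespokefit}"  # hypothetical image
GATEWAY_PORT="${GATEWAY_PORT:-8000}"                 # assumed default port

# Build a command that maps the same port on the host and in the container
# and passes the port to the server through an (assumed) environment variable.
server_cmd() {
  echo "docker run -d -p ${GATEWAY_PORT}:${GATEWAY_PORT}" \
       "-e BEFLOW_GATEWAY_PORT=${GATEWAY_PORT} ${IMAGE}"
}

server_cmd
```

Remote workers would then point at the host address and the published port, which is the same setting users already change for a distributed setup.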

j-wags commented 2 months ago

Gotcha. Proceed as you see fit - just note that once the Docker stuff is pushed to main, it is as good as released and in the public API :-)