openforcefield / openff-bespokefit

Automated tools for the generation of bespoke SMIRNOFF format parameters for individual molecules.
https://docs.openforcefield.org/bespokefit
MIT License
59 stars 9 forks source link

BespokeFit client #351

Closed jthorton closed 2 weeks ago

jthorton commented 3 months ago

Description

This PR introduces a simple bespokefit client which splits out the logic of communicating with the executor. This should make it easier to submit and retrieve calculations from an executor on another machine using the python API similarly to Alchemiscale or QCArchive. To enable this we also expose the BEFLOW_GATEWAY_ADDRESS which can be used to provide the location of an executor on another machine which is currently not possible.

We also introduce some basic authentication which checks that the value of the new setting BEFLOW_API_TOKEN matches between the server and the client value before responding to requests.

These features should make it easier to provide a light weight bespokefit-client package in future for applications which only need to interface with a bespokefit executor.

Todos

Notable points that this PR has either accomplished or will accomplish.

Questions

Status

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 89.84772% with 20 lines in your changes missing coverage. Please review.

Project coverage is 91.24%. Comparing base (f7cd2a6) to head (9e5e9e2). Report is 9 commits behind head on main.

Additional details and impacted files
jthorton commented 3 months ago

Hi @j-wags , @mattwthompson , @Yoshanuikabundi I wasn't sure who to ask for a review here so feel free to ignore if I got the wrong person! The main part I need feedback on is if I should keep the old methods on the executor and add deprecation warnings to them or if I can get away with removing them and having this be a major version change, happy to do either though. Also, any feedback generally would be great!

mattwthompson commented 3 months ago

To be frank I'm not really sure where to start reviewing this or how to provide feedback; I skimmed some of the diff while nodding along, but the code alone doesn't do much to get me thinking about where I should really focus (and these sort of details kinda need to be spoonfed to me).

Also you're the owner of this repo so it's your playground to do with what you please https://github.com/openforcefield/status?tab=readme-ov-file#openff-maintainer-dashboard

j-wags commented 3 months ago

I'll give this a review today, mostly with an eye on the API change/deprecation questions.

jthorton commented 3 months ago

Thanks @mattwthompson and @j-wags! I think I have covered everything now but it would be great if you could give the docs a quick look over to make sure I didn't miss an old use of the executor!

dotsdl commented 1 month ago

I think it's generally wise to offload handling of TLS to a TLS termination proxy, such as traefik. This can also handle certificate renewals automatically when using e.g. Let's Encrypt as a certificate authority.

Since openff-bespokefit uses fastapi for its HTTP APIs, this page gives a great explanation for how these pieces fit together: https://fastapi.tiangolo.com/deployment/https/

Do we have docs on how to deploy openff-bespokefit's server components, especially for a long-lived instance accessible across the network/internet? What may make sense is to offer a docker-compose-based deployment path like we do for alchemiscale, since this includes deployment and configuration of traefik all in a single deployable config.

dotsdl commented 1 month ago

That said, if a user deploys openff-bespokefit's server components to a host inside of a closed network (not reachable from the internet), you can get away with not using TLS for HTTPS communication and instead use only raw HTTP.

So, this comes down to decisions by an individual user/deployer as to how they plan to expose their openff-bespokefit instance.

If a user is deploying openff-bespokefit on the same host they intend to use the client on, then there is definitely no need for TLS. Typical use of, say, jupyterlab is an example of this.

Does this help clarify paths forward?

jthorton commented 1 month ago

Thanks @bennybp and @dotsdl for the response this definitely helps and hopefully clears things up for @j-wags.

What may make sense is to offer a docker-compose-based deployment path like we do for alchemiscale, since this includes deployment and configuration of traefik all in a single deployable config.

Yes this is a great idea and something I want to add and started in #342 , but as you point out most of the time users won't need this as they can run it on a local machine, but hopefully we can document how to do this on AWS as we have with the ASAP deployment.

j-wags commented 1 month ago

Yes, thank you both for the clarification. I'm fine with the client functionality being added, but I don't think the token field makes sense if the two situations where it's being used are:

  1. when the client and server are on the same machine/a closed network, and no authentication is needed in the first place
  2. when the client and server are communicating through the open internet, where third parties can see the token in plaintext

So I'm happy to consider this PR without any use of tokens and making no security guarantees (and a documented warning that using this over the open internet is not secure). Or I'm happy to have tokens with https-by-default to enable secure communication. But I don't want to put this out with seemingly-secure tokens that aren't actually secure.

mattwthompson commented 1 month ago

Who owns this project now? @jthorton is currently listed as the contact https://github.com/openforcefield/status?tab=readme-ov-file#openff-maintainer-dashboard

j-wags commented 1 month ago

Josh is indeed currently the owner of Bespokefit and could unilaterally merge this if he wanted. However this (to me) is a serious enough issue that I'd bring it up with the lead team if this went ahead in its current state.

The context around this PR is that ASAP is building software pipelines to discover new drugs, and they must implement data security to avoid massive IP/legal trouble if the data leaks. Using structurally insecure token authentication presents a very large risk for them, and has the possibility of catching us in the blast radius if something goes wrong. And this is before the issue of us pushing insecure token authentication to OUR partners as well.

jthorton commented 1 month ago

Yes, thank you both for the clarification. I'm fine with the client functionality being added, but I don't think the token field makes sense if the two situations where it's being used are:

Yeah I could see users doing option 2 rather than option 3 which is to deploy behind treafik and thinking that the server is secure when its not, but I am not sure if its possible for us to automatically enforce/ enable https by default without adding a lot of complexity. I think the best thing we can do is to provide the docker file with treafik and detailed deployment docs outlining how this works and cases when it is secure similar to alchemiscale.

We could also have warnings emitted if the user sets a authentication token and tries to connect to a http address rather than https and warn that its not secure?

j-wags commented 1 month ago

To recap our discussion in the Newcastle/bespoke meeting today - Josh's proposal to emit a warning when a client connects to anything except localhost or 0.0.0.0 using http works for me.

We didn't talk about steps forward for the dockerfile - I'd prefer that go into a separate PR given that the warning-on-http plan gets this to a mergeable state, and we'll need to do some thinking about the devops/testing/documentation of a docker deployment pathway.

jthorton commented 2 weeks ago

Thanks @j-wags I have added the warning along with a test please feel free to suggest a changing to the wording of the warning I wasn't quite sure of how urgent to make the warning!

jthorton commented 2 weeks ago

Thanks @j-wags, there should be no changes to past commands or scripts as we export all of the old functions still but run them through the new client feature and yes a 0.4.0 version would be great to mark this feature!