openmm / pdbfixer

PDBFixer fixes problems in PDB files

Multi-tenant PDBFixer on a public server #29

Open rmcgibbo opened 10 years ago

rmcgibbo commented 10 years ago

As a hack-week-type project, it should be fun to turn PDBFixer into a real server-based application that could be run on a publicly available webserver. @proteneer and I were talking about this at dinner. It would require some significant architectural changes to the uiserver.

Ideally, there would be a single public server (or a small pool of them) running a sessioned async webserver (e.g. tornado), and a collection of one or more backend servers. The two would be hooked together with something like celery. Decoupling serving the UI from running the minimizations will be the key to making this work.

It's a significant amount of work, but not crazy.

jchodera commented 10 years ago

I like this idea!

Building a nice solution with a small pool of servers sounds like the right engineering approach, but what about the simpler approach of just tossing up a webserver with a single GPU? When many people start to use it simultaneously, it will get slower, but it may be enough to do the trick and might actually be hard to take out.

It may even be possible to do this with an Amazon GPU instance, though this might get pricey.

rmcgibbo commented 10 years ago

I guess someone could try tossing up the current code on a public webserver with a GPU in it if they wanted.

jchodera commented 10 years ago

The UI would still need to be changed to allow a file to be uploaded or a pdbid or URL specified.

rmcgibbo commented 10 years ago

The first and second of those things already exist.

jchodera commented 10 years ago

Oh! Awesome!

peastman commented 10 years ago

Also, there's currently no session management. It assumes it's being used by a single user through a single web browser. So the global state would need to be moved into a per-session state so multiple people could be processing different files at the same time.
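One way to picture the move from global state to per-session state is a store keyed by session id. This is a sketch under assumptions: `SessionStore` and the keys `fixer` and `pdb_text` are hypothetical, and a real server would take the id from a cookie and expire old sessions.

```python
import uuid


class SessionStore:
    """Maps a session id (e.g. from a cookie) to that user's own state."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        session_id = uuid.uuid4().hex
        # Each session gets its own dict in place of module-level globals.
        self._sessions[session_id] = {"fixer": None, "pdb_text": None}
        return session_id

    def get(self, session_id):
        return self._sessions[session_id]


store = SessionStore()
alice = store.create()
bob = store.create()
store.get(alice)["pdb_text"] = "file A"
store.get(bob)["pdb_text"] = "file B"  # does not affect alice's session
```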

proteneer commented 10 years ago

I had a chat with Peter and Robert about this. This should be pretty fun. Some things to note:

findMissingResidues() - CHEAP
findNonstandardResidues() - CHEAP
replaceNonstandardResidues() - CHEAP
findMissingAtoms() - CHEAP
removeHeterogens(False) - CHEAP
addMissingAtoms() - EXPENSIVE, requires additional contexts
addMissingHydrogens(7.0) - SOMEWHAT EXPENSIVE, requires additional contexts
addSolvent() - EXPENSIVE, requires additional contexts
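The cost labels above could drive a simple routing decision. In this sketch the labels are copied from the list, while the `STEP_COST` table, the `needs_backend` helper, and the rule "anything not cheap goes to a backend" are assumptions for illustration.

```python
# Cost labels copied from the list above; the routing rule is an assumption.
STEP_COST = {
    "findMissingResidues": "cheap",
    "findNonstandardResidues": "cheap",
    "replaceNonstandardResidues": "cheap",
    "findMissingAtoms": "cheap",
    "removeHeterogens": "cheap",
    "addMissingAtoms": "expensive",
    "addMissingHydrogens": "somewhat expensive",
    "addSolvent": "expensive",
}


def needs_backend(step_name):
    """Return True if the step should be offloaded rather than run inline."""
    return STEP_COST[step_name] != "cheap"
```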

From an architectural point of view, we probably want several backends that handle the expensive parts asynchronously. If we're doing it REST style, then we pickle PDBFixer objects and implement a few backend request handlers that handle the expensive parts. Note that everything in OpenMM is blocking (i.e. synchronous), so we'd probably need to offload the expensive parts to a separate thread. I recommend tornado since I have a fair bit of experience with its asynchronous style.
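A rough sketch of that offloading pattern follows. It makes several assumptions: the standard library's asyncio stands in for tornado, a plain dict stands in for a pickled PDBFixer object, and `time.sleep` simulates a blocking OpenMM call. The names `expensive_step` and `handle_request` are hypothetical.

```python
import asyncio
import pickle
import time
from concurrent.futures import ThreadPoolExecutor


def expensive_step(payload):
    """Runs in a worker thread: unpickle, do the blocking work, re-pickle."""
    state = pickle.loads(payload)
    time.sleep(0.1)  # simulated blocking call (e.g. adding solvent)
    state["solvated"] = True
    return pickle.dumps(state)


async def handle_request(executor, payload):
    loop = asyncio.get_running_loop()
    # The event loop stays free to serve other requests while this runs.
    return await loop.run_in_executor(executor, expensive_step, payload)


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        payloads = [pickle.dumps({"solvated": False}) for _ in range(2)]
        results = await asyncio.gather(
            *(handle_request(executor, p) for p in payloads)
        )
    return [pickle.loads(r) for r in results]


states = asyncio.run(main())
```

Because the blocking call runs in the executor, the two simulated requests are processed concurrently instead of one after the other on the event loop.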

A production-grade webserver has a lot of moving parts, as we may add a database, or an in-memory cache, etc., so we probably won't submit PRs to this particular repo for the webserver parts.

jchodera commented 10 years ago

There's a web-based CHARMM input generator that is pretty handy. Maybe some ideas could be had here? http://www.charmm-gui.org/?doc=input/pdbreader