openforcefield / alchemiscale

a high-throughput alchemical free energy execution system for use with HPC, cloud, bare metal, and Folding@Home
https://docs.alchemiscale.org/
MIT License

[user story] Support for compound progression for open science Diamond Light Source XChem / Fragalysis projects #7

Open jchodera opened 2 years ago

jchodera commented 2 years ago

In broad terms, what are you trying to do?

The Diamond Light Source XChem facility provides a fragment screening service for public, open science targets for which better ligands are desired. Fragment screening produces a number of structures with fragments bound, which are then shared through Fragalysis, a platform for visualizing interactions and identifying potential new compounds to purchase. Using the Fragment Network approach, Fragalysis identifies compounds highly related to the existing bound fragments that are purchasable from the 16.5B-compound Enamine REALSpace (~$100/compound, 4-6 week delivery, >80% synthesis success rate), but it currently provides no information about putative binding pose or relative affinity.

For each of these public projects (currently ~12/year, projected to increase 5x soon), we would like to use the fah-alchemy infrastructure to:

- run relative alchemical free energy calculations between X-ray structures and new proposed compounds

How do you believe using this project would help you to do this?

Reusing components of this infrastructure that are already used in other projects would let us easily automate this process, providing enormous value to researchers around the world looking to discover potent, open chemical matter for targets with little or no chemical matter yet known.

What problems do you anticipate with using this project to achieve the above?

While the scale of each batch of calculations may be large (one target/month, 10-20K compounds/target), the calculations are infrequent.

No experimental affinity data will be available in this case, so absolute free energy calculations for all compounds with available X-ray structures would be necessary to place the predictions for all proposed compounds on the same absolute free energy scale.
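For concreteness, one way to combine the two kinds of calculation is a weighted least-squares fit over the free energy network (in the spirit of DiffNet-style analyses), with the ABFE results for the X-ray compounds acting as anchors. A minimal sketch, not an alchemiscale API, with all names hypothetical:

```python
import numpy as np

def anchor_free_energies(n_compounds, rbfe_edges, abfe_anchors):
    """Estimate per-compound absolute free energies (hypothetical helper).

    rbfe_edges   : list of (i, j, ddG, sigma) with ddG = G_j - G_i
    abfe_anchors : list of (i, dG, sigma) absolute values for X-ray compounds
    """
    rows, rhs, weights = [], [], []
    # Each RBFE edge constrains the difference G_j - G_i
    for i, j, ddg, sigma in rbfe_edges:
        row = np.zeros(n_compounds)
        row[i], row[j] = -1.0, 1.0
        rows.append(row); rhs.append(ddg); weights.append(1.0 / sigma)
    # Each ABFE anchor pins one compound to the absolute scale
    for i, dg, sigma in abfe_anchors:
        row = np.zeros(n_compounds)
        row[i] = 1.0
        rows.append(row); rhs.append(dg); weights.append(1.0 / sigma)
    # Weighted least squares: scale each equation by its inverse uncertainty
    A = np.array(rows) * np.array(weights)[:, None]
    b = np.array(rhs) * np.array(weights)
    dG, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dG
```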

Since a large variety of targets will be processed, there may be many more unexpected issues (missing density, cofactors, protonation states, post-translational modifications) than with well-understood benchmark systems. On the other hand, the predictions will be tested experimentally at others' expense, providing an incredibly valuable source of feedback for force fields and free energy methods.

IAlibay commented 2 years ago

> run relative alchemical free energy calculations between X-ray structures and new proposed compounds

From many discussions with Harold on the follow-up to the SAMPL7 / PHIP work, and from my own (mis)adventures with fragments, there are often gains in running cheaper methods (e.g. MM/GBSA) to reduce some of the noise before starting FEP calculations.

One of the things I would be interested in seeing OpenFE do post year 1 (pending discussions & also interest from funders) would be to have hierarchies of methods in free energy campaigns that would allow for on-the-fly prioritization of network paths without direct intervention.

So, for example: MM/GBSA -> take the top N% of nodes -> ABFE -> RBFE to elaborations; or MM/GBSA -> prioritisation of the "best edges" for RBFE. Roughly the kind of logic sketched below.
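A rough sketch of what that tiered campaign logic could look like, with all scoring and simulation backends passed in as hypothetical callables (this is not an existing OpenFE or alchemiscale interface):

```python
def tiered_campaign(candidates, score_mmgbsa, run_abfe, run_rbfe, elaborations,
                    top_fraction=0.05, n_best=10):
    """Tiered free energy campaign: cheap filter first, expensive methods later.

    All callables are hypothetical stand-ins for real scoring/simulation backends.
    """
    # Tier 1: cheap MM/GBSA scoring to cut noise before any alchemical work
    scored = sorted(candidates, key=score_mmgbsa)
    survivors = scored[: max(1, int(top_fraction * len(scored)))]

    # Tier 2: absolute free energies for the surviving nodes
    abfe = {c: run_abfe(c) for c in survivors}

    # Tier 3: relative free energies from the best nodes to their elaborations
    best = sorted(abfe, key=abfe.get)[:n_best]
    return {(c, e): run_rbfe(c, e) for c in best for e in elaborations(c)}
```

The point of keeping each tier as a plain callable is that the campaign itself decides where effort goes next, without direct intervention between tiers.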

Would doing this within a single F@H campaign for something like this be of interest?

jchodera commented 2 years ago

I absolutely agree! My favored choice, however, would be a structure-enabled machine learning method that could act as a surrogate for the more expensive free energy calculations and prioritize where free energy effort is allocated. An early attempt at this (for ligand-based ML only) is this Schrodinger example.

Enabling support for these "surrogate models" in active learning loops or pre-filtering/prioritization models would be neat.
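A minimal sketch of such an active learning loop, assuming a hypothetical run_fep callable and a surrogate exposing fit/predict methods (none of this is existing alchemiscale API):

```python
def active_learning_loop(pool, surrogate, run_fep, budget_per_round, n_rounds):
    """Alternate between cheap surrogate scoring and expensive FEP labeling.

    `surrogate` is any model with fit/predict; `run_fep` is a hypothetical
    stand-in for launching an expensive free energy calculation.
    """
    results = {}
    for _ in range(n_rounds):
        # Score everything not yet labeled with the cheap surrogate
        remaining = [c for c in pool if c not in results]
        predictions = {c: surrogate.predict(c) for c in remaining}
        # Spend the expensive FEP budget on the most promising candidates
        batch = sorted(predictions, key=predictions.get)[:budget_per_round]
        for c in batch:
            results[c] = run_fep(c)
        # Retrain the surrogate on all labels gathered so far
        surrogate.fit(list(results), list(results.values()))
    return results
```

Treating the surrogate as an opaque fit/predict object keeps the loop agnostic to whether the model is ligand-based or structure-enabled, which is exactly the flexibility the pre-filtering/prioritization use case needs.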

dotsdl commented 2 years ago

Raw notes from story review, shared here for visibility: