primaryodors / primarydock

PrimaryOdors.org molecular docker.
Other
6 stars 4 forks source link

Probability-based conformer search #275

Open primaryodors opened 1 year ago

primaryodors commented 1 year ago

How feasible would the following algorithm be?

Suppose the docking space (the box containing the binding pocket) were divided into 0.67Å voxels. Each atom of the ligand can then be given a probability that it might be located in that voxel, based on its potential binding energies (ignoring anisotropy) and whether it would clash with an existing atom.

Suppose further that the ligand atoms' probabilities were "collapsed", one by one, starting with the atom having the strongest localized binding potential. Collapsing the probability would mean zeroing out all negative probabilities (e.g. due to clashes), dividing each positive probability by the sum of all positive values, then committing the atom to one X,Y,Z coordinate.

As successive ligand atoms are collapsed, they would organize back into the molecular structure roughly docked into the binding pocket. The interatomic distances can then be iteratively corrected.

Probabilities would have to take stereoisomerism into account, including preservation of E-Z isomers and enantiomers.

Can this be done, or at least approximated, in a reasonable amount of processor cycles?

electronicsbyjulie commented 1 year ago

Well, a typical pose search box will have a little over 10x10x10 voxels, or 1000 voxels, by that measurement. 0.67Å seems very specific - why not offer a resolution parameter and default it to 1/10 the box dimension if unspecified?

A typical dock might include about 500 receptor atoms for ligand-protein interactions. So each atom of the ligand would require 500,000 computations to find its probability distribution. It's not unreasonable to double or even quadruple that figure if a decent resolution is selected, and the number of receptor atoms exceeds 500. This is really a lot of computation even for small ligand molecules.

The good news is we can exclude hydrogens from the probability algorithm, so long as we compute the heavy atom distances for things like hydrogen bonds. That's easy enough to approximate - just add 1Å to the binding distance from bindings.dat. As long as the ligand has polar atoms and/or pi atoms, we can omit saturated aliphatic carbons. Otherwise, we could include just one or two saturated carbons. If the molecule has an aromatic ring, an algorithm could select one atom from that ring.

This will limit the number of probability distributions to between 1 and 5 for most odorants.

Once the first atom's probability field has been collapsed, the next atom to have its field collapse will be the first atom's lowest numbered uncollapsed-field bonded atom. Its field will be a sphere centered on the first atom, having a radius equal to the covalent bond length, with points on the sphere having probabilities based on avoiding clashes and seeking interatomic binding forces.

After the second atom's field collapses, each subsequent atom will be the lowest numbered uncollapsed-field atom with a bond to a collapsed-field atom, until no atoms are left whose locations have not been determined.

Iterations of the conform_molecules() function can then fine tune the pose, including performing side chain flexions.

Multiply the above by the number of poses. Each pose would start by collapsing a random atom's field to a weighted random location, so a diverse array of poses would be output. Only the top 1 or 2 poses would likely represent the true conformer of the ligand inside the binding pocket.

This would take longer to compute than the existing search methods, but whether it would take twice as long or a hundred times as long, I'm not sure.