primaryodors / primarydock

PrimaryOdors.org molecular docker.

The path to predictions #260

Closed electronicsbyjulie closed 1 year ago

electronicsbyjulie commented 1 year ago

Docking ligands in an active-conformation protein model would involve:

For ligands below a certain total binding strength, the pre-activation raw PDB dock binding energy would be subtracted.

An accuracy of 80% or better would be just as trustworthy as the empirical measurements.

primaryodors commented 1 year ago

Can certainly work on this intermittently while solving higher priority issues.

primaryodors commented 1 year ago

Also see:

The takeaway here is that not only are there multiple possible active configurations per receptor, but also there are configurations active for some G proteins but not others, and these dissimilar conformers bind to different sets of agonists. So it is not sufficient to say that any given receptor is sensitive to an odorant, but rather that a receptor-G-protein pair is sensitive to that odorant, or a receptor can be sensitive to it depending on the G protein.

Originally posted by @primaryodors in https://github.com/primaryodors/primarydock/issues/256#issuecomment-1566694837

In other words, truly accurate predictions will rely on not just an active configuration model of the receptor protein, but one (or ones) made to fit whichever G protein(s) are expressed in the olfactory cells. Presumably this is usually Golf for in vivo olfaction, but some of the empirical measurements may have used G proteins other than Golf, which may explain some of the self-contradictions in the scientific data.

On the other hand, this makes the job easier when it comes to extrapolating active receptor structures. A typical active receptor model would be modified from the AlphaFold original to 1.) fit the Golf structure or other G protein on the cytoplasmic end; 2.) maintain a salt bridge between EXR2 and the extracellular end of TMR6; 3.) otherwise resemble the conformational changes found in the active OR51E2 cryo-EM model; 4.) minimize internal clashes of the protein at least to the energy level of the AlphaFold original.

Once such a model is generated of a receptor-G-protein pair, it can be serialized as a PDB file, in a subfolder named active, with a name that identifies both the receptor and the G protein, and then all docks done on this active state conformer would be hard docks for the sake of performance.

primaryodors commented 1 year ago

Can use the g-protein-based-reshaping branch for development on this.

electronicsbyjulie commented 1 year ago

Some of the contact points between proteins:

GNAS1 - OR51E2:     BW                Conserved?
--------------------------------------------------
Q35   - N136        *4.38             Y
A39   - A132/A133   *4.34-5/*3.61-2   N
D215  - R130        *3.59             mostly
E392  - K294        7.56              ORs only, less one orphan; not TAARs.

R385  - E232        6.29              >50%
E392  - K296        7.58              mostly
D215  - R130        3.59              usually R or H
Y358  - S229        5.70/6.26         mostly; let's call this residue 56.50.
Y391  - H131        3.60              most receptors have an H or an R in the 3.59 position; OR51E2 is unusual having HR.

primaryodors commented 1 year ago

To break the task down into easier pieces:

We've opted to increase this task to the same priority as the unit tests. The two may be developed concurrently, with equal time given to both.

electronicsbyjulie commented 1 year ago

Were it only that simple, but the G proteins change shape too. The chain from 8F76 can be extracted with the new update to pepteditor and will at least fit a class I OR.

electronicsbyjulie commented 1 year ago

So the GNAL protein from 8IW1 is not even a real human G protein but an engineered chimera.

Wonder if it will still suffice.

electronicsbyjulie commented 1 year ago

Very upset. Best binding paired the aliphatic atoms of d-limonene with OR1A1's Asn109 residue. I've added code to prevent pairing polar with nonpolar like that.
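A guard of the kind described could look roughly like the sketch below. It is not PrimaryDock's actual code: the `Group` type, the single `polar` flag per group, and the `may_pair` and `allowed_pairings` functions are all hypothetical names chosen for illustration, and real polarity is a matter of degree rather than a yes/no flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """A cluster of atoms treated as one binding partner (hypothetical type)."""
    name: str
    polar: bool  # assumption: polarity reduced to a single flag per group


def may_pair(ligand_group, residue_group):
    """Allow a pairing only when both groups are polar or both are nonpolar."""
    return ligand_group.polar == residue_group.polar


def allowed_pairings(ligand_groups, residue_groups):
    """Every ligand/residue combination that passes the polarity guard."""
    return [
        (lig, res)
        for lig in ligand_groups
        for res in residue_groups
        if may_pair(lig, res)
    ]


# The case from the comment above: aliphatic atoms against an Asn side chain.
limonene_aliphatic = Group("d-limonene aliphatic atoms", polar=False)
asn109 = Group("OR1A1 Asn109 side chain", polar=True)
print(may_pair(limonene_aliphatic, asn109))
```

Filtering candidate pairs before scoring, rather than penalizing them afterwards, keeps a mismatched pair from ever being picked as the best binding.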

primaryodors commented 1 year ago

Don't forget #266 is an urgent task.

electronicsbyjulie commented 1 year ago

The protein coupling app has a memory leak that interferes with testing it locally.

electronicsbyjulie commented 1 year ago

Updated the list of inter-protein contacts.

Recommend rewriting the couple app to accept its own format of config file with lists of coupling pairs. Can offer a way to specify multiple matching aminos, e.g. the one in the list that's usually H or R, as well as a positional tolerance, e.g. specify K7.58 but allow for +/-2 so that K7.56 thru K7.60 will also match.

primaryodors commented 1 year ago

Agreed. Can use a file extension like .cplcfg to distinguish it from a PrimaryDock .config file.

The two proteins can be specified with PROT1 and PROT2 params, overridable on the command line. Then multiple CONTACT params can follow, with a format similar to CONTACT DE215 RH3.59~2 using the tilde to specify positional tolerance. The first argument would apply to protein 1, and the second to protein 2.

If fewer than 3 of the contacts correspond to residues in both proteins, then the app should exit with an error.
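As a rough sketch of how the proposed format might be read, the Python below parses `PROT1`, `PROT2`, and `CONTACT` lines and applies the tilde tolerance. Only those three keywords and the `DE215 RH3.59~2` token shape come from the discussion above. The function names, the `ContactSpec` type, the `#` comment syntax, and the file names in the example are assumptions, and the real app is not written in Python.

```python
import re
from dataclasses import dataclass

# One or more amino letters, a sequence number or BW number, optional ~tolerance.
TOKEN = re.compile(r"^([A-Z]+)(\d+(?:\.\d+)?)(?:~(\d+))?$")


@dataclass
class ContactSpec:
    aminos: str     # acceptable one-letter codes, e.g. "RH"
    position: str   # "215" (sequence number) or "3.59" (BW number)
    tolerance: int  # positional tolerance, 0 when no tilde is given


def parse_spec(token):
    m = TOKEN.match(token)
    if m is None:
        raise ValueError(f"bad contact token: {token!r}")
    return ContactSpec(m.group(1), m.group(2), int(m.group(3) or 0))


def parse_cplcfg(text):
    """Return ({'PROT1': path, 'PROT2': path}, [(spec1, spec2), ...])."""
    prots, contacts = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # '#' comments are an assumption
            continue
        fields = line.split()
        key = fields[0].upper()
        if key in ("PROT1", "PROT2") and len(fields) == 2:
            prots[key] = fields[1]
        elif key == "CONTACT" and len(fields) == 3:
            # First token applies to protein 1, second to protein 2.
            contacts.append((parse_spec(fields[1]), parse_spec(fields[2])))
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return prots, contacts


def matches(spec, amino, position):
    """True if a residue (letter plus position string) satisfies the spec."""
    if amino not in spec.aminos:
        return False
    if ("." in spec.position) != ("." in position):
        return False  # do not compare BW numbers with sequence numbers
    if "." in spec.position:
        helix, offset = spec.position.split(".")
        other_helix, other_offset = position.split(".")
        return (helix == other_helix
                and abs(int(offset) - int(other_offset)) <= spec.tolerance)
    return abs(int(spec.position) - int(position)) <= spec.tolerance


example = "PROT1 gnas.pdb\nPROT2 or51e2.pdb\nCONTACT DE215 RH3.59~2\n"
prots, contacts = parse_cplcfg(example)
```

The check that at least 3 contacts resolve in both proteins would sit on top of `matches`, counting the `CONTACT` lines for which each protein has a residue that satisfies its spec.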

electronicsbyjulie commented 1 year ago

There would also need to be params for bridging, homology, and regions for repositioning.

electronicsbyjulie commented 1 year ago

The positions of the EXR and CYT regions can be adjusted instead of moving the TM helices for optimizing the inter-protein contacts. For olfactory GPCRs, the movable regions are CYT1 (between TMR 1 and 2), EXR1 (2-3), CYT2 (3-4), CYT3 (5-6), EXR3 (6-7). EXR and CYT regions that do not contact the G protein or form a salt bridge can then further adjust during soft dock. There are four axes of motion: horizontally outward/inward; horizontally perpendicular to protein XZ center; rotation about Y axis; rotation about radial axis from protein XZ center.

Once the piece has been rotated, the adjoining TM helices are rotated about their other-end residue's CA atom to line up their same-end CA atom with where the piece "expects" it to be. The piece is then moved to align itself with the actual same-end locations, which will probably differ by a fraction of an Angstrom from "expected".
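The helix step can be illustrated with a small sketch. It rotates a set of atom positions about a pivot (the other-end CA) so that the same-end CA points toward where the piece "expects" it, using Rodrigues' rotation formula. This is an illustration only, not the app's code: `rotate_about_pivot` and the tuple-based vectors are hypothetical, and the coordinates in the example are made up.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_pivot(atoms, pivot, current, target):
    """Rotate atoms about pivot so the pivot->current direction
    lines up with the pivot->target direction."""
    u = sub(current, pivot)
    v = sub(target, pivot)
    axis = cross(u, v)
    length = math.sqrt(dot(axis, axis))
    if length < 1e-12:
        # Directions already parallel; the exactly opposite case is not handled.
        return list(atoms)
    k = tuple(c / length for c in axis)   # unit rotation axis
    angle = math.atan2(length, dot(u, v))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for p in atoms:
        r = sub(p, pivot)
        k_cross_r = cross(k, r)
        k_dot_r = dot(k, r)
        # Rodrigues: r cos(a) + (k x r) sin(a) + k (k . r)(1 - cos(a))
        new_r = tuple(r[i] * cos_a + k_cross_r[i] * sin_a
                      + k[i] * k_dot_r * (1.0 - cos_a) for i in range(3))
        rotated.append(add(pivot, new_r))
    return rotated


# Made-up coordinates: a 10 A helix whose far end is expected 10.3 A from the pivot.
pivot = (0.0, 0.0, 0.0)
same_end = (10.0, 0.0, 0.0)
expected = (0.0, 10.3, 0.0)
moved = rotate_about_pivot([same_end, (5.0, 0.0, 0.0)], pivot, same_end, expected)
residual = math.dist(moved[0], expected)
```

Because a rotation preserves the helix length, the rotated CA ends up on the line toward the expected position but not necessarily on it. In this example `residual` is the leftover mismatch, which is the gap the piece then closes by moving to the actual same-end location.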

electronicsbyjulie commented 1 year ago

It keeps flinging the TM helices out away from the rest of the protein! 😡

primaryodors commented 1 year ago

Switch gears and work on the unit tests for a while. It will be easier to solve the positioning bug after taking a break from it and coming back fresh.

electronicsbyjulie commented 1 year ago

It's finally a weekend with nothing going on. It's finally the perfect time to really dig into the code and figure out why it keeps messing up.

primaryodors commented 1 year ago

If you can find the cause of the problem tonight, then go ahead and take the time to fix it.

electronicsbyjulie commented 1 year ago

Even if limiting the motions to Δy = 0, TMR6 quickly pulls too far away from the rest of the protein.

The answer won't be improving the existing functionality, but rather performing a 3D conformational search that first lines up the contacts' CA atoms at suitable distances, particularly those of segments that don't move, then adjusts each movable segment to maximize contact with its contact partner. No iterations, just a one-time best fit followed by a one-time fine adjustment.

So, technically, I did find the cause of the problem tonight.

primaryodors commented 1 year ago

I should have been more clear.

Proceed.

electronicsbyjulie commented 1 year ago

There is a hydrophobic pocket in the cryo-EM model formed by I125, I220, V224, L227 of OR51E2 and L388, L393, L394 of GNAS. As these are not direct contacts between individual aminos but rather a general grouping, there ought to be a way to define such a thing as another form of contact point.
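One possible way to represent such a grouped contact is sketched below: two residue lists, scored by the distance between the centroids of their CA atoms instead of by residue-to-residue distances. The `GroupContact` type, the centroid measure, and the coordinates are all assumptions for illustration. Only the residue labels come from the cryo-EM observation above.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupContact:
    """A contact between two clusters of residues (hypothetical type)."""
    prot1_residues: list  # G protein side
    prot2_residues: list  # receptor side


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def group_distance(contact, coords1, coords2):
    """Distance between the CA centroids of the two residue groups."""
    c1 = centroid([coords1[r] for r in contact.prot1_residues])
    c2 = centroid([coords2[r] for r in contact.prot2_residues])
    return math.dist(c1, c2)


pocket = GroupContact(
    prot1_residues=["L388", "L393", "L394"],          # GNAS
    prot2_residues=["I125", "I220", "V224", "L227"],  # OR51E2
)

# Placeholder CA coordinates, not taken from any structure.
gnas_ca = {"L388": (0.0, 0.0, 0.0), "L393": (2.0, 0.0, 0.0), "L394": (4.0, 0.0, 0.0)}
or51e2_ca = {"I125": (0.0, 4.0, 0.0), "I220": (4.0, 4.0, 0.0),
             "V224": (2.0, 4.0, 2.0), "L227": (2.0, 4.0, -2.0)}
d = group_distance(pocket, gnas_ca, or51e2_ca)
```

A centroid distance is only one candidate measure. Summed pairwise distances or a buried-surface estimate would serve the same role and might suit a pocket better.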