The path to predictions

electronicsbyjulie commented 1 year ago

To dock ligands in an active conformation protein model would involve:

Do a positional homology based on the active OR51E2 model;
If there is a TMR6-EXR2 salt bridge, engage it;
Soft-pocket all the TMRs without a ligand to get internal clashes down to pre-homology levels;
- All TMR6 rotations would be centered on the salt bridge residue, and the bridge would be re-formed at the end;
A hard dock could then be done, and the total binding strength to TMR6 times the sum of TMR2 thru TMR5 would be the prediction.

For ligands below a certain total binding strength, the pre-activation raw PDB dock binding energy would be subtracted.

An accuracy of 80% or better would be just as trustowrthy as the empirical measurements.

primaryodors commented 1 year ago

Can certainly work on this intermittently while solving higher priority issues.

primaryodors commented 1 year ago

Also see:

The takeaway here is that not only are there multiple possible active configurations per receptor, but also there are configurations active for some G proteins but not others, and these dissimilar conformers bind to different sets of agonists. So it is not sufficient to say that any given receptor is sensitive to an odorant, but rather that a receptor-G-protein pair is sensitive to that odorant, or a receptor can be sensitive to it depending on the G protein.

Originally posted by @primaryodors in https://github.com/primaryodors/primarydock/issues/256#issuecomment-1566694837

In other words, truly accurate predictions will rely on not just an active configuration model of the receptor protein, but one (or ones) made to fit whichever G protein(s) are expressed in the olfactory cells. Presumably this is usually Golf for in vivo olfaction, but some of the empirical measurements may have used other G proteins besides Golf, so this may explain some of the self contradictions in the scientific data.

On the other hand, this makes the job easier when it comes to extrapolating active receptor structures. A typical active receptor model would be modified from the AlphaFold original to 1.) fit the Golf structure or other G protein on the cytoplasmic end; 2.) maintain a salt bridge between EXR2 and the extracellular end of TMR6; 3.) otherwise resemble the conformational changes found in the active OR51E2 cryo-EM model; 4.) minimize internal clashes of the protein at least to the energy level of the AlphaFold original.

Once such a model is generated of a receptor-G-protein pair, it can be serialized as a PDB file, in a subfolder named active, with a name that identifies both the receptor and the G protein, and then all docks done on this active state conformer would be hard docks for the sake of performance.

primaryodors commented 1 year ago

Can use the g-protein-based-reshaping branch for development on this.

electronicsbyjulie commented 1 year ago

Some of the contact points between proteins:

GNAS1 - OR51E2:     BW                Conserved?
--------------------------------------------------
Q35   - N136        *4.38             Y
A39   - A132/A133   *4.34-5/*3.61-2   N
D215  - R130        *3.59             mostly
E392  - K294        7.56              ORs only, less one orphan; not TAARs.

R385  - E232        6.29              >50%
E392  - K296        7.58              mostly
D215  - R130        3.59              usually R or H
Y358  - S229        5.70/6.26         mostly; let's call this residue 56.50.
Y391  - H131        3.60              most receptors have an H or an R in the 3.59 position; OR51E2 is unusual having HR.

primaryodors commented 1 year ago

To break the task down into easier pieces:

In the Protein class, create a ftn that accepts a pointer to another Protein.
- For every amino acid in the local object, and every residue in the other protein, sum up their reach values and, if their CA-CA distance is less than or equal to the summed reach (plus an interaction distance of say 2.5A), add both residues to a std::vector (and mark both residues "dirty" to prevent duplicates). Then return that vector.
Next, create a new app, with all the necessary entries in makefile.
- The app will accept two command line arguments: a PDB of a GPCR, and a PDB of a G protein. It will use the three conserved points of contact above, omitting the A39-AA34.xx contact which is not conserved in ORs.
Any GPCR lacking BW can still be used if it has the right motifs:
- DRYXAICXPLXY finds *3.59 as the last X.
- After that motif comes four more residues and then *4.38.
- 7.56 is the fourth from last letter of the PMLNPLIYSLRNKD motif.
If any motifs are not found and there are no REMARK rows indicating BW numbering, then an error is generated.
G protein alignment shall be accomplished entirely by motif:
- There is an EKQLQKD motif early in the sequence; 2 letters after it is Q35.
- D215 is the last letter of GIFETKFQVD.
- E392 is the E of the terminal YELL motif.
Once the contact points have been identified, the GPCR shall be uprighted and the G protein positioned so that the YELL-7.56 contact is at first superimposed. Next, the G protein shall be rotated about the Y axis to bring its Q contact near the *4.38 residue.
The exact positioning of the G protein will then be adjusted to place GPCR:*4.38 in the +Y direction from GNAX:Q(35), with the optimal Y distance for polar contact between the two residues.
If there is a salt bridge for TMR6-EXR2, then the helix shall be rotated in such a way as to facilitate that salt bridge contact. Then the entire TMR6 helix will be rotated outward, by its salt bridge if it has one, else by its 6.48 residue, until it avoids clashes with the G protein. If there is no salt bridge and the +Y end is clashing with the other parts of the protein, then it shall be rotated outward, centering rotation on the -Y end, to avoid clashes.
The output of the new inter-protein function can be used to fine tune the positioning of the G protein and of all TMRs, and to conform the side chains.
At this point, the two proteins can be written to a single combined PDB. Create the ability to specify a chain letter for each protein. The GPCR can be chain A, and the G protein chain B. Current code is that for docking purposes, PrimaryDock will only load chain A. The file should not be saved in the usual pdbs/OR##/ folder, but there should be a dedicated folder under pdbs/ for storing coupled models, also organized under OR##/ folders like the root of pdbs/ has.
It would be optimal to also flex the backbone between TMR helices and rejoin the strands that inevitably get broken. What might be useful for this task would be to use Molecule::conform_molecules() and include the covalent bonds between adjacent residues in the conformer search. Each motion could be automatically accepted if it results in bettering either the worst or the average of the set of covalent bonds.

We've opted to increase this task to the same priority as the unit tests. The two may be developed concurrently, with equal time given to both.

electronicsbyjulie commented 1 year ago

Were it only that simple, but the G proteins change shape too. The chain from 8F76 can be extracted with the new update to pepteditor and will at least fit a class I OR.

electronicsbyjulie commented 1 year ago

So the GNAL protein from 8iw1 is not even a real human G protein but an engineered chimera.

Wonder if it will still suffice.

electronicsbyjulie commented 1 year ago

Very upset. Best binding paired the aliphatic atoms of d-limonene with OR1A1's Asn109 residue. I've added code to prevent pairing polar with nonpolar like that.

primaryodors commented 1 year ago

Don't forget #266 is an urgent task.

electronicsbyjulie commented 1 year ago

The protein coupling app has a memory leak that interferes with testing it locally.

electronicsbyjulie commented 1 year ago

Updated the list of inter-protein contacts.

Recommend rewriting the couple app to accept its own format of config file with lists of coupling pairs. Can offer a way to specify multiple matching aminos, e.g. the one in the list that's usually H or R, as well as a positional tolerance, e.g. specify K7.58 but allow for +/-2 so that K7.56 thru K7.60 will also match.

primaryodors commented 1 year ago

Agreed. Can use a file extension like .cplcfg to distinguish it from a PrimaryDock .config file.

The two proteins can be specified with PROT1 and PROT2 params, overridable on the command line. Then multiple CONTACT params can follow, with a format similar to CONTACT DE215 RH3.59~2 using the tilde to specify positional tolerance. The first argument would apply to protein 1, and the second to protein 2.

If fewer than 3 of the contacts correspond to residues in both proteins, then the app should exit with an error.

electronicsbyjulie commented 1 year ago

There would also want to be params for bridging, homology, and regions for repositioning.

electronicsbyjulie commented 1 year ago

The positions of the EXR and CYT regions can be adjusted instead of moving the TM helices for optimizing the inter-protein contacts. For olfactory GPCRs, the movable regions are CYT1 (between TMR 1 and 2), EXR1 (2-3), CYT2 (3-4), CYT3 (5-6), EXR3 (6-7). EXR and CYT regions that do not contact the G protein or form a salt bridge can then further adjust during soft dock. There are four axes of motion: horizontally outward/inward; horizontally perpendicular to protein XZ center; rotation about Y axis; rotation about radial axis from protein XZ center.

Once the piece has been rotated, the adjoining TM helices are rotated about their other-end residue's CA atom to line up their same-end CA atom with where the piece "expects" it to be. The piece is then moved to align itself with the actual same-end locations, which will probably differ by a fraction of an Angstrom from "expected".

electronicsbyjulie commented 1 year ago

It keeps flinging the TM helices out away from the rest of the protein! 😡

primaryodors commented 1 year ago

Switch gears and work on the unit tests for a while. It will be easier to solve the positioning bug after taking a break from it and coming back fresh.

electronicsbyjulie commented 1 year ago

It's finally a weekend with nothing going on. It's finally the perfect time to really dig into the code and figure out why it keeps messing up.

primaryodors commented 1 year ago

If you can find the cause of the problem tonight, then go ahead and take the time to fix it.

electronicsbyjulie commented 1 year ago

Even if limiting the motions to Δy = 0, TMR6 quickly pulls too far away from the rest of the protein.

The answer won't be improving the existing functionality, but rather performing a 3D conformational search that first lines up the contacts' CA atoms at suitable distances, particularly those of segments that don't move, then adjusts each movable segment to maximize contact with its contact partner. No iterations, just a one-time best fit followed by a one-time fine adjustment.

So, technically, I did find the cause of the problem tonight.

primaryodors commented 1 year ago

I should have been more clear.

Proceed.

electronicsbyjulie commented 1 year ago

There is a hydrophobic pocket in the cryo-EM model formed by I125, I220, V224, L227 of OR51E2 and L388, L393, L394 of GNAS. As these are not direct contacts between individual aminos but rather a general grouping, there ought to be a way to define such a thing as another form of contact point.

primaryodors / primarydock

The path to predictions #260