primaryodors / primarydock

PrimaryOdors.org molecular docker.
Other
6 stars 4 forks source link

Enhancements to best-binding selection. #254

Closed electronicsbyjulie closed 1 year ago

electronicsbyjulie commented 1 year ago
primaryodors commented 1 year ago

There's one pair that's going to be a bit of a challenge, hazelnut pyrazine in OR5K1. According to the literature, residues L104 and L255 are important for agonism, and the authors believe the two ethyl groups are coordinating to these two residues. https://doi.org/10.1101/2022.06.01.494157

Since OR5K1 is known to respond to substituted pyrazines, it will be important to find a pose that aligns not only L104 and L255 with aliphatic substituents, but also some kind of coordination to one or both nitrogens (hydrogen bonding?) and likely also a pi stack. The pose would also energetically favor an active state over the inactive state.

This is not an easy challenge and may require something more involved than the existing best-binding algorithm.

electronicsbyjulie commented 1 year ago

In that case, it may be sufficient to force the algorithm to consider L104 and L255 as having top priority, even though their pairings to the ethyl groups would be strictly van der Waals. Then for a third best binding pair, either F278 or T279 could do. F278 can pi stack with the pyrazine ring while T279 forms a hydrogen bond with one of the nitrogens. Best of all, this pose would require the top of TMR6 to move away from TMR7 and towards TMR5 in order to fit the ligand in that tight space.

I think a public bool member in the Molecule class would be useful here, called something like priority. Molecule and not AminoAcid because we'd want it accessible to conform_molecules(). It would be set to true for all residues mentioned in a CEN RES param, and then it would have multiple effects throughout the code. While searching the conformational space, it would enhance the binding energies to residues that have it. In the best binding algorithm, residues set as priority would be favored during pairing, and pairs involving priority residues would be sorted first in the output from GroupPair::pair_groups().

First I want to get nitrogen heterocycles correctly hydrogen bonding with water in the molecule test.

electronicsbyjulie commented 1 year ago

It looks like the heterocycles are good to go.

electronicsbyjulie commented 1 year ago

OR5K1_hazelnut_pyrazine.active.230523.txt

In this dock, Asp111 forms both pi and ionic bonds to the pyrazine ring, explaining why 1.) it has to be an aromatic ring, 2.) it really should have at least one nitrogen, and 3.) it really helps if it has more than one nitrogen. The two ethyl groups coordinate to Leu104 and Leu255. Tyr259 is brought near Glu108, helping keep the active state stable. The methyl group tucks into a pocket near Val69, adding some vdW binding force to TMR2, and Cys73 forms a polar-pi bond with the ligand.

primaryodors commented 1 year ago

Should be okay to bridge Tyr259 to Glu108 for active docks of OR5K1.

primaryodors commented 1 year ago

Taking a closer look at that dock, there's something wrong with the scoring. Ser71 for example is shown as clashing by about 0.3 kJ/mol with the ligand, when in fact Ser71 is nowhere near the ligand.

electronicsbyjulie commented 1 year ago

The reason for the scoring problem is charge repulsion. The code treats the pyrazine as partially charged so that it can form ionic bonds with acid groups, and that is causing a repulsive force with backbone NH groups of even distant residues.

primaryodors commented 1 year ago

Here's a sheet of heterocycle pKa values: https://www.scripps.edu/baran/heterocycles/Essentials1-2009.pdf

Pyrazine is nowhere near protonated at physiological pH. It can act as a hydrogen/halogen bond acceptor; it can participate in pi-stacking; it can coordinate to metals - but it does not form ionic bonds in the protein environment. https://pubmed.ncbi.nlm.nih.gov/32275822/

electronicsbyjulie commented 1 year ago

Short of creating a new .dat file and matching up molecules' rings to its contents, how can the code know which aromatic nitrogens can protonate under biological conditions and which cannot? What is the pattern that determines this?

primaryodors commented 1 year ago

Try using the formula pow(10, -pH) / pow(10, -pKa). (1) Using a biological pH of 7.4 (2), we find imidazole, the ring present in histidine, is a little less than 32% protonated. This makes it possible for acid groups to form ionic bonds with it. The next nearest pKa values in the chart are the pyridines, and using this formula they all give less than one percent protonation. For pyrazine, the calculation gives about 1.6 ppm of protonation, just a trace amount, and not relevant to docking.

Imidazole, and by extension histidine, is a special case and it is fine to ignore the ionizability of all other nitrogen heterocycles.

  1. https://www.chemteam.info/AcidBase/Calc-percent-dissoc-of-weak-acid.html
  2. https://www.ncbi.nlm.nih.gov/books/NBK507807/
electronicsbyjulie commented 1 year ago

What about in proximity to an acidic side chain? Asp and Glu both have side chain pKas of less than 5. Couldn't the local pH in that part of the protein ionize at least the nitrogen atom of pyridine?

primaryodors commented 1 year ago

Presumably not, since they've already given up their protons to basic side chains elsewhere in the protein. There doesn't seem to be any information about ionic bonds between acidic side chains and pyrazine. It doesn't seem to be a thing.

As far as charge repulsion, it shouldn't involve "ionizable" groups such as imidazole rings. The code would do well to follow a whichever-is-best approach with imidazole; either it is treated as fully charged for purposes of ionic bonding, or it is treated as having zero charge for purposes of avoiding charge repulsion. That decision could be made in the InteratomicForce::total_binding() function and nowhere else.

Feel free to add a special bool to the Atom class to store whether a given pnictogen atom has imidazole-type protonation ability, and only set it to true when the Ring class identifies an imidazole. Later we'll add some kind of .dat file, but I want to keep this PR from becoming another monstrosity.

electronicsbyjulie commented 1 year ago

OR5K1_hazelnut_pyrazine.active.230525.txt

electronicsbyjulie commented 1 year ago

TODO: CC bond count in Ring class imidazole discovery.

primaryodors commented 1 year ago

The output segfault would not be within the scope of this PR.

Looks good to add the carbon-carbon bond count, then test and submit for review.

electronicsbyjulie commented 1 year ago

The OR5K1 model might be slightly off. It looks like a pi stack to Phe278 and a hydrogen bond to Ser254 would effect the desired pose, but TMR7 would have to shift somewhat outward and downward for this to happen.

electronicsbyjulie commented 1 year ago

🟩🟩🟩

electronicsbyjulie commented 1 year ago

The best-binding is failing for aldehydes. It should favor pairing them with basic side chains, but it is pairing them to acids instead.

electronicsbyjulie commented 1 year ago

There are still more problems.

clear; make test/aniso_test && test/aniso_test "CCC[NH2]" H12 O                
Binding energy: -3.09376 out of an optimal -8 kJ/mol.
clear; make test/aniso_test && test/aniso_test "CCC[NH3+]" H12 O                                                                                             
Binding energy: -0 out of an optimal -8 kJ/mol.
electronicsbyjulie commented 1 year ago

The HΓΌckel function is broken. Because of course something is fghjkin broken.

primaryodors commented 1 year ago

Take a break. Maybe go for a walk or listen to some soothing music.

electronicsbyjulie commented 1 year ago

atoms_are_conjugated() is also broken.

electronicsbyjulie commented 1 year ago

You're right.

primaryodors commented 1 year ago

It would appear that the Big Three on their own are not a sufficient test for each PR before merge. The ideal would be to have a master unit test that can be left running (overnight, if necessary) before merging, and a subset unit test of the most easily breakable things, that can be run periodically during development.

electronicsbyjulie commented 1 year ago

Current critera for atom similarity:

    #define both_or_neither_abs_polarity_at_least_point2 15
    #define both_or_neither_abs_polarity_at_least_point333 15
    #define abs_delta_polarity_within_point333 17
     #define one_polar_one_sugarlike_nonpolar 15
    #define same_sgn_charge 20
     #define one_charged_one_neutral_polar 10
     #define opposite_charges -20
    #define both_or_neither_pi_and_both_or_neither_polar 23
     #define neither_pi_one_polar_one_sugary 20
     #define one_polar_both_pi 16
     #define one_polar_only_one_pi 11
    #define abs_delta_elecn_within_point7 10

I think pi bonding should have more influence. Pyrazines for example are grouping some of their aliphatic substituents in with the main ring.

primaryodors commented 1 year ago

Go ahead and mark ready for review when the pyrazine grouping is ready.

electronicsbyjulie commented 1 year ago

I'd like to rewrite the AminoAcid similarity function to compare side chain atom similarities. Maybe something like for each atom of one side chain, pair it with the most similar atom of the other side chain, then average the similarities of all the pairs.

Probably have to suppress aliphatic pairs of charged side chains though.

primaryodors commented 1 year ago

I'd like to rewrite the AminoAcid similarity function to compare side chain atom similarities.

Best to save that for a possible future PR. The amino acid grouping is doing fine at the moment and it's good to follow an "if it works, don't break it" philosophy.

electronicsbyjulie commented 1 year ago

🟩🟨🟩

The TAAR8 dock itself looks great, but the scoring is faulty. The charge on Asp111 is divided equally between the two oxygens, and since the cadaverine ligand only contacts one of them, the app only registers a -30 kJ/mol bond.

primaryodors commented 1 year ago

That's fine for now. Let's put a new feature freeze on this PR unless there's a show stopper. Later development can include some kind of distributed charge class.

electronicsbyjulie commented 1 year ago

Ok.

Retesting the TAAR8 dock to make sure the simulpost commit didn't break it.

electronicsbyjulie commented 1 year ago

🟩🟩🟩

primaryodors commented 1 year ago

Test of neutral trimethylamine and water:

 Created empty molecule named nothing.
# Created molecule named CN(C)C.
# Loaded 0 atoms of CN(C)C into molecule.
# Internal clashes: 0 cu. A.
# Atom N2 has basicity 1
# Bond N2 - C3 can rotate.
# Bond N2 - C4 can rotate.
# Molecule moved to [2.48082,-0.498291,0.124577].
# [1.62395,0.536161,0.476427]
# [2.12395,1.03616,0.976427]
# [2.62395,1.53616,1.47643]
# [3.12395,2.03616,1.97643]
# Loaded test ligand. Intermol clashes: 4.41016 cu. A.
# Rotated molecule 2 by -30 degrees. Intermol clashes: 4.61492 cu. A.
# Initial intermol clashes: 4.61492 cu. A.
# Initial intermol energy level: 12.7049 kJ/mol.

# Post-conformation intermol energy level: 0.524731 kJ/mol.
Energy level above threshold, FAIL.
Saved output.sdf

image

This qualifies as a show stopper. Let's go ahead and include #259 in this PR.

If any other problems arise, note them here for evaluation before starting development on them. Only new serious problems will be approved for this PR; anything else will be written up in a new issue.

electronicsbyjulie commented 1 year ago

image

Scoring is still not correct.

electronicsbyjulie commented 1 year ago

It is still not quite right for hydrogen bonds. OH...O is correct, about 22 kJ/mol, but OH...N only gives about 21 (should be 29), NH...O as much as 11 (should be 8), and NH...N only about 6 (should be 13).

primaryodors commented 1 year ago

It's good enough for now. Later we can mess with partial charges and polarizability.

Testing Big Three; if they all succeed then the PR will be merged.

primaryodors commented 1 year ago

🟩🟩🟩