Open jhjensen2 opened 3 years ago
@jhjensen2 Would be really interesting if you could include a variety of scoring functions.
Do you mean scoring functions in Glide (like SP and HTVS), or other docking programs, like Vina?
I was thinking of GNINA, RF-Score, X-score etc. We could also see if other groups would like to use this dataset to test other custom scoring functions.
Ah, OK. Is the idea to get some kind of consensus model? We have some experience with SMINA, but not the other ones you mention.
Firstly, amazed by your work!
Secondly, If I am not thinking in the wrong way, my understanding of Chris's suggestion is:
Using different scoring functions (RF-Score, X-Score, similar to Schrödinger's Glide SP & XP) to evaluate the same dataset. So, yes, like a consensus model (not sure if I am using the right terminology, like a crossing docking?)
And I remember the SMINA programme has rfscore functionality compiled in it.
IDK if I am right, and hope this would help! And please bear my shallow understandings!
smina does not have rfscore built-in, but it isn't hard to get rfscore working (https://github.com/oddt/rfscorevs).
Do any of these compounds have known activity?
If you grab the gnina binary (https://github.com/gnina/gnina/releases/download/v1.0.1/gnina) you can rescore your already docked poses with
gnina -r receptor.pdb -l docked_poses.sdf.gz --minimize -o gnina_minimized.sdf.gz
Embedded in the output file will be the Vina default empirical score (minimizedAffinity), and estimates of the pose quality (CNNscore) and binding affinity (CNNaffinity) from a convolutional neural network (there is also a CNN_VS field which is CNNscore*CNNaffinity, but I don't know yet if this is a useful thing). This command will be much faster if run on a machine with a GPU, but a GPU is not required.
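For anyone who wants to pull these fields out without a cheminformatics toolkit, here is a minimal sketch using only the Python standard library. It assumes the output is a standard SDF with data fields named as described above (minimizedAffinity, CNNscore, CNNaffinity); the helper names `read_sdf_tags` and `load_scores` and the sample record are made up for illustration and are not part of gnina.

```python
import gzip
import io

SCORE_FIELDS = ("minimizedAffinity", "CNNscore", "CNNaffinity")


def read_sdf_tags(handle):
    """Yield one dict of data-field tags per SDF record in a text handle."""
    tags, current = {}, None
    for raw in handle:
        line = raw.rstrip("\n")
        if line.strip() == "$$$$":            # end of one record
            yield tags
            tags, current = {}, None
        elif line.startswith(">") and "<" in line:
            # a data header looks like "> <CNNscore>"
            current = line[line.index("<") + 1:line.rindex(">")]
            tags[current] = ""
        elif current is not None:
            if line == "":                    # a blank line closes the field
                current = None
            else:
                tags[current] += line


def load_scores(path):
    """Read a (possibly gzipped) SDF file and return float scores per pose."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [
            {name: float(rec[name]) for name in SCORE_FIELDS if name in rec}
            for rec in read_sdf_tags(handle)
        ]


# Tiny made-up record, only to show the layout of the data fields.
SAMPLE = (
    "pose_1\n\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <minimizedAffinity>\n-7.0\n\n"
    "> <CNNscore>\n0.5\n\n"
    "> <CNNaffinity>\n6.0\n\n"
    "$$$$\n"
)
records = list(read_sdf_tags(io.StringIO(SAMPLE)))
scores = {name: float(records[0][name]) for name in SCORE_FIELDS}
# CNN_VS is described above as CNNscore * CNNaffinity.
cnn_vs = scores["CNNscore"] * scores["CNNaffinity"]
```

On a real run, something like `load_scores("gnina_minimized.sdf.gz")` would give one dict per pose, which can then be sorted on whichever score is of interest.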
It would also be interesting to redock the full set with GNINA.
@jhjensen2 would it be possible to provide a link to a file containing the 260K structures that were docked?
@drc007 The structures of the zinc database files used for docking are available here: https://www.dropbox.com/s/yzgv8emshj8391j/zinc_sdf.tar.bz2?dl=0
The enamine database structures are here: https://www.dropbox.com/s/dg34ay1utkaqvtw/enamine_sdf.tar.bz2?dl=0
Great! @cstein do you also have the Enamine compounds?
@dkoes none of the compounds have known activity against MurC AFAIK. The aim is to find possible alternative leads to the compound family found by AZ. We hope to identify good binders using genetic algorithms/docking but the advantage of the compounds in the Enamine set is that they are purchasable.
I have updated my comment above with both structure files from the zinc database and the enamine database.
Great work, great discussion all. We'll be discussing next steps Tue 6th July 2pm London if you'd like to join - see #47. Obvious question: the top scoring compounds identified from Zinc (v nice scores) - how do we most easily get our hands on them?
Copy all the SMILES into the SMILES search in MolPort: https://www.molport.com/shop/find-chemicals-by-smiles
For MTO vendors, you'll have to go through them individually.
I have a python script that can search the ZINC database for vendors.
@drc007 If you send me the script I can try to include it in the notebook
@jhjensen2 I've emailed it to you.
Got it. Looks like I need a ZINC ID though, which I don't have. Any idea how I get that from the SMILES? Or is it possible to modify the script to search with SMILES?
I presumed they would be with the structures downloaded from ZINC?
This script might do the trick
I've played around with different options for finding vendors for the ZINC compounds and MolPort seems to work the best, i.e. it is most "honest" about what in fact can and cannot be purchased and suggests similar compounds in the latter case. From what I understand, many molecules in ZINC are not actually purchaseable.
ZINC has an option for "in stock" when selecting tranches, but I don't know how the molecules were selected.
Hi @jhjensen2 @cstein @dkoes @drc007 forgive me if this is a naive question, but can the structures of MurC with the Enamine/ZINC compounds docked be downloaded and visualised in e.g. PyMOL, or is proprietary software needed? I'm interested in the extent of overlap, and whether we're "painting" an interior surface that is available for ligand binding, and hence novel compound design.
@mattodd @jhjensen2 @dkoes If the docked structures can be exported in sdf format then they could be viewed in PyMOL using the 6X9F crystal structure https://www.rcsb.org/structure/6X9F.
Would the “maestro” original file also be helpful if people have the Schrödinger academic free visualiser? Whatever is simple and low effort here, I guess.
@cstein might have saved these files, otherwise we can easily redock a few of the best scoring molecules. He's on vacation this week, though.
I ran the top 20 ZINC molecules through MolPort. Compounds 2, 4, 12, 13, and 18 are in stock at Enamine, while compound 14 is in stock at Eximed.
Let me know if I should check additional ones.
I played around with 6X9F for a bit in Pharmit. It was surprisingly difficult to find ligands that matched the hydrogen bond network of the cognate ligand and had good steric complementarity to the receptor. I didn't find anything in MolPort, but there were a few hits in the make-on-demand libraries (MCULE and Chemspace). They still aren't great and some are decidedly non-drug-like. At best they score about the same as the native ligand, not really better. For reference, the native ligand has a Vina score of -7.3 kcal/mol, CNNscore of 0.8, and CNNaffinity of 5.7.
The hits I find most interesting are QZBDWZONHQAXTD-UHFFFAOYSA-N, XAIARAJIYMRDTF-UHFFFAOYSA-N, and CSC076664421.
I've attached the two pharmit screens (json). The results are from applying these screens, minimizing within pharmit (with filtering criteria), and then doing an offline minimization/scoring with gnina (gnina -r rec.pdb -l minimized_results.sdf.gz --minimize -o results.sdf.gz). A PyMOL session file is provided as well.
Let me know if there is anything of interest.
@mattodd I have the pose-viewer files for the ZINC database, but it is 1.4 GB in size. Would the first 100 structures be of interest or another subset I can extract for you? The enamine pose-viewer file is only 55 mb so that is easily sharable 👍 The pose-viewer file contains both the 6x9f prepared structure as well as all ligands bound. These can be viewed in the free Maestro interface.
Here are our results for the Enamine Hit Locator library of 234K molecules. The docking is performed with Glide using the SP scoring methodology using the 6X9F crystal structure. The 1000 best binders from each set are then redocked using the more accurate XP scoring methodology.
Compounds with more than 5 rotatable bonds (bad for accumulation) and docking scores > -7.5 (AZ8074 has a docking score of -7.2) are removed. This leaves 100 compounds. All compounds have a globularity < 0.25, which is good for accumulation.
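The filter described above can be sketched in a few lines. This is only an illustration under assumptions: the column names `rotatable_bonds` and `docking_score`, the compound names, and the example values are hypothetical, not taken from the actual csv file.

```python
import csv
import io

MAX_ROTATABLE_BONDS = 5   # more than 5 is considered bad for accumulation
SCORE_CUTOFF = -7.5       # compounds scoring above (worse than) this are removed


def keep(row):
    """Return True if a compound survives both filters."""
    return (
        int(row["rotatable_bonds"]) <= MAX_ROTATABLE_BONDS
        and float(row["docking_score"]) <= SCORE_CUTOFF
    )


# Made-up example data with hypothetical column names.
EXAMPLE_CSV = """name,rotatable_bonds,docking_score
cpd_a,3,-8.1
cpd_b,7,-9.0
cpd_c,4,-7.2
cpd_d,5,-7.5
"""
rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
kept = [row["name"] for row in rows if keep(row)]
```

Here `cpd_b` is dropped for having too many rotatable bonds and `cpd_c` for its score; a compound sitting exactly on either cutoff is kept, which is one reading of "more than 5" and "> -7.5".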
Here are the top scoring molecules. All 100 molecules can be found in the csv file and analysed further in DataWarrior: HLL_top.csv
Here are our results for the genetic algorithm search for molecules with low Glide XP scores. We (@cstein) performed several searches resulting in a total of ca 8000 compounds in the final populations. The starting populations are sampled from the Enamine DDS set.
Compounds with more than 5 rotatable bonds (bad for accumulation) and docking scores > -7.5 (AZ8074 has a docking score of -7.2) are removed. For the remaining 859 molecules we computed the fast synthetic accessibility score from Postera.ai and removed any molecules with a score of 0.40 or higher (molecules with higher scores are harder to synthesise).
From this set we selected the 100 molecules with the best Glide XP score and computed the minimum number of synthetic steps as predicted by Postera.ai's Manifold program. Only at this point did it occur to me to check for duplicates, so 22 duplicates were removed, leaving us with 88 molecules. GA_top_SA.csv
Here are the top scoring molecules sorted by number of synthetic steps first, and then by Glide score. According to Manifold the first 4 compounds can be purchased from Enamine (i.e. the number of synthetic steps is 0). The molecules are not in the DDS (or HLL, see previous posts) but were generated by the GA. The fifth molecule, with a Glide score of -9.3, is the best score in the set. The csv file contains links to the retrosynthetic paths computed by Manifold.
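The duplicate removal and the two-level sort can be sketched as follows. The field names (`smiles`, `synthetic_steps`, `glide_xp`) and the toy molecules are hypothetical, and the duplicate check assumes the SMILES strings are already canonical, since it only compares them as text.

```python
def deduplicate(rows, key="smiles"):
    """Keep the first occurrence of each molecule, preserving order."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def rank(rows):
    """Sort by fewest synthetic steps first, then by lowest (best) score."""
    return sorted(rows, key=lambda r: (r["synthetic_steps"], r["glide_xp"]))


# Made-up candidates, only to show the ordering.
candidates = [
    {"smiles": "CCO", "synthetic_steps": 2, "glide_xp": -9.3},
    {"smiles": "CCN", "synthetic_steps": 0, "glide_xp": -8.0},
    {"smiles": "CCO", "synthetic_steps": 2, "glide_xp": -9.3},
    {"smiles": "CCC", "synthetic_steps": 0, "glide_xp": -8.4},
]
ranked = rank(deduplicate(candidates))
```

With this ordering a purchasable compound (0 steps) always comes before one that needs synthesis, even if the latter has a better docking score.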
Here are the next steps for the GA search as I see them, but comments/suggestions welcome.
The main determinant of accumulation is the presence of a primary amine group. So we could modify the genetic algorithm to restrict the search to molecules with primary amine groups (as well as 5 or fewer rotatable bonds and a globularity less than 0.25).
We could also try to optimise binding to MurC, MurD, and MurE simultaneously, but someone more knowledgeable needs to tell us which PDB code to use for MurD and MurE.
This is great work @jhjensen2 - we need to source $ quote for the Enamine compounds and a chemist with some spare time can check out how to make the highest scorer. These would be interesting and useful for in vitro analysis, but with an eye on downstream utility, yes, alignment with the entryway criteria would be a significant bonus. @eyermanncj may have a view on which PDBs to use for the MurD and MurE - presumably the same species as for the MurC PDB structure. @bartrum. Will direct Chris D to this discussion by email. Actually he's on Github, I forgot, at @chrisdowson1 !
Hi @jhjensen2 - Laura Diaz Saez at Warwick says:
"I would proceed with the following PDBs:
EcMurE apo: 7B53
EcMurD apo: 5A5E
SagaMurD apo: 3LK7, or download any of the fragment screening structures: https://fragalysis.diamond.ac.uk/viewer/react/preview/target/MURD"
Is that what you need here?
OK, great. Thanks
The csv file is now updated with XP docking scores for murD (PDB ID 5A5F) and murE (7B6G). I am not sure how comparable these scores are, since these two x-ray structures have ADP or some other ligand bound, while murC (6X9F) has AZ8074 bound. They would probably be more comparable if all 3 had the same ligand. On the other hand, all else being equal it is probably better to select using these scores than at random.
Jan,
What do the values in the scores column refer to? MurC?
Did you try docking the AZ compound into murD and murE? Maybe that would be a starting benchmark.
Yes, murC. Good idea about docking AZ8074 to murD and murE
Jan,
Did you try any constrained docking? Our work was based on satisfying H-bonds to the Asn that H-bonds to ADP.
Joe
No we haven't. Can Glide do this and, if so, how?
Jan,
Yes, Glide can do this. When you generate the grid file you can add a variety of constraints for the docking: H-bonds, distances, excluded volumes etc. Take a look at the Glide help to see how this is done. Constrained docking has become quite common; it partly came into use for kinase targets, where there can sometimes be multiple ways to form the hinge hydrogen bonds.
Good luck!
Joe
Here are the XP docking scores for AZ8074 from @cstein
| system | PDB | docking_score |
|--------|------|---------------|
| MurC | 6x9f | -6.85162 |
| MurD | 5a5f | -2.15905 |
| MurE | 7b6g | -4.75534 |
Here are the top 88 molecules from the GA search with murD and murE scores GA_top_SA.csv
During our last Zoom meeting the question came up about the reliability of docking scores. So Casper and I decided to investigate this using the D4 dataset from this paper.
Here they used DOCKER to dock 138 million molecules to the D4 dopamine receptor (5WIU) and they made 549 of these molecules and measured their activity (% antagonist displacement at 10 μM). Crucially the molecules were selected to span all docking scores and are structurally quite diverse (i.e. no homology series that are so common in activity data sets). 122 molecules (22%) showed significant activity (>50% antagonist displacement).
Here’s a plot of the activity vs docking score from the paper
Here I count the total number and number of active molecules within certain score ranges.
As you can see the proportion of active molecules increases with lower (better) docking scores. So if you pick a random molecule with a docking score < -65 there is a 37% chance that it is active.
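For reference, counting totals and actives per score range can be done along these lines. This is a sketch, not the code used for the table: the function name, the bin edges, and the data are made up, and a score that falls exactly on an edge is assigned to the lower (better) bin.

```python
from bisect import bisect_left


def hit_rate_by_bin(scores, is_active, edges):
    """Count molecules and actives per score bin.

    `edges` must be ascending. Bin 0 holds scores <= edges[0], bin i holds
    scores in (edges[i-1], edges[i]], and the last bin holds everything above
    the last edge. Returns (n_total, n_active, fraction) for each bin.
    """
    counts = [[0, 0] for _ in range(len(edges) + 1)]
    for score, active in zip(scores, is_active):
        i = bisect_left(edges, score)
        counts[i][0] += 1
        counts[i][1] += int(active)
    return [(n, k, k / n if n else None) for n, k in counts]


# Made-up scores and activity labels, only to illustrate the binning.
scores = [-70, -66, -60, -55, -45, -42]
is_active = [True, False, True, False, False, False]
table = hit_rate_by_bin(scores, is_active, edges=[-65, -50])
```

Each tuple in `table` corresponds to one row of a table like the one above: number of molecules, number of actives, and the fraction active (or `None` for an empty bin).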
Let’s compare this to picking a random molecule from the 138 million (i.e. without using docking). 86% have a score worse (higher) than -40 and 13% have a score between -50 and -40. A random molecule thus has a <1% chance of being active (0.86*0 + 0.13*0.06 ≈ 0.008). That’s the value of docking.
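The arithmetic behind that estimate, written out. The fractions and per-range hit rates are the ones quoted in the paragraph; the remaining ~1% of molecules with scores below -50 is left out here, as it is in the estimate itself.

```python
# Fraction of the 138 million library in each score range (from the text).
library_fraction = {"above -40": 0.86, "-50 to -40": 0.13}

# Fraction of actives assumed for each range (from the text).
hit_rate = {"above -40": 0.0, "-50 to -40": 0.06}

# Expected chance that a randomly picked molecule is active.
p_random_active = sum(
    library_fraction[name] * hit_rate[name] for name in library_fraction
)
```

`p_random_active` comes out at about 0.008, i.e. just under 1%.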
Let’s look at how Glide performs. Here are the results for XP with and without using LigPrep (I have adjusted the cutoffs to get roughly the same number of molecules in each bin).
The percentage of actives is a bit higher when using LigPrep, so this is probably what we should use going forward.
The percentage of actives is higher for Glide than for DOCKER. However, notice that the chance of randomly picking an active molecule from the 549 molecule-dataset is 22%, so Glide is not better than random at identifying inactive molecules, while DOCKER is.
Unfortunately there are very few molecules with very low docking scores, so that percentage of actives has quite a large error bar. So it’s not really certain that molecules with a score < -8 are more likely to be active than molecules with a score between -8 and -6 based on this data.
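One way to put a number on that error bar is a Wilson score interval for a proportion. This is a swapped-in, standard technique rather than what was used for the plots, and the counts in the example are made up.

```python
import math


def wilson_interval(n_active, n_total, z=1.96):
    """Approximate 95% Wilson score interval for a fraction of actives."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_active / n_total
    denom = 1 + z * z / n_total
    center = (p + z * z / (2 * n_total)) / denom
    half = z * math.sqrt(
        p * (1 - p) / n_total + z * z / (4 * n_total * n_total)
    ) / denom
    return center - half, center + half


# Same observed fraction (50%) with few and with many molecules in the bin.
small_bin = wilson_interval(5, 10)
large_bin = wilson_interval(50, 100)
```

With only 10 molecules in a bin, an observed 50% is compatible with anything from roughly 24% to 76%, while 100 molecules narrow that to roughly 40% to 60%.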
Using larger bin sizes gives us more precise percentages but for a larger range of scores.
Finally, we’ve talked about using several docking programs to create a consensus score. So here are the same results where I have removed all molecules with DOCKER scores > -55
Unfortunately, using DOCKER doesn’t help to weed out inactive molecules with good Glide scores.
If we assume that these results are representative of murC ligase, then we can say that molecules with docking scores < -7 are likely to have a 50% chance of being active. That is much, much better than picking a molecule at random, but I am not sure how it compares to an expert MedChemist.
There is some indication that molecules with scores < -8 are more likely to be active than those between -8 and -7, but there are too few examples to be certain of this.
There is simply too little data to be able to say anything about scores <-9 vs (-9, -8]. In general, the chances of finding molecules with XP Glide scores <-9 in molecule libraries are very low. GAs can be used to generate many more, so I hope that can be tested.
My code has a bug so the last table in the previous post is wrong. So here are the same results where I have removed all molecules with DOCKER scores > -55
So, yes, the success rate for molecules with good Glide scores can be increased to almost 60% by also using DOCKER.
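The consensus filter amounts to requiring a good score from both programs before counting actives. A minimal sketch, with hypothetical field names and made-up molecules; the cutoffs (-7 for Glide, -55 for DOCKER) follow the values mentioned in this thread.

```python
def success_rate(molecules, glide_cutoff, docker_cutoff=None):
    """Return (n_selected, fraction_active) for molecules passing the cutoffs.

    A molecule is selected if its Glide score is below `glide_cutoff` and,
    when `docker_cutoff` is given, its DOCKER score is not above that cutoff.
    """
    selected = [
        m for m in molecules
        if m["glide"] < glide_cutoff
        and (docker_cutoff is None or m["docker"] <= docker_cutoff)
    ]
    if not selected:
        return 0, None
    n_active = sum(m["active"] for m in selected)
    return len(selected), n_active / len(selected)


# Made-up data, only to show the effect of adding the second filter.
molecules = [
    {"glide": -8.2, "docker": -60.0, "active": True},
    {"glide": -7.5, "docker": -40.0, "active": False},
    {"glide": -7.9, "docker": -58.0, "active": False},
    {"glide": -7.1, "docker": -62.0, "active": True},
    {"glide": -5.0, "docker": -70.0, "active": True},
    {"glide": -7.3, "docker": -45.0, "active": False},
]
glide_only = success_rate(molecules, glide_cutoff=-7.0)
consensus = success_rate(molecules, glide_cutoff=-7.0, docker_cutoff=-55.0)
```

In this toy set the second filter removes two inactive molecules, so the fraction of actives among the selected molecules goes up while the number of candidates goes down.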
Another thing we talked about is focusing on molecules that have good docking scores for murC, murD, and murE. If we assume that the success rate is 50% for all three targets and we have a molecule with good Glide scores for all three, then there's only a 12.5% chance (0.5^3) that the molecule will be active on all three targets. So, with only a 50% success rate this probably doesn't make sense.
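The 12.5% figure follows from multiplying the per-target success rates, which assumes that activity on the three targets is independent. A small sketch showing how the joint chance depends on the per-target rate (the 80% case is a hypothetical comparison, not a measured value):

```python
def joint_success(per_target_rate, n_targets=3):
    """Chance of being active on all targets, assuming independence."""
    return per_target_rate ** n_targets


p_all_three = joint_success(0.5)       # the 50% case discussed above
p_if_better = joint_success(0.8)       # hypothetical 80% per-target rate
```

At 50% per target the joint chance is 0.125; even at a hypothetical 80% per target it is only about 0.51.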
Hello everyone,
I am Kato, an Erasmus Master student from KU Leuven - Belgium, who has recently joined Professor Todd's lab at UCL. As part of my internship and master's thesis, I will be working on this OSA MurLigase project for the next nine months. I am looking forward to this multidisciplinary and international collaboration!
I was asked to look up quotes on MCule or Enamine for some compounds predicted by @jhjensen2. I am still waiting for the quote from Enamine, but attached you can find the one from MCule. However, I noticed that when searching via MolPort, there is a remarkable price difference for the same supplier 'UkrOrgSynthesis'. Quote attached as well, just to be sure!
Welcome aboard @KatoLeonard! Just out of curiosity, which molecules did you pick?
Thank you! I picked the first two of the Enamine series, so Z1603489873 (SMILES: Cc1ccc(CC(=O)Nc2cccc(-n3ccc(C(=O)O)n3)c2)o1 ) and Z2581487631 (SMILES: Cn1cnc(C(=O)NC2(CC(=O)O)CCOCC2)c1 )
I have chosen them randomly, but perhaps if I can access the original data file, I can take into account the different poses of the molecules in MurC?
Dear Casper (Prof. Steinmann @cstein), sorry we missed your message two months ago, but we would really appreciate it if you could share both the ZINC and Enamine pose-viewer files with us! Would it be possible to share the data through Dropbox? Many thanks!
@KatoLeonard This message may answer your questions!
In preparation for our study on using a genetic algorithm to find good binders for MurC (more on that later), Casper Steinmann (@cstein) has docked 250K molecules from the ZINC database and 10K molecules from Enamine's Diversity Discovery Set.
The docking is performed with Glide using the SP scoring methodology using the 6X9F crystal structure. The 1000 best binders from each set are then redocked using the more accurate XP scoring methodology. Below we show the 100 molecules with the best (lowest) docking scores.
For comparison AZ8074 has a docking score of -7.2 using the same methodology.
Here are screenshots of the top 20 molecules for Enamine and ZINC, but all the data can be accessed in this notebook.
Enamine
ZINC