opensourceantibiotics / murligase

Everything to do with the Mur Ligase Project

Competition Submission - SBNB & Ersilia #79

Open GemmaTuron opened 2 years ago

GemmaTuron commented 2 years ago

We are working on a generative model to identify new hits that bind the MurD allosteric pocket. This project is a collaboration between the SBNB and Ersilia. Our approach consists of 2 phases:

  1. Screening of the Diverse Library to identify new ligands using a combination of docking (by @aleixgimeno) and PocketVec, a method being developed at SBNB by @arnaucoma24
  2. Use the identified hits to train a generative model and filter new molecules based on physicochemical properties and other parameters such as refined docking and inhibitory potential of MurD activity (using the approach developed by @miquelduranfrigola and @GemmaTuron in the Open Source Malaria project).

We are currently at step 1. We have developed pharmacophore models of the allosteric site from the protein structures containing the fragments, and we have observed that the fragments match some pharmacophoric sites but not others. Looking at the electrostatic complementarity of the fragments with MurD, we have also observed that complementarity could be improved in some regions. We have therefore identified motifs in the fragments that are important to maintain, but we also see an opportunity to optimize the fragments by forming additional interactions in the allosteric site. In addition, we have characterized the MurD allosteric pocket as numerical descriptors, using the apo structure (1) and all fragment-bound structures (4). We have observed and quantified differences between these structures and additional structures from MurD-Ecoli (with substrate bound in the active site) and MurE-Ecoli (with a fragment bound in a different pocket). As a first step, we plan to look for similar pockets in the PDB and use the crystallized ligands as potential hits.

We expect to share newly identified ligands by the end of this week and new fragments using generative models in the next two weeks.

GemmaTuron commented 2 years ago

@edwintse or @mattodd, we had a question regarding the MurD inhibition experiments described here. The results are given as % of activity, so the interesting compounds are those with 20% activity or lower. Many compounds have a negative value instead; should these values be treated as experimental issues (precipitation etc.) and thus discarded? The raw % inhibition data link appears to be broken. Also, is there any other new data available on the inhibitory potential of the fragments?

Thanks!

mattodd commented 2 years ago

Hi @GemmaTuron - fantastic.

The negative values. I don't see any in the graph shown (but typically this would indeed be experimental error). So, like you, I went looking for the raw data and could not see it. @LauraDS1 @Rebecca-Steventon - help. Could you please dig out the file corresponding to the raw inhibition values for the screen at https://github.com/opensourceantibiotics/murligase/wiki/MurD-Round-1? The broken link is https://github.com/opensourceantibiotics/murligase/blob/master/Fragment%20Evaluations/MurD_Results_5-7-2021.xlsx

More data - @KatoLeonard and @edwintse have been finalising compounds built off the most promising compounds and are nearly ready to send them to @LauraDS1

I'm not aware of any other molecules in this set that have been evaluated in this assay, and which are likely to be hitting the same site. Remember that we are not 100% sure that these compounds are hitting the same "allosteric" pocket, since we have no crystal structures of any compounds bound other than the original fragments. Working hard on getting that confirmation.

Your approach of training the algorithm on the pharmacophore models is v interesting and, I hope, successful.

GemmaTuron commented 2 years ago

Hi @LauraDS1 @Rebecca-Steventon, could you have a look at the MurD inhibition experiments and confirm that the following compounds should be excluded from the analysis?

OSA_000803, OSA_000781, OSA_000777, OSA_000807, OSA_000787, OSA_000809, OSA_000791, OSA_000738, OSA_000780, OSA_000810

Thanks!

LauraDS1 commented 2 years ago

Hi all,

I will check with the screening team and get back to you.

Best,
Laura

arnaucoma24 commented 2 years ago

Hi all, please find below updated information on our approach.

- Docking approach

We have run docking on a diverse library of compounds and we have calculated 3 properties of the best docked pose for each compound:

  1. Docking score ("score"): Predicted binding energy.
  2. Solvent accessibility ("accessibility"): Surface area of the docked pose that is accessible to a solvent.
  3. Distance to the MurD pocket ("pocket_edist"): The distance of the closest atom in the docked pose of the ligand to the bottom of the allosteric site pocket.

Based on these properties, the compounds of the diverse library will be treated as positives or negatives to build a ML model, depending on the fulfillment of the following criteria:

  1. score <= -15.2959. This is the best docking score obtained among the crystallized fragments. This value range includes compounds with higher predicted affinity than the experimentally validated fragments.
  2. accessibility <= 0.4. This value range should include the compounds with docked poses that are not too exposed to the solvent.
  3. pocket_edist <= 2. This value range should include the docked poses that are able to interact with residues in the bottom of the pocket that defines the allosteric site.

After applying these filters, 34,118/300,527 compounds (11%) are considered positives.
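The three criteria above can be sketched as a simple labelling step. This is only a minimal illustration in plain Python, not the code used for the submission: the record layout, the compound IDs and the example values are invented, and only the three property names and their thresholds come from the description above.

```python
# Thresholds taken from the criteria listed above.
SCORE_MAX = -15.2959       # best docking score among the crystallized fragments
ACCESSIBILITY_MAX = 0.4    # upper bound on solvent accessibility of the pose
POCKET_EDIST_MAX = 2.0     # upper bound on distance to the bottom of the pocket


def is_positive(compound):
    """Return True if a docked compound meets all three criteria."""
    return (
        compound["score"] <= SCORE_MAX
        and compound["accessibility"] <= ACCESSIBILITY_MAX
        and compound["pocket_edist"] <= POCKET_EDIST_MAX
    )


# Hypothetical docking results (IDs and values are invented for illustration).
docked = [
    {"id": "cpd_A", "score": -17.1, "accessibility": 0.25, "pocket_edist": 1.4},
    {"id": "cpd_B", "score": -12.0, "accessibility": 0.30, "pocket_edist": 1.0},
    {"id": "cpd_C", "score": -16.0, "accessibility": 0.55, "pocket_edist": 1.8},
]

# Label every compound, then collect the IDs of the positives.
labels = {c["id"]: is_positive(c) for c in docked}
positives = [cid for cid, ok in labels.items() if ok]
```

Here only `cpd_A` passes all three filters: `cpd_B` fails on the docking score and `cpd_C` on solvent accessibility.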

- Pocket-based approach

First, all ligand-defined protein binding sites in the PDB have been characterized by means of PocketVec numerical descriptors, which allows for the assessment of pocket similarity. PocketVec descriptors are based on the assumption that similar pockets bind similar ligands. Such descriptors are built upon inverse virtual screening and thus capture the behavior of each binding site against an external set of docked molecules.

We have then collected the sites most similar to the MurD allosteric pocket according to PocketVec descriptors, and all ligands occupying such sites are kept as putative hits for the allosteric pocket of interest. For this, we have used all Saga MurD structures (4 with crystallized fragments + 1 apo structure) and a cut-off value of 0.15 for the cosine distance between PocketVec descriptors. In total, 144 molecules have been found to bind to pockets similar to the MurD allosteric pocket. After a filtering process to discard non-biologically-relevant ligands, these molecules will be used to prioritize generated molecules close to them.
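As a rough sketch of the similarity cut-off described above, the snippet below computes the cosine distance between descriptor vectors and keeps the pockets within 0.15 of a query. The short vectors and pocket names are invented toy data rather than real PocketVec descriptors, and treating the cut-off as inclusive is an assumption.

```python
import math

CUTOFF = 0.15  # cosine-distance cut-off quoted above


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def similar_pockets(query, candidates, cutoff=CUTOFF):
    """Return the names of candidate pockets within `cutoff` of the query."""
    return [
        name
        for name, vec in candidates.items()
        if cosine_distance(query, vec) <= cutoff  # inclusive bound: an assumption
    ]


# Invented toy descriptors, for illustration only.
query = [1.0, 0.0, 1.0, 0.5]
candidates = {
    "pocket_1": [0.9, 0.1, 1.1, 0.4],   # points in nearly the same direction
    "pocket_2": [0.0, 1.0, 0.0, 0.2],   # points in a very different direction
}

hits = similar_pockets(query, candidates)
```

With several query structures (for example the five MurD structures), one option would be to keep any pocket that is within the cut-off of at least one of them; whether that matches the actual procedure is not stated here.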

Rebecca-Steventon commented 2 years ago

Hi @GemmaTuron,

I can confirm that those fragments produced negative inhibitory rates indicative of them interfering with the assay system, and so were removed from further testing. I will reupload the raw data file for the percentage inhibition of the compounds against MurD.

Thanks
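For anyone reprocessing the raw file, the exclusion rule confirmed above (negative inhibition values are treated as assay interference) could be applied along these lines. This is a sketch only: the compound IDs and values are invented, and the real spreadsheet layout is not assumed.

```python
def split_by_inhibition(percent_inhibition):
    """Separate compounds with negative % inhibition from the rest."""
    kept, excluded = {}, {}
    for compound_id, value in percent_inhibition.items():
        if value < 0:
            # Negative inhibition: treated as assay interference and dropped.
            excluded[compound_id] = value
        else:
            kept[compound_id] = value
    return kept, excluded


# Hypothetical readings (IDs and values are invented for illustration).
readings = {"frag_1": 85.0, "frag_2": -12.5, "frag_3": 0.0, "frag_4": -3.1}

kept, excluded = split_by_inhibition(readings)
```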

edwintse commented 2 years ago

Hi @GemmaTuron, did you have any suggestions for us yet? We have about 48 hours to place any orders, e.g. for commercial compounds from Enamine or reagents for synthesis.

GemmaTuron commented 2 years ago

Hi @edwintse! We are doing the final filtering of the molecules (you can see all the progress here: https://github.com/ersilia-os/osa-murd). We were aiming to submit on the 30th, which was the deadline if we understood correctly!

mattodd commented 2 years ago

Hi @GemmaTuron - yes, please send in your suggestions ASAP. It'd help if you were able to do a check on commercial availability, to allow us to place an order rapidly for any compounds we can simply buy. We'll look at the syntheses of the others.

arnaucoma24 commented 2 years ago

Hello @MatTodd and @edwintse

We are happy to share a preliminary list of 628 candidates (https://github.com/ersilia-os/osa-murd/blob/main/submission/candidates_628_list.csv) with their associated rigid docking score to MurD. From this list, we have identified two interesting pairs of compounds (32293, 11-6946, 33045, 32295) that retain the aromatic ring + amide from the known MurD inhibitors (see docking pose attached). We are currently filtering these molecules further using flexible docking to provide a shorter, sorted version of the list (prioritizing 10-20 hits). We'll get back to you soon.

Unfortunately, we do not have any metric accounting for commercial availability at the moment.

[Image: docking poses]

mattodd commented 2 years ago

Sounds very good @arnaucoma24. If anyone has a quick way of parsing the final list that comes through for easy ordering (Enamine, in stock), that'd help speed things along, but obviously we're not expecting all compounds to be commercially available in this competition.

drc007 commented 2 years ago

Given a list of structures (smiles/sdf) I can search most vendors

(Sent by email in reply to Mat Todd's comment of 30 Jun 2022, 11:35.)

GemmaTuron commented 2 years ago

Hi @drc007, this would be fantastic, many thanks.

Is this list enough? The "smiles" column contains 628 SMILES of the best compounds we have generated. What is a reasonable number of molecules to check?

drc007 commented 2 years ago

Hi @GemmaTuron, here you are: 628smiles_vendors.sdf.zip

arnaucoma24 commented 2 years ago

Hello @MatTodd and @edwintse!

We have done a final filtering based on flexible docking and pharmacophoric restraints. This list (https://github.com/ersilia-os/osa-murd/blob/main/submission/final_submission.csv) shows 24 selected molecules with high docking scores. The “commercial” column indicates if @drc007 found them in available databases. As you can see, 7 are readily available. We are also appending the docking poses of the selected candidates (svg file).

In addition, if you want to explore some more options, from the 628 filtered candidates a total of 170 are commercially available (thanks @drc007). You can find them with their associated docking score in this list.
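As a quick way to pull out the purchasable compounds, a sketch along these lines may help. The inline CSV is an invented stand-in for the submission file: the "smiles" and "commercial" column names come from this thread, but the "score" column, the yes/no encoding, the placeholder SMILES and the values are all assumptions.

```python
import csv
import io

# Invented stand-in for the submission CSV (placeholder SMILES and scores).
CSV_TEXT = """smiles,score,commercial
CCO,-20.1,yes
c1ccccc1,-18.4,no
CC(=O)N,-17.9,yes
"""


def purchasable(rows, flag_column="commercial", true_value="yes"):
    """Keep the rows whose availability flag equals `true_value`."""
    return [r for r in rows if r[flag_column].strip().lower() == true_value]


# Parse the CSV into dictionaries keyed by column name, then filter.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
in_stock = purchasable(rows)

print(f"{len(in_stock)} of {len(rows)} candidates flagged as commercially available")
```

To run it on the real file, replace `io.StringIO(CSV_TEXT)` with an opened file handle and adjust `true_value` to whatever encoding the "commercial" column actually uses.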

It has been a pleasure to participate in this challenge, we look forward to seeing the results!

[Image: MurD]

edwintse commented 2 years ago

@arnaucoma24 @GemmaTuron thanks for your submission! @mattodd here's the filtered 24 list in order from L to R, T to B. How should we proceed?

[Image: Entry Ersilia]

aleixgimeno commented 2 years ago

Hello @mattodd and @edwintse and thank you for the report!

We have noticed that 2 of the compounds with unwanted groups have positively charged carbon atoms. This is because the SMILES we used for the representation are the ones we reported in the submission, which come directly from the generative model. After compound preparation we obtain the following states for the 2 compounds, in which the positive charge is placed on the N atom:

32293: C=[N+]1CCN(c2ccc(NC(=O)CN3CCc4ccccc4C3)cc2)C1

32295: C=[N+]1CCN1Cc1ccc(NC(=O)CN2CCc3ccccc3C2)cc1

mattodd commented 2 years ago

So - should we Enamine the greens and oranges? That would give us a good, diverse set, and a number of compounds that does not represent a huge advantage over the other entrants? @GemmaTuron @arnaucoma24 @aleixgimeno are there any blue/black ones that are must-haves? 32293 and 32295 are reactive iminium ions, so not realistic for synthesis. You might want to filter those out from the model.
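One crude way to catch such species in generated output is a text check on the SMILES, sketched below. The pattern list is an assumption and the approach is deliberately naive: it depends on how the SMILES happens to be written, so a proper substructure search with a cheminformatics toolkit would be the more robust option. The first two SMILES are the prepared states quoted earlier in this thread; the third is an invented neutral example.

```python
# Substrings that suggest an iminium ion or a carbocation in a SMILES string.
# This list is an illustrative assumption, not a validated filter.
SUSPECT_PATTERNS = ("C=[N+]", "[C+]", "[CH+]", "[CH2+]")


def looks_reactive(smiles):
    """Return True if the SMILES text contains any suspect pattern."""
    return any(pattern in smiles for pattern in SUSPECT_PATTERNS)


generated = {
    "32293": "C=[N+]1CCN(c2ccc(NC(=O)CN3CCc4ccccc4C3)cc2)C1",
    "32295": "C=[N+]1CCN1Cc1ccc(NC(=O)CN2CCc3ccccc3C2)cc1",
    "example_neutral": "O=C(CN1CCc2ccccc2C1)Nc1ccccc1",  # invented example
}

# IDs of the molecules that would be dropped by this check.
flagged = sorted(cid for cid, smi in generated.items() if looks_reactive(smi))
```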

aleixgimeno commented 2 years ago

To prioritize these compounds, we selected hits that matched the highest number of pharmacophoric sites, also considering diversity, except for a) compounds 11-6946, 32293, 32295 and 33045, which were selected due to their similarity to known inhibitors; and b) compounds 11-502 and 12-4635, which were selected because they were the only ones that matched a specific pharmacophore site, despite matching fewer than 4 pharmacophoric sites in total.

To have more diversity in terms of interactions with the target, we would suggest trying to synthesize compound 11-502 (black), as it belongs to this last group. Regarding the rest, we would prioritize compounds 3-1812 (black), 16-9738 (black) and 20-5140 (blue) because of their docking scores and their predicted interactions deep in the pocket.

In addition, I attach here the list of all the compounds that matched 4 sites of the pharmacophore (41 compounds in total): https://github.com/ersilia-os/osa-murd/blob/main/submission/41-hits_4-ph4-matches.csv. We selected most of the compounds from this list and these 41 compounds are the ones that we would prioritize among the 628 hits. This way, you can consider other compounds to replace the ones from the initial 24 compound list that are difficult to synthesize.

Also note that, as @arnaucoma24 pointed out, from the 628 filtered candidates, a total of 170 are commercially available. You may also consider those in case synthesizing compounds with higher priorities is complicated or unfeasible.

arnaucoma24 commented 2 years ago

Here you can find the list with the commercial availability of the 628 candidates (last column), in case you find it useful. @edwintse