Predictive Modelling Competition to Design Binders of MurD, an Antibacterial Target

edwintse commented 2 years ago

Executive Summary
We've a protein target. It's promising for the development of new antibiotics. You predict molecules that will bind a particular bit of the protein. You need to use generative methods. We procure (buy or make) the molecules (as best we can) and experimentally test them for binding the protein. We run as many cycles as we can. All 100% open science. We finish at the end of June.

What is this?
We want to identify effective generative methods for the prediction of small molecules that will bind to a target protein. We're going to do this:

1) We run a competition, open to anyone. To participate you need to share your molecule suggestions and as much detail about your method as you can; i.e. very open science.
2) You predict molecules.
3) We source them (make or buy).
4) We biologically evaluate them: binding vs. the protein, and enzymatic inhibition if the binding looks strong.
5) We post all the data openly and we run another round (two if possible).
6) We finish at the end of June 2022 and announce winners with fanfare.

Who's Involved?
Matthew Todd (@mattodd), Brooks Paige (@tbrx) and Peter Coveney from UCL raised a little money (from UCL/EPSRC) to run this competition. @edwintse will be the project champion and primary liaison with entrants. Biology will take place at the University of Warwick, carried out by a team involving @chrisdowson1 @LauraDS1 @Rebecca-Steventon. Consulting advice is provided by @eyermanncj, who has deep knowledge of this target from the pharma industry. All we need now is YOU.

What are the Competition Specifics?
The target is a bacterial enzyme, MurD. This is a high value target for new antibiotics. You can read why it's important in the wiki associated with this repository/project. Here in Open Source Antibiotics we're trying multiple approaches towards new inhibitors of the mur enzymes (see e.g. the agenda/minutes from our last meeting, #70).

Your mission is to take existing crystal structures of MurD, and predict small molecules that bind to an allosteric pocket that was seen to bind several fragments. The relevant structures/datasets are:

1) Structures of the four fragments (349, 373, 374, 378) bound to protein: Data on Fragalysis, wiki page, pdb files. We want molecules bound at this location. (Note that the specific protein here is MurD from Streptococcus agalactiae, sometimes abbreviated by biologists to "Saga".)
2) A deposited structure of the protein (3UAG, protein from E coli), which lacks the fragments. This structure has substrate and ADP bound (not in the allosteric pocket). This structure is useful to consider, as described below.
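
For entrants who want the ligand coordinates separately from the protein, the following is a minimal sketch of pulling the non-water HETATM records out of a PDB file using only the Python standard library. It assumes the fragment atoms are stored as HETATM records, which should be checked against the actual files; the residue name `LIG` and the coordinates in the sample are made up for illustration. HETATM records can also include ions, cofactors or buffer molecules, so the residue names in the output are worth inspecting by eye.

```python
def extract_ligand_records(pdb_text, skip_resnames=("HOH", "WAT")):
    """Return the HETATM lines of a PDB file, leaving out water molecules.

    The residue name sits in columns 18-20 of a PDB record, which is the
    slice [17:20] with Python's zero-based indexing.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue  # protein atoms (ATOM), headers, etc. are ignored
        resname = line[17:20].strip()
        if resname in skip_resnames:
            continue  # skip crystallographic waters
        kept.append(line)
    return kept


# Hypothetical three-atom sample: one protein atom, one ligand atom, one water.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      11.104  13.207   2.100",
    "HETATM 2001  C1  LIG A 401       1.000   2.000   3.000",
    "HETATM 2002  O   HOH A 501       4.000   5.000   6.000",
    "END",
])

ligand_records = extract_ligand_records(SAMPLE_PDB)
print("\n".join(ligand_records))
```

To use it on a real file, read the file into a string first, e.g. `extract_ligand_records(open("374.pdb").read())`, and write the returned lines to a new file.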

What's Known Already about Binding to this Allosteric Site?
We've made some derivatives of the fragments above and have preliminary data suggesting some inhibit the enzyme, but we still have no direct evidence that binding to this allosteric pocket actually inhibits the enzyme. This is why measurements of predicted compounds in this competition will use surface plasmon resonance (SPR) as the initial assay rather than enzymatic inhibition. Earlier in the project we asked Atomwise to predict molecules binding to a region of the protein in the 3UAG structure (see also #33). We have crystal structures of some of those molecules, but bound to a related protein, MurE.

Below is a picture (thanks, @eyermanncj) of the fragments (orange) bound to MurD (Saga, purple) versus the 3UAG structure (E coli, cyan). The purple protein with fragments bound has another helix (top right of the picture) - this is also seen in the Saga MurD without the fragments (PDB: 3LK7). None of the Saga MurD structures (with fragments or without) have substrate or ADP bound, while the 3UAG structure does. Thus the 3UAG structure will give you a sense of what might happen, conformationally, when substrate/ADP binds. You can design molecules that bind either protein, with or without substrate/ADP - i.e. make use of that helix (in the purple structures), or not (in the cyan structure).

There is also a PyMOL session from @LauraDS1 with:
i) Saga MurD apo + the fragments (orange)
ii) E coli MurD plus substrate/ADP (3UAG, cyan)
iii) Saga MurD apo (3LK7, grey)
iv) One of the Atomwise compounds bound to the related MurE enzyme (protein from E coli, yellow). This is just for reference, so you can see the difference in the binding sites.

The primary binding assay will be SPR using apo Saga MurD (i.e. like the conditions that found the fragments) but we can also try other conditions (e.g. with substrate/ADP) or other assays such as crystal soaking and enzymatic inhibition, if those are warranted.

Competition Timeline

The competition will be launched on March 30th 2022
It will run with monthly reviews, synthesis and evaluation of new compounds
It will finish on June 30th 2022

Can we ask questions?
Yes! Ideally by starting a new Issue. Interaction between OSA and entrants, and between entrants themselves, is strongly encouraged. For anything technical, get in touch with @edwintse in the first instance.

Submission Rules:

Entries may either be submitted directly to GitHub (here, in another issue, etc.) or be uploaded as a file in this repository. You could also link out from here to another place that hosts the necessary data, e.g. Zenodo.
Entrants can work individually or in teams (no limit to team size).
Entrants must work openly during the competition. This doesn't necessarily mean that inputs have to be logged in real time (although that is strongly encouraged), but entries that have not openly deposited working data on a regular basis prior to the deadline(s) will not be accepted.
Entrants must agree to their work's incorporation into a future OSA journal publication(s), with suitable attribution.
Competition winner(s) will be authors on any relevant future paper(s).
Any valid entries will at least be acknowledged on any relevant future paper(s) and if the contribution is significant may lead to authorship.

How will entries be assessed?

Generated compounds can either be commercially available to purchase, or be made here in the lab at UCL.
The molecules do not need to be structurally related to the fragments (in fact, all other things being equal, it would be more interesting if they were not) but need to bind in the same region.
The molecular weight of the generated compounds should be roughly 250-350 Da. This is a balance between fewer atoms (probably easier to source) vs more atoms (probably more contact points with protein). If your suggestions are based on an existing fragment structure known to bind the protein, your suggestions must be at least 3 heavy (non-H) atoms (so, ca. 50-60 Da) heavier than the starting structure.
The calculated LogP for any compound should be lower than 3.
If suggested compounds are too difficult to synthesise in the time available, we will come back to the suggesting team for alternatives. But try to factor in synthesisability, and take into account that molecule suppliers such as Enamine are currently operating a greatly reduced service owing to Russia's invasion of Ukraine.
Compounds will be evaluated at the University of Warwick at the end of each cycle, with the cycles dependent slightly on how easy it is to source the molecules.
A good level of binding for the compounds would be double digit micromolar, with single digit micromolar being ideal.
In the case of a tie, we will assess not only binding strength but also ligand efficiency (i.e. binding strength as a function of number of contact points needed).
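
As a rough aid for pre-screening suggestions against the guidance above, here is a minimal sketch in plain Python. It is not the organisers' scoring code. It assumes molecular weight, calculated LogP and heavy-atom counts have already been computed elsewhere (for example with a cheminformatics toolkit), and it treats the "roughly 250-350 Da" window as a hard cut-off, which is a simplification. For ligand efficiency it uses the commonly quoted definition, binding free energy per heavy atom with ΔG = RT ln(Kd), as an assumption; the post itself only describes ligand efficiency loosely. The function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def check_property_guidance(mol_weight, clogp, heavy_atoms=None,
                            parent_heavy_atoms=None):
    """Return a list of guidance points a suggestion misses (empty if none).

    mol_weight is in Da. Pass parent_heavy_atoms only when the suggestion
    was grown from a fragment already known to bind the protein.
    """
    problems = []
    if not 250.0 <= mol_weight <= 350.0:
        problems.append("molecular weight outside 250-350 Da")
    if not clogp < 3.0:
        problems.append("calculated LogP not below 3")
    if parent_heavy_atoms is not None:
        if heavy_atoms is None or heavy_atoms - parent_heavy_atoms < 3:
            problems.append(
                "fewer than 3 heavy atoms added to the parent fragment")
    return problems


def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Ligand efficiency in kcal/mol per heavy atom: -RT ln(Kd) / N_heavy."""
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("kd_molar and heavy_atoms must be positive")
    delta_g = GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Hypothetical example: a 300 Da compound with cLogP 2.1 and 20 heavy atoms,
# measured at Kd = 10 micromolar.
print(check_property_guidance(300.0, 2.1))
print(round(ligand_efficiency(10e-6, 20), 2))
```

With this definition, two compounds with the same Kd are separated by size: the one with fewer heavy atoms gets the higher ligand efficiency.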

What's the prize?
Kudos, public acclaim, good karma, professional respect.

Have we done anything like this before?
Yes, we recently ran a competition for ML/AI methods in drug discovery as part of Open Source Malaria. The new competition will have similar rules of openness, but will be directed towards a particular protein target (rather than phenotypic) and entrants will be expected to use generative methods (i.e. not large-scale virtual screening of commercial libraries). This project is a focussed, short-term effort that is intended to bring together the generative AI/ML community around a specific problem and experimentally validate predictions. If you're interested in this kind of thing and want to do more in the future, check out CACHE, which is now getting going.

Comments and questions can go below. The above rules/guidance will be periodically updated.

vandan-revanur commented 2 years ago

@mattodd or @edwintse Could you also please upload the PDB files of only the ligand fragments without the MurD protein?

edwintse commented 2 years ago

Hi @vandan-revanur, I'm not really sure about this. @eyermanncj @LizbeK could you advise?

jhjensen2 commented 2 years ago

What's the difference between 374.pdb and 378.pdb? The ligands look identical.

mattodd commented 2 years ago

Hi @jhjensen2 oh that may be an error. @drc007 I think you originally uploaded these? Might there be a copy and paste error? Or @LauraDS1 could you please check? The files are here. (structural comparison is here - they're different structures!)

mattodd commented 2 years ago

A question came in by email: Are we planning on sharing training data? No - we're happy for people to train as they see fit, e.g. using ChEMBL, or Enamine Real, or any other source.

drc007 commented 2 years ago

@jhjensen2 @mattodd I just checked and the files here https://github.com/opensourceantibiotics/murligase/tree/master/docs/pdbs_forNGL/MurD are different.

jhjensen2 commented 2 years ago

@drc007 I must be missing something obvious. These two files look identical to me: 374.pdb 378.pdb

drc007 commented 2 years ago

@jhjensen2 All files used on website are here https://github.com/opensourceantibiotics/murligase/tree/master/docs

jhjensen2 commented 2 years ago

Here's the first round of molecules that @cstein and I found with genetic algorithms. They are roughly in order of preference based on the number of protein-ligand interactions in common with the fragments, synthetic accessibility, and, in the case of the last one, docking score.

We chose to focus on primary amines, since they tend to make for better antibiotics. We are happy to look for more based on feedback on the current selections.

MurD_GA1_amines.pdf

Note from coordinators - these suggestions should be discussed further in #72, though there is some preliminary discussion on them below.

mattodd commented 2 years ago

Great, @jhjensen2! FIRST ENTRIES! Are you happy to share something about your method? @edwintse - what do you think? What shall we buy, what shall we make? Can you please analyse ASAP so we can act? And re the files @drc007 it looks like the pdb files here are different but here are the same? Do we need to just replace one of the files here with the correct one, in that case? I can easily do that if I know that's the right thing to do.

jhjensen2 commented 2 years ago

I'll write up the method next week. I am currently preparing the CACHE-1 submission for Monday.

edwintse commented 2 years ago

Thanks @jhjensen2 @cstein for your entries! @mattodd I've done an availability/synthesisability search for each compound and prices are shown below.

GA1_3478: the cyanamide is a bit nasty in the first step
GA1_5162: looks to be synthesisable
GA1_7046: longer lead times for the starting reagents but looks synthesisable as well
??: the bromide for the Suzuki is available but a bit pricy; also no stereochemistry in the purchasable reagent
GA1_1509: currently unsure about synthesis; will keep looking
GA1_5159: looks to be synthesisable
GA1_5806: long lead times for both routes; may be quicker/cheaper to make the pyridine reagent via Suzuki?; the piperidine also has no defined stereochemistry in the purchasable reagent

Let me know your thoughts

Untitled Wiley-9

jhjensen2 commented 2 years ago

Here are 2 alternative routes for GA1_3478 and GA1_1509.

Actually, I don't think the stereochemistry of GA1_5806 is important, i.e. I seem to recall that we get good docking scores with both enantiomers. But let me double check tomorrow, when I have access to Maestro.

jhjensen2 commented 2 years ago

Here's another route for GA1_5806

mattodd commented 2 years ago

Hi @edwintse @cstein @jhjensen2 so we can discuss the details, I've budded off the discussion of these suggestions and next steps to #72

LauraDS1 commented 2 years ago

Dear all,

Apologies for the delay.

I have updated the PDBs here. Data for 374 and 378 should now be correct. Please let me know if there are any issues.

@vandan-revanur The PDBs for the compounds, not containing the protein atoms or other molecules, are now uploaded at the same link.

Please let me know if I can help with anything else.

Best, Laura

vandan-revanur commented 2 years ago

@edwintse When will the winners of the competition be announced?

opensourceantibiotics / murligase

Predictive Modelling Competition to Design Binders of MurD, an Antibacterial Target #69