ikalvet / heme_binder_diffusion


No models left ligand binding design with ligandMPNN #11

Open ElkeDeZitter opened 1 month ago

ElkeDeZitter commented 1 month ago

Hi,

I tried your pipeline several times but I never get to the end because no files are left. I first tried the notebook as is, but then I had to change filter values such as SASA_limit when analysing the diffusion outputs in order to have backbones to take to step two. However, none of the structures would get a reasonable AlphaFold2 model.

In the SI of the manuscript (Krishna et al.) I noticed different parameters, such as num_designs and the contigmap parameters. I tried these, but here too nothing survived past a certain point (heme_scoring.py) after binding site design with ligandMPNN.

Then I tried to allow even more variation in the diffusion step with these contigmap settings: contigs: ["30-200,A15-15,30-200"], inpaint_str: null, length: "150-300"
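For readers following along, here is a rough sketch of how the total length allowed by a contig string like the one above can be checked against the length setting. It assumes the usual RFdiffusion contig convention, where a bare range such as 30-200 is a stretch of residues to be generated and a chain-prefixed range such as A15-15 is a fixed motif taken from the input structure. The function name contig_length_bounds is made up for illustration and is not part of the pipeline, and chain-break markers are not handled.

```python
def contig_length_bounds(contig: str) -> tuple[int, int]:
    """Return (min_total, max_total) residues implied by one contig string.

    Assumed syntax: comma-separated segments, where "30-200" means
    "generate between 30 and 200 residues" and "A15-15" means
    "keep residues 15 to 15 of chain A from the input" (a fixed motif).
    """
    min_total = 0
    max_total = 0
    for segment in contig.split(","):
        segment = segment.strip()
        if segment[0].isalpha():
            # Fixed motif segment: chain letter followed by start-end.
            start, end = segment[1:].split("-")
            n_fixed = int(end) - int(start) + 1
            min_total += n_fixed
            max_total += n_fixed
        else:
            # Generated segment: either "N" or "lo-hi".
            lo, _, hi = segment.partition("-")
            lo_i = int(lo)
            hi_i = int(hi) if hi else lo_i
            min_total += lo_i
            max_total += hi_i
    return min_total, max_total


if __name__ == "__main__":
    lo, hi = contig_length_bounds("30-200,A15-15,30-200")
    # The separate length: "150-300" setting narrows whatever the contig allows.
    print(f"contig allows {lo}-{hi} residues; length setting keeps 150-300")
```

The contig alone permits a much wider range than the length setting, so in this configuration it is the length value that effectively bounds the protein size.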

However, I got stuck at the same step as before. I could again play with the filtering parameters, but I believe this is not the way to go, since I would like to have output that makes sense (i.e. something that is worth expressing and purifying).

Could you provide some parameters that work? I would like to go through the full pipeline with the provided example, so that I understand each step before adapting the pipeline to my project.

Thanks in advance, Elke

ikalvet commented 1 month ago

Thank you for trying it out! The unfortunate situation with this pipeline is that its success relies on generating large numbers of designs. Our average production runs would start with generating thousands of backbones, out of which perhaps 10-20% are truly viable. In the end, perhaps 1-5% of unique backbones would survive the entire pipeline. The example in the notebook (with the default number of designs) unfortunately may not lead to any passing designs after filtering for final metrics. Switching to longer sequences (as you already tried) is one way to slightly increase success rates, at least in the case of heme binders. The other parameter you could play with is the contact potential "guide decay". Switching from "cubic" to "quadratic" will have an effect on the structures, making them more compact around the ligand. This will occasionally cause more clashes between the protein backbone and the ligand, and pull secondary structure elements too close to each other. But it would be a way to further increase the backbone diversity, in the hope of finding something that passes.
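To put the 1-5% figure in perspective, here is a back-of-the-envelope sketch of how many backbones a run might need before at least one design is likely to pass. It assumes each backbone survives independently with the same probability, which is a simplification and not something the pipeline guarantees. The function names are illustrative only.

```python
import math


def prob_at_least_one(n_backbones: int, survival_rate: float) -> float:
    """Chance that at least one of n backbones survives the whole pipeline,
    assuming independent backbones with an identical survival rate."""
    return 1.0 - (1.0 - survival_rate) ** n_backbones


def backbones_needed(survival_rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which prob_at_least_one(n, survival_rate) >= confidence."""
    if not 0.0 < survival_rate < 1.0:
        raise ValueError("survival_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - survival_rate))


if __name__ == "__main__":
    for rate in (0.01, 0.05):
        n = backbones_needed(rate)
        print(f"survival rate {rate:.0%}: about {n} backbones for a 95% chance of one hit")
```

Under these assumptions, a 1% survival rate calls for roughly 300 backbones for a 95% chance of a single passing design, and a 5% rate for roughly 60. Getting several distinct hits needs correspondingly more.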

And finally, there are also options to get more out of the initially diffused backbones:
1) Generating more sequences with proteinMPNN may allow you to find more AF2-passing hits.
2) You could consider running partial diffusion on the diffused backbones to further diversify them. That's done by using diffuser: partial_T=<10-20>, omitting diffuser T=200, and creating the contigs such that the motif residue remains fixed. The 10-20 timestep partial diffusion will wiggle the backbone around a little bit.
3) You could also run partial diffusion after having identified well-folding backbones, to further subsample these.
4) There are also ways you could play with the active site design script (scripts/design/heme_pocket_ligMPNN.py), such as increasing the number of ligandMPNN-generated sequences (line 431-432) or changing the ligandMPNN temperature (line 430), maybe even using a temperature that decays as a function of design iteration (such as {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.1}).
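The decaying temperature idea could be sketched roughly as below. This is only an illustration of looking up a per-iteration temperature from the dictionary given above. How scripts/design/heme_pocket_ligMPNN.py actually consumes the temperature is not shown in this thread, so the function name and the fallback for later iterations are assumptions.

```python
# Schedule taken from the suggestion above: design iteration -> ligandMPNN temperature.
TEMPERATURE_SCHEDULE = {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.1}


def temperature_for_iteration(iteration: int, schedule: dict[int, float] = TEMPERATURE_SCHEDULE) -> float:
    """Return the sampling temperature for a given design iteration.

    Iterations beyond the last key reuse the final temperature
    (an assumption made here, not something the script is known to do).
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    if iteration in schedule:
        return schedule[iteration]
    return schedule[max(schedule)]


if __name__ == "__main__":
    for i in range(6):
        print(i, temperature_for_iteration(i))
```

Starting warm and cooling down means early iterations explore more diverse pocket sequences, while later iterations refine around whatever scored well.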

ElkeDeZitter commented 1 month ago

Thank you very much. I tested with a much larger number of initial designs and now I could get to the end of the pipeline :-) I still have to test your other suggestions.