rifdock / rifdock

Rifdock Library for Conformational Search
Other
123 stars 41 forks source link

rifgen & rifdock docking #149

Open ahorvath opened 5 months ago

ahorvath commented 5 months ago

Dear @bcov77 and @srwarner,

I have gone through the Nat Comm pipeline and got quite promising results for miniprotein-protein docking. I'd like to kindly ask for your help with two questions.

Many thanks in advance.

Best, Attila

ahorvath commented 5 months ago

This seems like the way to do it:

rifdock_test native_docking | false | B| Best way to do docking with | | | only native AAs. Added by | | | bcov 2021.

bcov77 commented 5 months ago

Good work looking through the source. It's the best documentation.

I'd agree that -native_docking is the best way to do it. It's still pretty sketchy because rifdock isn't setup to do this and I think it might be pretty slow, but I believe everything it outputs will have the native sequence.

I don't think it's been heavily tested. I think I added it when I added the flag -pssm_enforce_no_ala because I realized that it was only going to be like 10 extra lines of code.

Very curious if it works for you!

bcov77 commented 5 months ago

RNA:

I think in theory this can be done, it might even work by default (I forget if Will made this work). So I'd say just try it, if it runs, it's correct, if it doesn't run, then, it's probably hard enough that you don't want to actually make it work.

ahorvath commented 5 months ago

Thanks for your feedback, @bcov77.

The main thing in the config to make it work for native AAs was to decrease this parameter (-0.5 at default)

-bonus_to_native_scaffold_res -1.5

Note, a strange message keeps appearing in the log file: ZMISSING ROTZMISSING ROTZMISSING ROTZMISSING ROTZMISSING

How do you think I could fix this and get a refined dock for the corresponding PatchDock seed?

bcov77 commented 5 months ago

Does it still finish with the ZMISSING ROT message? That was sort of a debugging statement that only occurs in weird situations that defy the RIF. Basically, that error message is saying that "Your scaffold has a rotamer that seems to have come from no where", which is exactly correct in this situation.

If it still finishes, I can go mute that error when -native_docking is turned on.

ahorvath commented 5 months ago

The message is often there even if it gives good docks (attached logPOS_h4.log )

How can this situation happen? Is it something with the patchdock initial state?

Would it make sense to further decrease this parameter?

-bonus_to_native_scaffold_res -1.5

bcov77 commented 5 months ago

Ok, if you pull the latest I muted those errors if you have that flag enabled.

It has nothing to do with patchdock, it's because -native_docking is a hack. Typically the way rifdock works is that it moves the scaffold to a position, looks into the RIF to see what's available, and packs with those (with ALA being the default "I didn't find anything"). However, with -native_docking enabled, ALA isn't allowed even if nothing is found, so the native rotamer is manually added. Later on when rifdock is doing sanity checks (to make sure that everything we found is available in the RIF at that position), it realizes that rotamers have appeared that weren't present (because we added them). This triggers that error which usually implies something is broken internally.

The more I think about -bonus_to_native_scaffold_res the more I realize you should set it to 0. It's working for the wrong reason. What's actually happening is that you're getting "points" for putting the scaffold into positions where it's finding native rotamers in the RIF no matter how good they are. This means that crappy docks with lots of "hits" in the RIF can score better than good docks with only a few interactions.

What you should do instead is set these flags to high values: -global_score_cut -hackpack_score_cut -cluster_score_cut

Maybe try 100 and see what happens. It might be the case that all of your docks have positive scores and it's only when you start giving the bonus that any make it below 0 (which is the default value for those scores).

ahorvath commented 5 months ago

Many thanks for the suggestions, it looks good now.

In some cases, rifdock produces less strong docks (measured by prodigy) from the patchdock seeds. What can be the reason for that?

If I understand correctly, rosetta has a different scoring system. Do you think it could be harmonised somehow?

bcov77 commented 5 months ago

Rifdock uses a really fast approximate scorefunction that doesn't even agree with Rosetta very well. You can think of Rifdock as a quick way to move docks around into positions where they can likely be designed by Rosetta to have better energies. It was never designed to be super accurate in an energy sense. When we use it for design, we output like 3,000 docks, design them all, and then take the top 10 or fewer.

No one works on Rosetta anymore, it's been entirely superseded by deep learning methods. So there's not going to be a harmonization unfortunately.

If I already knew the sequence of both binding partners, I'd just use AlphaFold2 Multimer. https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb

ahorvath commented 4 months ago

I see, thanks for clarification. How would you handle ~10k putative binders if the native sequence can't be changed?

bcov77 commented 4 months ago

If it's only 10K, I'd probably just alphafold them all. Even when I do the math down below it's not that bad.

If you only need rough docks for each one, then, you could cluster them by sequence and only predict a few. But, I guess that's not a whole lot better than using Rifdock or other methods.

My strategy if compute time is very limiting would be to run a single AF2 multimer model without MSA. We get this to run in 5 GPU seconds per structure, which only comes out to 13 GPU hours for the whole bunch. Even if it's 4x that long, it's only 2 days on a GPU. It also runs on CPUs if you have access to more of those with about a 50x speed difference CPU-core/GPU.

The mean pae across the interface is the metric you care about for confidence. See our paper here: https://www.nature.com/articles/s41467-023-38328-5 . I think collabfold also reports an iptm which is good too.

I think there's a way to run collabfold locally. I'm not exactly sure how it's done, but, we run collabfold on our cluster. I think you can also download multimer directly from deep mind. The key though to get these to run in a reasonable timeframe is to not use MSAs.

ahorvath commented 1 month ago

Hi @bcov77,

Is there any option in rifdock to avoid mutating disulfide bridges/cysteine residues?

Many thanks in advance.

Best, Attila

bcov77 commented 1 month ago

Honestly I think it's a bug that it mutates disulfides. Probably they aren't being detected, and so rifdock thinks they're free cysteines, and then it mutates them.

I changed it so that it won't touch CYS at all. Likely this will fix your issue. Just pull latest and rebuild (should take like 15 seconds to rebuild).

You're using this right? -scaffold_res_use_best_guess If you're manually specifying the list of residues to mutate with -scaffold_res, well, I've got another solution for you...

ahorvath commented 1 month ago

Yes, I just started a run with the new binaries and will check the outcome.

I'd like to ask another question about the motif picking as well. After this step,

$CAO_2021_PROTOCOL/motif_selection.py cluster_results.list '../motif_extraction//.json' -num_motifs 1000 > selected_motifs.list 0 / 7260

I run into this error message:

Traceback (most recent call last): File "/data/supplemental_files/cao_2021_protocol//motif_selection.py", line 260, in final_results = select_best_motif( cluster, min_motif_len = 7, verbose=False, bonus_key=None) File "/data/supplemental_files/cao_2021_protocol//motif_selection.py", line 92, in select_best_motif average_ddg[new_index] += info['per_res_ddg'][ires] IndexError: list index out of range

I don't understand why the first residue is set to 50.

#lets set 50 to be the first residue Apologies for the multiple questions. I'm almost at the end of the pipeline.

bcov77 commented 1 month ago

Hmmm... I could read more into that script to figure out why they are 50 (Longxing wrote this), but, it's worked for years like that. I think it's something like "the motifs are variable size, so say this one starts at 50 in case we hit someone that has residues before the start". That way you don't need negative indices.

Looking at that function, my best guess is that average_ddg is running out of size. It gets initialized to size 100 at the start, maybe initialize it to size 1000 and change that reference from 50 to 500? Idk why your motifs are that big though, that itself seems like a problem.

ahorvath commented 1 month ago

Thanks, @bcov77, initialising with 1000 has solved it and I could proceed to the grafting step. Now I have the motifing chunks with the paper_motif_graft.xml command and I bumped into this error:

[ ERROR ]: Caught exception:

File: src/protocols/motif_grafting/movers/MotifGraftMover.cc:546
For this scaffold there are not suitable scaffold grafts within your constrains

AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.

It seems it can't produce any refined scaffold. Why do you think this can happen?

Many thanks in advance.

bcov77 commented 1 month ago

That's normal. A given scaffold usually can't accommodate a motif, you have to try a bunch.

This is why in the paper we do 3,000 motifs by 20,000 scaffolds. That usually results in about 6M outputs. (So 10% I guess)

If you're dead set on using that scaffold, increase the rmsd numbers inside the XML. I think there are two on the motif graft mover. You may not get a very good result, but I mean, the choice is yours.

These days we'd use diffusion to build a new scaffold around the motif. But, you'll have to figure that one out yourself if you go that route.

ahorvath commented 1 month ago

You were right. It ran on the whole data set and did come with some hits. Now, I'm at the step before the last one when I need to provide the grafted scaffolds to FastDesign. It gave the attached error message.

Can you please help me with this?

Many thanks for your help. [Uploading ROSETTA_CRASH.log…]()

bcov77 commented 1 month ago

I mean, that's not a full crash log lol. I'm not sure how I'm gonna debug that.

ahorvath commented 1 month ago

Also attached the logs. Would this be more help logs.zip ful?

bcov77 commented 1 month ago

I've never seen that error before. But, the json loader shouldn't be getting used, so that's confusing too.

Is this the same command you ran the first time around? It seems like you got it to work once. Everything should be exactly the same.

If I had to guess, I'd say maybe you didn't build Rosetta with hdf5 support? I'm not sure though.