martinpacesa / BindCraft

User-friendly and accurate binder design pipeline
MIT License

Peptide settings not outputting MPNN failures? #73

Status: Open. kyleford8 opened this issue 1 day ago.

kyleford8 commented 1 day ago

When running with the default settings (4stage_multimer), a "No accepted MPNN designs found for this trajectory." message is usually accompanied by print statements such as: "Base AF2 filters not passed for XXXX, skipping interface scoring".

When running with the peptide settings, I keep seeing "No accepted MPNN designs found for this trajectory", but without any accompanying print statements. In addition, when I look at failure_csv.csv, I can't find any explanation of why the MPNN designs for a given trajectory are failing.

Is there something I'm missing? I currently have the same target running with both settings (the filters are identical; only the design settings differ). I quickly find hits with the standard settings, but have not had a single hit accepted with the peptide settings. Anecdotally, I have had little success with the peptide settings across multiple targets, although that could well just be real biology.


martinpacesa commented 1 day ago

One possible reason is that "force_reject_AA" is enabled in the peptide settings. The omit_AAs setting, which excludes amino acids, only reduces their probability: if AF2 thinks an amino acid should definitely be there, for example a cysteine at the interface, it will still place it. With the force setting enabled, designs that still contain it are removed. Perhaps you are getting cysteines with the default settings?
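To illustrate the difference between a soft exclusion and a forced rejection, here is a minimal toy sketch. It is not BindCraft's or AF2's actual code: the per-position scores, the fixed penalty of 5.0, and the helper names `apply_soft_omit`, `design_sequence` and `force_reject` are all hypothetical. The soft bias lowers the score of an omitted residue, but a residue the model strongly favours still wins; the hard filter then discards any design that contains it.

```python
from typing import Dict, List, Set


def apply_soft_omit(logits: Dict[str, float], omit: Set[str], penalty: float = 5.0) -> Dict[str, float]:
    # Soft exclusion (hypothetical): subtract a fixed penalty from omitted residues.
    # A residue with a large enough score can still come out on top.
    return {aa: (score - penalty) if aa in omit else score for aa, score in logits.items()}


def design_sequence(per_position_logits: List[Dict[str, float]], omit: Set[str]) -> str:
    # Greedy decoding: pick the highest-scoring residue at each position after the bias.
    seq = []
    for logits in per_position_logits:
        biased = apply_soft_omit(logits, omit)
        seq.append(max(biased, key=biased.get))
    return "".join(seq)


def force_reject(sequence: str, omit: Set[str]) -> bool:
    # Hard exclusion (hypothetical): True means the design should be discarded.
    return any(aa in omit for aa in sequence)


positions = [
    {"A": 2.0, "L": 1.5, "C": 0.5},   # weak cysteine preference: the bias removes it
    {"A": 2.0, "L": 1.0, "C": 10.0},  # strong cysteine preference, e.g. at the interface
]
seq = design_sequence(positions, omit={"C"})
print(seq)                       # AC
print(force_reject(seq, {"C"}))  # True
```

With only the soft bias, the second position still ends up as a cysteine; only the forced rejection step removes such a design, which is why the two settings can accept very different numbers of designs.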

kyleford8 commented 20 hours ago

None of the sequences in trajectory_stats (for either the peptide or the default 4stage_multimer settings) or in final_design_stats (for the 4stage_multimer run that yielded hits) contain cysteines, so I would be surprised if this were a consistent cysteine issue. I'm also linking another MPNN issue I've observed, which may be related: #74. In that issue, the print statements make it look as if only one sequence is being tested repeatedly, yet the Accepted output folder still contains files with the suffixes mpnn1 to mpnn20.
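To repeat this kind of cysteine check across several runs, a short script can scan a stats CSV for any residue listed in omit_AAs. This is a sketch under assumptions: the column name `Sequence` and the helper name `designs_with_residues` are hypothetical, so adjust them to match the headers actually present in your trajectory_stats and final_design_stats files.

```python
import csv
import io
from typing import List, Set, TextIO


def designs_with_residues(handle: TextIO, residues: Set[str], seq_column: str = "Sequence") -> List[str]:
    # Return every sequence in the CSV that contains at least one of the given residues.
    # seq_column is an assumed header name, not a confirmed BindCraft column.
    reader = csv.DictReader(handle)
    hits = []
    for row in reader:
        seq = row.get(seq_column) or ""
        if any(aa in residues for aa in seq.upper()):
            hits.append(seq)
    return hits


# Illustrative in-memory data only; for a real file, pass a handle opened with newline="".
example = io.StringIO("Design,Sequence\ntraj_1,ACDE\ntraj_2,LLKE\n")
print(designs_with_residues(example, {"C"}))  # ['ACDE']
```

An empty list for every stats file would support the conclusion above that cysteines are not the cause.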

If I have time later, I can try to reproduce this with the PDL1 example .pdb file.