nrbennet / dl_binder_design

MIT License

How to use the '-fix_FIXED_res' parameter in the dl_interface_design_multi_seq.py #11

Closed Lyang556 closed 1 year ago

Lyang556 commented 1 year ago

Hi Nate,

As far as I know, in the 2-chain hallucination results, the positions of the residues we want to fix are not static in the generated PDB files. They change from design to design, and the exact positions of the fixed residues are recorded in the trb files. So how should we pass these positions to '-fix_FIXED_res'? We usually design many proteins and convert the results to a silent file, and the silent file contains no information about these fixed residues.

sincerely,

Lei yang

nrbennet commented 1 year ago

You will have to add FIXED labels to the PDBs before you collect them into a silent file (you can also add the labels after you have made the silent file, but that requires a bit more Rosetta code, so I won't go into it here).

You will want to iterate through the pairs of .pdb and .trb files, extract the boolean mask of designed residues, and run a loop like this to add residue labels to the .pdb file:

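    with open('my.pdb.file', 'a') as f:
        for resi in range(len(boolean_mask)):
            # Assumes boolean_mask[resi] is True for designed residues; skip those so only the kept residues are labeled
            if boolean_mask[resi]:
                continue
            # NOTE, these labels must be 1-indexed since they are meant to work with Rosetta which is 1-indexed
            f.write('REMARK PDBinfo-LABEL:%5s FIXED\n' % (resi + 1))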

Lyang556 commented 1 year ago

@nrbennet Thank you for your prompt reply. I have written a Python script based on your suggestion. Could you check whether it matches what you had in mind? Here is my script:

    import glob
    import numpy as np

    seeds = glob.glob('diffusion/*.pdb')
    for seed in seeds:
        # Each .pdb has a matching .trb file holding the masks for that design
        trb = np.load(seed.replace('.pdb', '.trb'), allow_pickle=True)
        sample_masks = np.array(trb['mask_1d'])
        # 0-indexed positions where the mask is True, treated here as the residues to fix
        fixed_positions = np.where(sample_masks == True)[0]
        with open(seed, 'a') as f:
            for resi in fixed_positions:
                # NOTE, these labels must be 1-indexed since they are meant to work with Rosetta which is 1-indexed
                f.write('REMARK PDBinfo-LABEL:%5s FIXED\n' % (resi + 1))
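Before collecting the labeled PDBs into a silent file, it can help to read the labels back and confirm they landed on the intended residues. The following is a minimal sketch, not part of dl_binder_design: `read_fixed_labels` is a hypothetical helper name, and it assumes the labels were written in exactly the `REMARK PDBinfo-LABEL:` format used in the loop above.

```python
from pathlib import Path

# Prefix of the label lines appended by the loop in this thread.
LABEL_PREFIX = "REMARK PDBinfo-LABEL:"


def read_fixed_labels(pdb_path):
    """Return the sorted 1-indexed residue numbers labeled FIXED in a PDB file.

    Hypothetical helper for illustration; it only understands lines of the
    form 'REMARK PDBinfo-LABEL:<resnum> <label> [<label> ...]'.
    """
    fixed = set()
    for line in Path(pdb_path).read_text().splitlines():
        if not line.startswith(LABEL_PREFIX):
            continue
        fields = line[len(LABEL_PREFIX):].split()
        # fields[0] is the residue number; the remaining fields are labels.
        if len(fields) >= 2 and "FIXED" in fields[1:]:
            fixed.add(int(fields[0]))
    return sorted(fixed)
```

Comparing the returned list against the positions taken from the .trb mask (plus one, for 1-indexing) gives a quick sanity check for each design.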

Lyang556 commented 1 year ago

I found that the generated sequences don't keep the residues we want to fix. Maybe '-fix_FIXED_res' isn't being passed to MPNN.
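One way to confirm an observation like this is to compare the input sequence with a designed sequence at the labeled positions. The sketch below is illustrative only: `changed_fixed_positions` is a hypothetical helper, and it assumes both sequences are plain one-letter strings of equal length and that the positions are 1-indexed, as in the FIXED labels.

```python
def changed_fixed_positions(original_seq, designed_seq, fixed_positions):
    """Return the 1-indexed fixed positions whose residue differs between the two sequences."""
    if len(original_seq) != len(designed_seq):
        raise ValueError("sequences must have the same length")
    # Positions are 1-indexed, so subtract 1 to index into the Python strings.
    return [
        pos
        for pos in fixed_positions
        if original_seq[pos - 1] != designed_seq[pos - 1]
    ]


# Position 2 changed (K -> R) even though it was meant to stay fixed;
# position 5 (Y) was preserved.
print(changed_fixed_positions("MKTAYIAK", "MRTAYLAK", [2, 5]))  # prints [2]
```

An empty list means every labeled residue was preserved in that design.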

nrbennet commented 1 year ago

There was a bug in the MPNN script where it was not correctly reading the FIXED labels. This is fixed in today's PR. I have also added a helper script that parses RFdiffusion outputs into FIXED residue labels.

Lyang556 commented 1 year ago

Thank you for taking the time to fix this bug. I have tried the new script. The command I ran is:

    python3 mpnn_fr/dl_interface_design.py -silent r1.silent \
        -output_intermediates \
        -checkpoint_path mpnn_fr/ProteinMPNN/vanilla_model_weights/v_48_020.pt \
        -omit_AAs 'CX' -fix_FIXED_res

and I get an error like this:

    Traceback (most recent call last):
      File "dl_binder_design/mpnn_fr/dl_interface_design.py", line 259, in <module>
        main( pdb, silent_structure, mpnn_model, sfd_in, sfd_out )
      File "dl_binder_design/mpnn_fr/dl_interface_design.py", line 202, in main
        dl_design( pose, pdb, silent_structure, mpnn_model, sfd_out )
      File "dl_binder_design/mpnn_fr/dl_interface_design.py", line 184, in dl_design
        seqs_scores = sequence_optimize( pdbfile, chains, mpnn_model, fixed_positions_dict )
      File "dl_binder_design/mpnn_fr/dl_interface_design.py", line 103, in sequence_optimize
        sequences = mpnn_util.generate_sequences( model, device, feature_dict, arg_dict, masked_chains, visible_chains, fixed_positions_dict )
      File "dl_binder_design/mpnn_fr/util_protein_mpnn.py", line 267, in generate_sequences
        X, S, mask, lengths, chain_M, chain_encoding_all, chain_list_list, visible_list_list, masked_list_list, masked_chain_length_list_list, chain_M_pos, omit_AA_mask, residue_idx, dihedral_mask, tied_pos_list_of_lists_list, pssm_coef, pssm_bias, pssm_log_odds_all, bias_by_res_all, tied_beta= tied_featurize(
      File "dl_binder_design/mpnn_fr/ProteinMPNN/protein_mpnn_utils.py", line 310, in tied_featurize
        fixed_position_mask[np.array(fixed_pos_list)-1] = 0.0
    TypeError: unsupported operand type(s) for -: 'dict' and 'int'

Is this a new bug?
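The last frame of the traceback subtracts 1 from `np.array(fixed_pos_list)`, so the error means `fixed_pos_list` was a dict rather than a sequence of positions when that line ran. The sketch below shows a guard that turns this into a more descriptive error. It is an illustration only: `to_zero_indexed` is a hypothetical helper, and the expectation of a flat list of 1-indexed integers is inferred from the traceback rather than confirmed against the repository. The hint about nesting depth is likewise a guess at one possible cause.

```python
def to_zero_indexed(fixed_pos_list):
    """Convert 1-indexed positions to 0-indexed, rejecting a dict with a clear message."""
    if isinstance(fixed_pos_list, dict):
        # A dict here reproduces the failure in the traceback; one possible
        # cause (an assumption) is a dictionary nested one level too deep.
        raise TypeError(
            "expected a list of 1-indexed positions, got a dict with keys %r"
            % list(fixed_pos_list)
        )
    return [int(pos) - 1 for pos in fixed_pos_list]


print(to_zero_indexed([1, 5, 12]))  # prints [0, 4, 11]
```

Calling `to_zero_indexed({'A': [1, 5]})` raises a `TypeError` that names the offending keys, which makes it easier to see which level of the dictionary was passed.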

Lyang556 commented 1 year ago

I also found another bug. In dl_interface_design_multi_seq.py, on line 94, should the call be sequences = mpnn_util.generate_sequences( model, device, feature_dict, arg_dict, masked_chains, visible_chains, fixed_positions_dict )?

nrbennet commented 1 year ago

The latest PR fixes these. Thanks for pointing this out.