ketatam / DiffDock-PP

Implementation of DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models in PyTorch (ICLR 2023 - MLDD Workshop)
https://arxiv.org/abs/2304.03889

How can I use the code to dock two proteins I am interested in? #11

Closed · hima111997 closed this issue 1 year ago

hima111997 commented 1 year ago

Hi, thank you for providing the code, but I am having a hard time figuring out how to run it on two proteins of interest. There are many .sh and config files. Could you tell me which file, and with which parameters, I should use to dock two proteins?

Thanks

hima111997 commented 1 year ago

I have read the closed issues and managed to run it, but now it produces this error:

Traceback (most recent call last):
  File "/content/DiffDock-PP/src/main_inf.py", line 620, in <module>
    main()
  File "/content/DiffDock-PP/src/main_inf.py", line 354, in main
    dump_predictions(args,results)
  File "/content/DiffDock-PP/src/main_inf.py", line 383, in dump_predictions
    with open(args.prediction_storage, 'wb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'storage/run_on_pdb_pairs.pkl'

These two files are generated: splits_test_cache_v2_b.pkl and splits_test_esm_b.pkl.

This is the whole output:

SCORE_MODEL_PATH: checkpoints/large_model_dips/fold_0/
CONFIDENCE_MODEL_PATH: checkpoints/large_model_dips/fold_0/
SAVE_PATH: ckpts/run_on_pdb_pairs
14:51:04 Starting Inference
14:51:04 Using Bound structures
data loading: 100%|█| 1/1 [00:00<00:00, 17549.39it
14:51:04 Loaded cached ESM embeddings
14:51:04 finished tokenizing residues with ESM
14:51:04 finished tokenizing all inputs
14:51:04 1 entries loaded
14:51:04 finished loading raw data
14:51:04 running inference
14:51:04 finished creating data splits
/usr/local/envs/diffdock_pp/lib/python3.10/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
  warnings.warn("The TorchScript type system doesn't support "
14:51:06 loaded model with kwargs: 
checkpoint checkpoints/large_model_dips/fold_0/model_best_338669_140_31.084_30.347.pth
14:51:06 loaded checkpoint from checkpoints/large_model_dips/fold_0/model_best_338669_140_31.084_30.347.pth
14:51:10 loaded model with kwargs: 
checkpoint checkpoints/confidence_model_dips/fold_0/model_best_0_6_0.241_0.887.pth
14:51:10 loaded checkpoint from checkpoints/confidence_model_dips/fold_0/model_best_0_6_0.241_0.887.pth
14:51:10 finished loading model
args.temp_sampling: 2.439
  0% 0/1 [00:00<?, ?it/s]
14:51:37 Completed 0 out of 40 steps
14:51:52 Completed 1 out of 40 steps
...
14:56:23 Completed 38 out of 40 steps
14:56:28 Completed 39 out of 40 steps
loader len:  40

100% 40/40 [00:09<00:00,  4.20it/s]
14:56:38 Finished Complex!
100% 1/1 [05:26<00:00, 326.53s/it]
14:56:38 Finished run run_on_pdb_pairs
temp sampling, temp_psi, temp_sigma_data_tr, temp_sigma_data_rot: (2.439, 0.216, 0.593, 0.228)
filtering_model_path: checkpoints/confidence_model_dips/fold_0/
Total time spent: 333.6226415634155
ligand_rmsd_summarized: {'mean': 70.51095, 'median': 70.51095, 'std': 0.0, 'lt1': 0.0, 'lt2': 0.0, 'lt5': 0.0, 'lt10': 0.0}
complex_rmsd_summarized: {'mean': 24.50482, 'median': 24.50482, 'std': 0.0, 'lt1': 0.0, 'lt2': 0.0, 'lt5': 0.0, 'lt10': 0.0}
interface_rmsd_summarized: {'mean': 23.4743, 'median': 23.4743, 'std': 0.0, 'lt1': 0.0, 'lt2': 0.0, 'lt5': 0.0, 'lt10': 0.0}
Traceback (most recent call last):
  File "/content/DiffDock-PP/src/main_inf.py", line 620, in <module>
    main()
  File "/content/DiffDock-PP/src/main_inf.py", line 354, in main
    dump_predictions(args,results)
  File "/content/DiffDock-PP/src/main_inf.py", line 383, in dump_predictions
    with open(args.prediction_storage, 'wb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'storage/run_on_pdb_pairs.pkl'
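
(Side note on the *_rmsd_summarized lines: they seem to report the mean, median, and std of the RMSD over the evaluated complexes, plus the fraction of complexes below 1, 2, 5, and 10 Å; with a single complex, std is 0 and mean equals median. A minimal sketch of that kind of summary, assuming a plain list of per-complex RMSD values; this is my reading of the output, not the repo's code:)

```python
import numpy as np

def summarize_rmsd(rmsds):
    """Summarize per-complex RMSD values (in Angstroms)."""
    rmsds = np.asarray(rmsds, dtype=float)
    return {
        "mean": float(rmsds.mean()),
        "median": float(np.median(rmsds)),
        "std": float(rmsds.std()),
        # fraction of complexes below each RMSD threshold
        **{f"lt{t}": float((rmsds < t).mean()) for t in (1, 2, 5, 10)},
    }

print(summarize_rmsd([70.51095]))  # matches the single-complex case above
```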

This is the .sh file I am using:

NUM_FOLDS=1  # number of seeds to try, default 5
SEED=0  # initial seed
CUDA=0  # will use GPUs from CUDA to CUDA + NUM_GPU - 1
NUM_GPU=1
BATCH_SIZE=1  # split across all GPUs
NUM_SAMPLES=40

NAME="single_pair_inference"  # change to name of config file
RUN_NAME="run_on_pdb_pairs"
CONFIG="config/${NAME}.yaml"

SAVE_PATH="ckpts/${RUN_NAME}"
VISUALIZATION_PATH="visualization/${RUN_NAME}"
STORAGE_PATH="storage/${RUN_NAME}.pkl"

FILTERING_PATH="checkpoints/confidence_model_dips/fold_0/"
SCORE_PATH="checkpoints/large_model_dips/fold_0/"

echo SCORE_MODEL_PATH: $SCORE_PATH
echo CONFIDENCE_MODEL_PATH: $FILTERING_PATH
echo SAVE_PATH: $SAVE_PATH

python src/main_inf.py \
    --mode "test" \
    --config_file $CONFIG \
    --run_name $RUN_NAME \
    --save_path $SAVE_PATH \
    --batch_size $BATCH_SIZE \
    --num_folds $NUM_FOLDS \
    --num_gpu $NUM_GPU \
    --gpu $CUDA --seed $SEED \
    --logger "wandb" \
    --project "DiffDock Tuning" \
    --visualize_n_val_graphs 25 \
    --visualization_path $VISUALIZATION_PATH \
    --filtering_model_path $FILTERING_PATH \
    --score_model_path $SCORE_PATH \
    --num_samples $NUM_SAMPLES \
    --prediction_storage $STORAGE_PATH \
    #--entity coarse-graining-mit \
    #--debug True # load small dataset

This is the YAML file:

---
# file is parsed by inner-most keys only
data:
    dataset: db5
    data_file: datasets/single_pair_dataset/splits_test.csv
    data_path: datasets/single_pair_dataset
    resolution: residue
    no_graph_cache: True
    knn_size: 20
    use_orientation_features: False
    multiplicity: 1
    use_unbound: False
model:
    model_type: e3nn
    no_torsion: True
    no_batch_norm: True
    lm_embed_dim: 1280
    dropout: 0.0
    dynamic_max_cross: True
    cross_cutoff_weight: 3
    cross_cutoff_bias: 40
    cross_max_dist: 80
    num_conv_layers: 4
    ns: 16
    nv: 4
    dist_embed_dim: 32
    cross_dist_embed_dim: 32
    sigma_embed_dim: 32
    max_radius: 5.
train:
    patience: 2000
    epochs: 2000
    lr: 1.e-3
    weight_decay: 0.
    tr_weight: 0.5
    rot_weight: 0.5
    tor_weight: 0.
    val_inference_freq: 10
    num_steps: 40
    actual_steps: 40
diffusion:
    tr_s_min: 0.01
    tr_s_max: 30.0
    rot_s_min: 0.01
    rot_s_max: 1.65
    sample_train: True
    num_inference_complexes_train_data: 1200
inference:
    mirror_ligand: False
    run_inference_without_confidence_model: False
    wandb_sweep: False
    no_final_noise: True
    # optimized for without conf_model
    temp_sampling: 2.439 # default 1.0. Set this to 1.0 to deactivate low temp sampling
    temp_psi: 0.216 # default 0.0
    temp_sigma_data_tr: 0.593 # default 0.5
    temp_sigma_data_rot: 0.228 # default 0.5
    # temp_sampling: 5.33 # default 1.0. Set this to 1.0 to deactivate low temp sampling
    # temp_psi: 1.05 # default 0.0
    # temp_sigma_data_tr: 0.40 # default 0.5
    # temp_sigma_data_rot: 0.64 # default 0.5
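
As the comment at the top of the file notes, the config appears to be parsed by its inner-most keys only, i.e. the data/model/train/diffusion/inference sections are just grouping and every leaf key ends up in one flat namespace. A minimal sketch of that behavior, assuming PyYAML (my reading of the comment, not the repo's actual parser):

```python
import yaml

def load_flat_config(path):
    """Flatten a two-level YAML config into a single dict of inner-most keys."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    flat = {}
    for section in cfg.values():  # data, model, train, diffusion, inference
        flat.update(section)      # keep only the inner-most (leaf) keys
    return flat

args = load_flat_config("config/single_pair_inference.yaml")
print(args["temp_sampling"])  # 2.439
```

One consequence of parsing this way is that leaf keys must be unique across sections, since a later section would silently overwrite an earlier one.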

The .csv file contains these lines:

path,split
7c8d,test
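
The datasets/single_pair_dataset folder holds the two partners as bound PDB structures. I named them following the DB5.5 convention; note that the structures/ subfolder and the _l_b/_r_b suffixes shown here are my assumption based on that format, not something I verified in the repo:

```
datasets/single_pair_dataset/
├── splits_test.csv
└── structures/
    ├── 7c8d_l_b.pdb   # ligand partner, bound
    └── 7c8d_r_b.pdb   # receptor partner, bound
```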
ketatam commented 1 year ago

Hi! Thanks for your interest in our work and for raising this issue.

You get this error because the folder storage does not exist: it was not pushed along with the code, since git does not track empty folders. You can solve it simply by creating a folder named storage in the repo root.
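
For example, run mkdir storage from the repo root before launching the script. Alternatively, dump_predictions could create the folder on demand; a minimal defensive sketch (assuming a body that just pickles the results, which is my guess, not the committed code):

```python
import os
import pickle

def dump_predictions(args, results):
    """Sketch: ensure the parent folder (e.g. storage/) exists before writing."""
    parent = os.path.dirname(args.prediction_storage)
    if parent:
        # creates storage/ on first run instead of raising FileNotFoundError
        os.makedirs(parent, exist_ok=True)
    with open(args.prediction_storage, "wb") as f:
        pickle.dump(results, f)
```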