Evaluation for reproducing the paper's result

I am currently conducting research on structure-based drug design using proteins.

I find the concept of fragment linking to be a valuable approach in drug design, and I am particularly impressed by your work's ability to condition on the protein pocket. Thank you for your hard work on it!

I have a few questions regarding your research:

1. First, I attempted to reproduce the results from your paper using the Sampling section (https://github.com/igashov/DiffLinker#sampling). However, I noticed that my results for the ZINC and GEOM datasets differ significantly from the results reported in the paper, especially the SA score. While the paper's SA score is approximately 3.x, my runs yielded a score of 6.x. I'm unsure why these results are different. Is there an additional step required to accurately reproduce the paper's findings?
2. Unfortunately, as mentioned in the README, there is no pocket linker prediction model available. Therefore, I was unable to conduct experiments with the Pocket dataset. Could you provide some suggestions on how I can reproduce the paper's results without this model? Additionally, I am curious about the linker prediction model used in the pocket section of Table 5.
3. I came across Figure 2 in the paper, which showcases examples of linkers sampled by DiffLinker conditioned on pocket atoms. I attempted to replicate these results using the same molecule fragments from the MOAD dataset and the awesome Hugging Face implementation. However, I couldn't obtain the results presented in the paper, even when using the same protein anchors. Could you please guide me on how to accurately reproduce the results shown in Figure 2?
4. I am interested in reproducing the results shown in Figures 4 and 5 of the paper. However, the paper does not provide the indices of the examples, which prevents me from running the test. Could you kindly provide the necessary information about the fragments used in Figures 4 and 5? This would be immensely helpful in my efforts to replicate the results accurately.

If you require any specific information or have any additional questions to aid in reproducing my experiments, please let me know, and I will promptly provide the requested details.

Thank you for your time and consideration.

@igashov

@minju-hits thank you for your message and kind words! :)

1. Could you please share the exact commands you used to run the evaluation? If possible, could you please also send your samples and the evaluation results? I would like to see exactly how you ran the evaluation. In the meantime, I would recommend checking the following:
   - Did you use the right model to generate linkers? Depending on the dataset you sample for, the models are different and are provided here.
   - Did you process the samples with OpenBabel prior to running compute_metrics.py? Basically, did you follow all the steps explained here?
2. All three models used for pocket conditioning experiments are available on Zenodo as specified here. In particular, there is the direct download link for the model used in Table 5.
3. Obtaining exactly the same molecules as in Figure 2 can be tricky for several reasons:
   - Unfortunately, we did not track the random seed when sampling, which means that there is a chance that you won't get exactly the same structures when resampling.
   - We sampled 100 linkers per input and then manually selected those to put in the figure based on the SA and QED scores.
   - Based on the previous question, you most likely used a different model. Please consider using a pocket-conditioned model.

   From our side, if it can be somewhat helpful, we can provide a table with the samples we generated for this experiment, where the molecules from Figure 2 are present.
4. Here are the indices of the input fragments:
   - Figure 4 (GEOM test set): uuids of the input sets of fragments from this table are 0, 311, 1289, 156, 200, 788, 1205, 83.
   - Figure 5 (ZINC test set): uuids of the input sets of fragments from this table are 0 and 200.
   - Figure 5 (CASF test set): uuids of the input sets of fragments from this table are 120 and 0.

I hope it helps!

Thank you so much for dedicating your time and attention to my question.

I followed the README's sampling and evaluation sections and made the modifications needed for each dataset. In this way, I ran tests on three datasets (GEOM, ZINC, MOAD). Below are the exact commands I used for ZINC; for the other datasets, the details were adjusted accordingly:

1.1 Sampling

- Download the test dataset (exactly the same command).
- Download the necessary models (exactly the same command).
- Create the necessary directories: `mkdir -p samples`
- Execute the sampling command:

[zinc, difflinker]

python -W ignore sample.py \
                 --checkpoint models/zinc_difflinker.ckpt \
                 --samples samples \
                 --data datasets \
                 --prefix zinc_final_test \
                 --n_samples 1 \
                 --device cuda:0

[zinc, difflinker(sampled_size)]

python -W ignore sample.py \
                 --checkpoint models/zinc_difflinker.ckpt \
                 --linker_size_model models/zinc_size_gnn.ckpt \
                 --samples samples \
                 --data datasets \
                 --prefix zinc_final_test \
                 --n_samples 1 \
                 --device cuda:0

[zinc, difflinker(given_anchors)]

python -W ignore sample.py \
                 --checkpoint models/zinc_difflinker_given_anchors.ckpt \
                 --samples samples \
                 --data datasets \
                 --prefix zinc_final_test \
                 --n_samples 1 \
                 --device cuda:0

[zinc, difflinker(given_anchors, sampled_size)]

python -W ignore sample.py \
                 --checkpoint models/zinc_difflinker_given_anchors.ckpt \
                 --linker_size_model models/zinc_size_gnn.ckpt \
                 --samples samples \
                 --data datasets \
                 --prefix zinc_final_test \
                 --n_samples 1 \
                 --device cuda:0

1.2 Evaluation

The .xyz files are generated in the samples directory. To evaluate the generated molecules, I followed the Evaluation section.

- Download the ground-truth SMILES and SDF files of the molecules into the datasets directory from this resource.
- Run OpenBabel to reformat the data: `mkdir -p formatted`

[zinc, difflinker]

python -W ignore reformat_data_obabel.py \
                 --samples samples \
                 --dataset zinc_final_test \
                 --true_smiles_path datasets/zinc_final_test_smiles.smi \
                 --checkpoint zinc_difflinker \
                 --formatted formatted

[zinc, difflinker(sampled_size)]

python -W ignore reformat_data_obabel.py \
                 --samples samples \
                 --dataset zinc_final_test \
                 --true_smiles_path datasets/zinc_final_test_smiles.smi \
                 --checkpoint zinc_difflinker \
                 --formatted formatted \
                 --linker_size_model_name zinc_size_gnn

[zinc, difflinker(given_anchors)]

python -W ignore reformat_data_obabel.py \
                 --samples samples \
                 --dataset zinc_final_test \
                 --true_smiles_path datasets/zinc_final_test_smiles.smi \
                 --checkpoint zinc_difflinker_given_anchors \
                 --formatted formatted

[zinc, difflinker(given_anchors, sampled_size)]

python -W ignore reformat_data_obabel.py \
                 --samples samples \
                 --dataset zinc_final_test \
                 --true_smiles_path datasets/zinc_final_test_smiles.smi \
                 --checkpoint zinc_difflinker_given_anchors \
                 --formatted formatted \
                 --linker_size_model_name zinc_size_gnn

- Run the evaluation scripts:

[zinc, difflinker]

python -W ignore compute_metrics.py \
             ZINC \
             formatted/zinc_difflinker/zinc_final_test.smi \
             datasets/zinc_final_train_linkers.smi \
             5 1 None \
             resources/wehi_pains.csv \
             diffusion

[zinc, difflinker(sampled_size)]

python -W ignore compute_metrics.py \
                 ZINC \
                 formatted/zinc_difflinker/sampled_size/zinc_size_gnn/zinc_final_test.smi \
                 datasets/zinc_final_train_linkers.smi \
                 5 1 None \
                 resources/wehi_pains.csv \
                 diffusion

[zinc, difflinker(given_anchors)]

python -W ignore compute_metrics.py \
                 ZINC \
                 formatted/zinc_difflinker_given_anchors/zinc_final_test.smi \
                 datasets/zinc_final_train_linkers.smi \
                 5 1 None \
                 resources/wehi_pains.csv \
                 diffusion

[zinc, difflinker(given_anchors, sampled_size)]

python -W ignore compute_metrics.py \
                 ZINC \
                 formatted/zinc_difflinker_given_anchors/sampled_size/zinc_size_gnn/zinc_final_test.smi \
                 datasets/zinc_final_train_linkers.smi \
                 5 1 None \
                 resources/wehi_pains.csv \
                 diffusion
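
The four evaluation inputs above share one path pattern: `formatted/<checkpoint>/<prefix>.smi`, with an extra `sampled_size/<size model>/` segment when a size GNN was used. The following is a minimal Python sketch of that pattern, inferred only from the commands in this post. The function name `formatted_smi_path` is hypothetical and is not part of the DiffLinker repository.

```python
from pathlib import PurePosixPath
from typing import Optional


def formatted_smi_path(
    checkpoint: str,
    prefix: str,
    size_model: Optional[str] = None,
    root: str = "formatted",
) -> str:
    """Build the .smi path passed to compute_metrics.py.

    Assumption: the layout mirrors the paths used in the commands above;
    it is not taken from the repository's code.
    """
    path = PurePosixPath(root) / checkpoint
    if size_model is not None:
        # Runs with a sampled linker size live in an extra subdirectory.
        path = path / "sampled_size" / size_model
    return str(path / f"{prefix}.smi")


if __name__ == "__main__":
    print(formatted_smi_path("zinc_difflinker", "zinc_final_test"))
    print(formatted_smi_path("zinc_difflinker_given_anchors", "zinc_final_test", "zinc_size_gnn"))
```

Checking that each of these files exists before running the metrics script would catch a mismatch between the checkpoint name used for sampling and the one used for reformatting.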

I have attached the output files and evaluation summary files for your reference: formatted.zip, moad_test_full.zip, zinc_final_test.zip, geom_multifrag_test.zip

2. Yes, I can download the pocket-conditioned diffusion models from the list on GitHub. However, I noticed that there is no checkpoint available for the "[Pockets] Size GNN" used to predict the linker size. Also, in Tables 5 and 6 of the paper, there is no notation about the sampled size. I am curious about how you conducted the experiments based on protein pockets and how the linker size was chosen without this model.
3. Thank you for sharing this information. If possible, could you provide an example of the generated results for these experiments? It would be helpful for my current research.
4. Thank you so much for providing this information. I am currently searching for the molecules in the geom_multifrag_test_frag.sdf file, and having access to these indices will save me a considerable amount of time.
5. I have also conducted sampling using Hugging Face. By examining the fragments in 3D, I sampled 231 samples from the GEOM test set. Following this, I converted these SDF files to a SMILES file and a data pickle file using this script (reformat_smi.py). I then computed the metrics with the modified script (compute_metrics_hf.py):
```
python -W ignore compute_metrics_hf.py \
             GEOM \
             [path]/geom_multifrag_output/hf_geom_final_test.smi \
             datasets/geom_multifrag_train_linkers.smi \
             5 1 None \
             resources/wehi_pains.csv \
             diffusion
```
The result differs from the one presented in the paper's Table. I have attached the output files for your reference. 5.huggingface_scipt_summary.zip

I would appreciate any guidance on how to achieve better results and obtain more favorable metric values. Alternatively, I wonder if I might have overlooked any steps in the evaluation process.

If there's anything else you'd like to share or if you need further assistance, please let me know.

Sincerely, Minju

I checked the samples for [zinc, difflinker], and apparently you have a different version of OpenBabel. I first took your data (including the OpenBabel output), reran the evaluation, and got the same numbers as you. Then I reran OpenBabel on your .xyz files, reran the evaluation script, and got different numbers that are very close to what we report in the paper (recovery and uniqueness are of course different, as you sampled only one linker per input). Please find my files attached. If you compare the .sdf files produced by your OpenBabel and mine, you will see some differences that cause the changes in the numbers. Therefore, please check that your OpenBabel version is 3.0.0.

Samples and results with my OpenBabel version:
- samples.tar.gz
- formatted.tar.gz
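
Since the discrepancy comes down to the OpenBabel version, it can help to check the installed version before reformatting. The following is a minimal Python sketch, assuming the `obabel` executable is on the PATH and that `obabel -V` prints a line of the form `Open Babel 3.0.0 -- <build date>`. The helper names are hypothetical and are not part of the DiffLinker repository.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_obabel_version(text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from the output of `obabel -V`."""
    match = re.search(r"Open Babel (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def installed_obabel_version() -> Optional[Version]:
    """Return the version of the obabel found on PATH, or None if unavailable."""
    exe = shutil.which("obabel")
    if exe is None:
        return None
    result = subprocess.run([exe, "-V"], capture_output=True, text=True, check=False)
    # Read both streams in case the banner is not written to stdout.
    return parse_obabel_version(result.stdout + result.stderr)


if __name__ == "__main__":
    expected = (3, 0, 0)  # version recommended in this thread
    found = installed_obabel_version()
    if found is None:
        print("obabel not found or version not recognized")
    elif found != expected:
        print(f"OpenBabel {found} differs from the expected {expected}")
    else:
        print("OpenBabel version matches")
```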
Experiments with Pockets dataset were conducted with the ground-truth linker sizes, not sampled ones.
The uuid of the input set of fragments from this Pockets test set table used in Figure 2 is 34038. I am attaching the tables of all samples made by DiffLinker (conditioned and unconditioned) on the Pockets test set. Unfortunately, these tables do not contain uuid, but you can find the relevant samples by searching for the right smiles of input fragments: C1CCCCC1.CC(O)N[C@H]1CCC2OCNC2C1.

Tables with samples: samples_used_for_Figure_2.tar.gz
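
Because the tables carry no uuid, the relevant rows have to be located by the fragment SMILES string. The following is a minimal Python sketch of such a lookup, assuming the tables are CSV files with a header row. Their actual format and column names are not stated here, so the sketch matches against every field, and the column names in the demo are purely hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

# SMILES of the input fragments used in Figure 2 (taken from the comment above).
FRAGMENTS = "C1CCCCC1.CC(O)N[C@H]1CCC2OCNC2C1"


def rows_with_fragments(lines: Iterable[str], fragments: str = FRAGMENTS) -> List[Dict[str, str]]:
    """Return all rows in which some field equals the fragment SMILES exactly."""
    reader = csv.DictReader(lines)
    return [row for row in reader if fragments in row.values()]


if __name__ == "__main__":
    # Hypothetical two-column table; "molecule" values are placeholders.
    demo = io.StringIO(
        "fragments,molecule\n"
        "C1CCCCC1.CC(O)N[C@H]1CCC2OCNC2C1,PLACEHOLDER_A\n"
        "CC.CO,PLACEHOLDER_B\n"
    )
    for row in rows_with_fragments(demo):
        print(row)
```

Exact string matching only works if the table stores the fragments in the same SMILES form as above; if it does not, both sides would first need to be canonicalized with a cheminformatics toolkit.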
Happy to help!
To reproduce the paper's results on a specific dataset, I would recommend using the command line and the code provided in this repository.

Best regards Ilia

I am pleased to share that I finally achieved the desired result. I truly appreciate your kind and detailed assistance. (A minor point: the "formatted.tar.gz" file turned out to be empty, but I managed to create the formatted data using the samples you provided.)
Regarding points 2, 3, 4, and 5: I understand now. Thank you for responding to my questions and for consistently providing the results.

When I first discovered your work, I was surprised by the excellent implementation using Hugging Face and the well-organized repository. However, I encountered some issues with reproducibility. Before reaching out to you, I had lost my way in conducting the experiments. Thankfully, with the exact and detailed answer you provided, I was able to resolve this problem.

Thank you once again for your invaluable help. I am looking forward to your future work.

Best regards MinJu

Oh indeed it's empty. Here it is – should be with tables: formatted.tar.gz

Thank you for your nice feedback. I am glad to hear that you liked our work and that you managed to solve the issues!

Best regards Ilia

igashov / DiffLinker

Evaluation for reproducing the paper's result #5