timkartar / DeepPBS

Geometric deep learning of protein–DNA binding specificity
BSD 3-Clause "New" or "Revised" License

Reproducing the results on 3Q05 #10

Closed WangJiuming closed 1 month ago

WangJiuming commented 1 month ago

Hi,

Thank you for the exciting work!

I have been trying to reproduce the results on the PDB: 3Q05 as shown in Fig. 4 of the paper, but it seems that my results are a little bit different from the Source data for Fig. 4. Here is what I've done.

  1. set up the environment as instructed
  2. download the 3Q05 structure from RCSB: 3Q05.pdb
  3. run the pipeline successfully without errors: ./process_and_predict.sh and ./vis_interpret.sh 3Q05
  4. use the post-processing code in #8 to map the atom indices to the RI differences

Here is the relevant part of the output (sorted):

[Screenshot of the sorted output, 2024-09-11]

In comparison, the source data shows that the LYS120C residue should have the maximum RI, followed by LYS120B, ARG280A, and then ARG280D.

Is this something that may be expected on different devices or are there any steps that I am doing incorrectly that may be causing the inconsistency? Thank you for your help in advance!

timkartar commented 1 month ago

Hi there,

Thanks for pointing this out. The model always outputs a forward and a reverse prediction, either of which can be used in these tasks (or an average of both).

It turns out the GitHub version of interpret.py was not the same version used to generate the source data for 3Q05. I changed it to match; see commit 738a9d272057c10196a0f0766c0b522743c5403b. If you incorporate this change, the resulting values will be consistent in order with the source data. There will still be some numerical differences, because the per-atom perturbation values are extremely small.

I should note that max aggregation is probably not very robust compared to averaging. Also, Lys120 and Arg280 are both important residues, and I would not make any strong scientific inference based on their order.
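To illustrate why the residue order can depend on the aggregation, here is a minimal sketch that groups per-atom values by residue and ranks the residues by max and by mean. The numbers are made up for illustration and are not DeepPBS output; `atom_scores`, `aggregate`, and `ranking` are hypothetical names, not part of interpret.py or the post-processing code in #8.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (residue label, per-atom RI difference) pairs.
# These values are illustrative only, not taken from the source data.
atom_scores = [
    ("LYS120C", 0.9), ("LYS120C", 0.1), ("LYS120C", 0.1),
    ("ARG280A", 0.6), ("ARG280A", 0.5), ("ARG280A", 0.4),
]

def aggregate(scores, reducer):
    """Group per-atom values by residue and reduce each group to one number."""
    by_residue = defaultdict(list)
    for residue, value in scores:
        by_residue[residue].append(value)
    return {residue: reducer(values) for residue, values in by_residue.items()}

def ranking(aggregated):
    """Return residue labels sorted from highest to lowest aggregated value."""
    return sorted(aggregated, key=aggregated.get, reverse=True)

max_rank = ranking(aggregate(atom_scores, max))
mean_rank = ranking(aggregate(atom_scores, mean))

print("max :", max_rank)
print("mean:", mean_rank)
```

With these toy values a single large atom puts LYS120C first under max, while ARG280A comes first under mean, so a swap between two closely scored residues is not surprising.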

Closing this issue now. Feel free to reopen if you have any further questions :)