Reproducing the results on 3Q05

timkartar / DeepPBS

Geometric deep learning of protein–DNA binding specificity

BSD 3-Clause "New" or "Revised" License


Hi,

Thank you for the exciting work!

I have been trying to reproduce the results for PDB entry 3Q05 shown in Fig. 4 of the paper, but my results differ slightly from the Source data for Fig. 4. Here is what I did:

1. Set up the environment as instructed.
2. Downloaded the 3Q05 structure from RCSB: 3Q05.pdb
3. Ran the pipeline successfully without errors: ./process_and_predict.sh and ./vis_interpret.sh 3Q05
4. Used the post-processing code in #8 to map the atom indices to the RI differences.

Here is the relevant part of the output (sorted):

In comparison, the source data shows that the LYS120C residue should have the maximum RI, followed by LYS120B, ARG280A, and then ARG280D.

Is this difference expected across devices, or am I doing a step incorrectly that could cause the inconsistency? Thank you in advance for your help!

Hi there,

Thanks for pointing this out. The model always outputs a forward and a reverse prediction. Either one could be used for these tasks (or an average of both).
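As a rough illustration of the "average of both" option, here is a minimal sketch. It assumes each prediction is a list of per-position rows of four base probabilities in the column order A, C, G, T, and that the reverse prediction is reported along the opposite strand, so it has to be reverse-complemented before averaging. Neither assumption is confirmed in this thread, and `average_strand_predictions` is a hypothetical helper, not a function from the DeepPBS code base.

```python
def average_strand_predictions(forward, reverse):
    """Average a forward prediction with a reverse-strand prediction.

    ASSUMPTION: rows are positions, columns are A, C, G, T, and `reverse`
    is given along the opposite strand (hypothetical layout).
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse must have the same length")
    if any(len(row) != 4 for row in list(forward) + list(reverse)):
        raise ValueError("every row must have 4 entries (A, C, G, T)")

    # Reverse-complement the reverse prediction:
    #   reversing the rows flips the position order,
    #   reversing the columns swaps A<->T and C<->G.
    aligned = [list(reversed(row)) for row in reversed(reverse)]

    # Element-wise average of the two aligned matrices.
    return [
        [(f + r) / 2 for f, r in zip(f_row, r_row)]
        for f_row, r_row in zip(forward, aligned)
    ]


# Toy input: forward strand "AC", whose reverse complement is "GT".
toy_forward = [[1, 0, 0, 0], [0, 1, 0, 0]]
toy_reverse = [[0, 0, 1, 0], [0, 0, 0, 1]]
toy_average = average_strand_predictions(toy_forward, toy_reverse)
```

If the two outputs are already expressed on the same strand, the reverse-complement step should be dropped and the matrices averaged directly.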

It turns out the GitHub version of interpret.py was not the same version used to generate the source data for 3Q05. I changed it to match; see commit 738a9d272057c10196a0f0766c0b522743c5403b. If you incorporate this change, the resulting values will be consistent in order with the source data. There will still be some numerical differences, because the per-atom perturbation values are extremely small.

I should note that max aggregation is probably not very robust compared to averaging. Also, Lys120 and Arg280 are both important residues, and I would not make any strong scientific inference based on their order.
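To make the robustness point concrete, here is a small sketch of how max and mean aggregation can rank the same residues differently. The residue labels (`RES_A`, `RES_B`) and the scores are synthetic placeholders, not values from the source data, and `aggregate_by_residue` is a hypothetical helper rather than part of interpret.py or the post-processing code in #8.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_residue(atom_scores, how="mean"):
    """Collapse per-atom scores into one score per residue and rank them.

    atom_scores: iterable of (residue_label, score) pairs (hypothetical format).
    how: "max" or "mean".
    Returns a list of (residue_label, aggregated_score), highest first.
    """
    reducers = {"max": max, "mean": mean}
    if how not in reducers:
        raise ValueError("how must be 'max' or 'mean'")

    # Group the per-atom scores by residue.
    grouped = defaultdict(list)
    for residue, score in atom_scores:
        grouped[residue].append(score)

    # Reduce each group to a single number, then sort in descending order.
    reduced = {res: reducers[how](vals) for res, vals in grouped.items()}
    return sorted(reduced.items(), key=lambda item: item[1], reverse=True)


# Synthetic scores: RES_A has one outlier atom, RES_B is uniformly moderate.
toy_scores = [
    ("RES_A", 0.9), ("RES_A", 0.1), ("RES_A", 0.2),
    ("RES_B", 0.6), ("RES_B", 0.5), ("RES_B", 0.7),
]

rank_by_max = [res for res, _ in aggregate_by_residue(toy_scores, how="max")]
rank_by_mean = [res for res, _ in aggregate_by_residue(toy_scores, how="mean")]
```

With these toy numbers a single high-scoring atom puts `RES_A` first under max aggregation, while the mean puts `RES_B` first. When per-atom values are very small, tiny numerical differences can flip a max-based order in the same way.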

Closing this issue now. Feel free to reopen if you have any further questions :)

Reproducing the results on 3Q05 #10