kiharalab / DOVE

A Deep-learning based dOcking decoy eValuation mEthod
GNU General Public License v3.0

Ligand channel slicing does not take into account the atom type #7

Closed bzoracler closed 3 years ago

bzoracler commented 3 years ago

Issue description

https://github.com/kiharalab/DOVE/blob/2ff844377d1d7f7765df188d4c4bb5a34ae0302c/data_processing/prepare_input.py#L350-L365

The assignment on Line 355,

atom_type=llist[i]

results in a numpy.ndarray being stored in atom_type, and == comparisons of this to 'C', 'CA', 'N', and 'O' always evaluate to False, which means that the slicing logic always falls through to the else clause on Lines 378-384.

Adding print(f"Atom type = {atom_type}") after this line and operating on Web/Example/Correct.pdb gives:

waiting dealing1
     1  888_goap.pdb                      -59665.59    -31894.54   -27771.05
     1  complex.888.pdb                   -59665.59    -31894.54   -27771.05
in total, we have 210 residues in receptor, 122 residues in ligand
in the interface 10A cut off, we have 63 residue, 550 atoms in the receptor
in the interface 10A cut off, we have 42 residue, 354 atoms in the ligand
after processing, we only remained 550 atoms in receptor, 354 atoms in ligand
271 atoms actually used in this receptor
Atom type = [37.801 -9.438 -1.106]
...DOVE/data_processing/prepare_input.py:361: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if atom_type=='C' or atom_type=='CA':
...DOVE/data_processing/prepare_input.py:367: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  elif atom_type=='N':
...DOVE/data_processing/prepare_input.py:373: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  elif atom_type=='O':
Atom type = [ 3.7409e+01 -8.5740e+00 -4.0000e-03]
Atom type = [37.88  -9.18   1.306]
Atom type = [37.162 -9.808  2.091]
Atom type = [35.89  -8.471  0.066]
Atom type = [35.371 -7.599  1.205]
Atom type = [36.211 -6.872  1.765]
Atom type = [34.158 -7.598  1.429]
Atom type = [35.294 -8.933  9.703]
Atom type = [33.988 -8.302  9.876]
Atom type = [33.293 -8.917 11.063]
Atom type = [ 33.264 -10.107  11.262]
Atom type = [33.2   -8.519  8.58 ]
Atom type = [31.93  -7.697  8.538]
Atom type = [31.191 -7.927  7.214]
Atom type = [32.057 -7.52   6.136]
Atom type = [32.325 -7.919  4.888]
Atom type = [31.715 -8.907  4.291]
Atom type = [33.239 -7.247  4.178]
...
*****Please contact me for details: wang3702@purdue.edu*****
['Correct.pdb', 0.9226311, 0.9772231, 0.25234342, -1, 0.56493485, -1, -1, -1]

Although these high probability scores match the example in the front-page README, it should be clarified whether this is the intended behavior of the algorithm.
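To make the failure mode concrete, here is a minimal sketch rather than the actual DOVE code. It assumes a hypothetical helper `channel_index` that mirrors the if/elif/else chain quoted in the warnings, with channel numbers chosen only for illustration. Plain Python lists stand in for the coordinate arrays, because how NumPy handles an array-to-string comparison depends on the NumPy version (the FutureWarning above announces a change). The pairing of the first two coordinates with N and CA is inferred from the order of the printed output.

```python
def channel_index(atom_type):
    """Hypothetical stand-in for the branch chain in prepare_input.py.

    The channel numbers 0-3 are assumptions made for this sketch.
    """
    if atom_type == 'C' or atom_type == 'CA':
        return 0
    elif atom_type == 'N':
        return 1
    elif atom_type == 'O':
        return 2
    else:
        return 3  # everything that matches nothing lands here


# Coordinates taken from the log above; lists stand in for numpy arrays.
coords = [[37.801, -9.438, -1.106], [37.409, -8.574, -0.004]]
# Assumed atom names for those two atoms.
names = ['N', 'CA']

# Passing coordinates (the bug): no comparison matches, so every atom
# silently falls into the final branch.
wrong = [channel_index(c) for c in coords]

# Passing atom names (the intent): each atom reaches its own branch.
right = [channel_index(n) for n in names]

print(wrong, right)
```

No exception is raised in the buggy case, which is why the pipeline still produces plausible-looking scores.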


Suggested issue resolution

I believe that Line 355 should instead be

atom_type=llist2[i]

in which case, the output is:

waiting dealing1
     1  888_goap.pdb                      -59665.59    -31894.54   -27771.05
     1  complex.888.pdb                   -59665.59    -31894.54   -27771.05
in total, we have 210 residues in receptor, 122 residues in ligand
in the interface 10A cut off, we have 63 residue, 550 atoms in the receptor
in the interface 10A cut off, we have 42 residue, 354 atoms in the ligand
after processing, we only remained 550 atoms in receptor, 354 atoms in ligand
271 atoms actually used in this receptor
Atom type = N
Atom type = CA
Atom type = C
Atom type = O
Atom type = CB
Atom type = CG
Atom type = OD1
Atom type = OD2
Atom type = N
Atom type = CA
Atom type = C
Atom type = O
Atom type = CB
Atom type = CG
Atom type = CD
Atom type = NE
Atom type = CZ
Atom type = NH1
...
*****Please contact me for details: wang3702@purdue.edu*****
['Correct.pdb', 0.55048513, 5.8436563e-06, 0.25234342, -1, 0.5078626, -1, -1, -1]

But this leads to much lower probability scores in the Web/Example/Correct.pdb example.
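One way to make this kind of index mix-up harder to reintroduce is to iterate over coordinates and atom names together and reject anything that is not a string. The sketch below is only an illustration under stated assumptions: `assign_channels`, `BACKBONE_CHANNELS`, and `OTHER_CHANNEL` are hypothetical names, the channel numbers are arbitrary, and the pairing of the first eight coordinates with the first eight atom names is inferred from the two printed outputs.

```python
from collections import Counter

# Assumed mapping: C and CA share a channel, N and O get their own,
# and every other atom type goes to a fourth channel.
BACKBONE_CHANNELS = {'C': 0, 'CA': 0, 'N': 1, 'O': 2}
OTHER_CHANNEL = 3


def assign_channels(coords, names):
    """Pair each coordinate with the channel selected by its atom name."""
    if len(coords) != len(names):
        raise ValueError("coords and names must have the same length")
    assigned = []
    for xyz, name in zip(coords, names):
        # Guard against passing the coordinate list where names are expected.
        if not isinstance(name, str):
            raise TypeError(f"atom name must be str, got {type(name).__name__}")
        assigned.append((BACKBONE_CHANNELS.get(name, OTHER_CHANNEL), xyz))
    return assigned


# First eight atom names and coordinates as printed in the outputs above.
names = ['N', 'CA', 'C', 'O', 'CB', 'CG', 'OD1', 'OD2']
coords = [
    (37.801, -9.438, -1.106), (37.409, -8.574, -0.004),
    (37.88, -9.18, 1.306), (37.162, -9.808, 2.091),
    (35.89, -8.471, 0.066), (35.371, -7.599, 1.205),
    (36.211, -6.872, 1.765), (34.158, -7.598, 1.429),
]

counts = Counter(channel for channel, _ in assign_channels(coords, names))
print(dict(counts))
```

With the guard in place, swapping the two arguments raises a TypeError instead of quietly routing every atom to the last channel.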

wang3702 commented 3 years ago

Thanks a lot for pointing this out! This is my mistake! It means the results for all 4 channels are put into one channel, which I believe somewhat decreases the performance.