isayevlab / pKa-ANI

Accurate prediction of protein pKa with representation learning

pKa-ANI produces bogus pKa predictions for non-titratable residues because chain ID is not accounted for #5

Closed: sastrys1 closed this issue 6 months ago

sastrys1 commented 7 months ago

Say you have 3 separate chains, each containing residues 1, 2, 3 and 4, and the titratable residues are 3 and 4 in chain 1; 1, 2 and 3 in chain 2; and 1 and 2 in chain 3. The current code stores these only as residue numbers, i.e. the list [3, 4, 1, 2, 3, 1, 2], without chain IDs. When it reaches residues 1 and 2 of chain 1, it only checks whether 1 and 2 are in that list (they are, because of chains 2 and 3) and never checks the chain ID, so it tries to compute pKa predictions for them even though they are not titratable in chain 1.
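A small sketch of the example above may make the problem easier to see. It is not the pKa-ANI code: the chain IDs "A", "B", "C" stand in for chains 1 to 3, the per-chain grouping of the titratable list is assumed, and the helper names are hypothetical. It lists which (chain, residue) pairs pass a number-only membership check without actually being titratable in their own chain.

```python
# Hypothetical reconstruction of the example: 3 chains ("A", "B", "C" stand in
# for chains 1-3), each with residues 1-4.
chains = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [1, 2, 3, 4]}

# Assumed per-chain grouping of the titratable list [3, 4, 1, 2, 3, 1, 2].
titratable_by_chain = {"A": [3, 4], "B": [1, 2, 3], "C": [1, 2]}

# Chain-blind storage: only residue numbers survive.
flat_titratable = [n for nums in titratable_by_chain.values() for n in nums]


def flagged_by_number_only(chains, flat_titratable):
    """(chain, resnum) pairs that pass a residue-number-only membership check."""
    return [(ch, n) for ch, nums in chains.items() for n in nums if n in flat_titratable]


def false_positives(chains, titratable_by_chain, flat_titratable):
    """Flagged pairs that are not actually titratable in their own chain."""
    truly_titratable = {(ch, n) for ch, nums in titratable_by_chain.items() for n in nums}
    return [pair for pair in flagged_by_number_only(chains, flat_titratable)
            if pair not in truly_titratable]


print(flat_titratable)  # [3, 4, 1, 2, 3, 1, 2]
print(false_positives(chains, titratable_by_chain, flat_titratable))
# [('A', 1), ('A', 2), ('B', 4), ('C', 3), ('C', 4)]
```

Besides residues 1 and 2 of chain 1, the same check also wrongly lets through residue 4 of chain 2 and residues 3 and 4 of chain 3, because every number from 1 to 4 appears somewhere in the flat list.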

See how the program responds to the test PDB files I have attached. The first one produces pKa predictions for a number of extraneous residues (it simply reuses the descriptors and model of the previous titratable residue). The second one fails with a runtime error because it tries to look up a prediction/model before any has been assigned (the "if titratable" clause is triggered by a later residue with the same number on a different chain).
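Both symptoms fit a common loop pattern, sketched below under loudly stated assumptions: I have not reproduced the internals of calculate_pkaani() here, so the loop shape, the is_titratable_type flag and the string stand-ins for descriptors and models are all hypothetical. Descriptors and the model are assigned only for residues that really are titratable, while the prediction step is gated by a residue-number-only check.

```python
# Hypothetical sketch, NOT the pKa-ANI implementation: descriptors/model are set
# only for truly titratable residues, but prediction is gated on number only.
def predict_number_only(residues, flat_titratable):
    """residues: list of (chain, resnum, is_titratable_type) tuples."""
    predictions = []
    for chain, resnum, is_titratable_type in residues:
        if is_titratable_type:
            # Stand-ins for computing descriptors and choosing a model.
            descriptors = f"desc({chain}{resnum})"
            model = f"model({chain}{resnum})"
        if resnum in flat_titratable:  # chain ID is ignored here
            # Reuses whatever was assigned last, or raises UnboundLocalError
            # if no titratable residue has been seen yet.
            predictions.append((chain, resnum, descriptors, model))
    return predictions


# Symptom 1: residue A2 silently reuses A1's descriptors and model.
print(predict_number_only([("A", 1, True), ("A", 2, False), ("B", 2, True)], [1, 2]))

# Symptom 2: the first residue passes the number check before anything is assigned.
try:
    predict_number_only([("A", 2, False), ("B", 2, True)], [2])
except UnboundLocalError as err:
    print("runtime error:", err)
```

In Python the second case surfaces as UnboundLocalError; the exact exception in the real program may differ.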

1brs.pdb.txt 6oge_clean.pdb.txt

I will shortly open a pull request that changes how titratable residues are stored so that each entry also includes its chain. With this change, a pKa prediction is made only if the residue is actually one of the titratable residues, because the check now matches both the residue number and the chain. You can compare the output logs to see how the list of pKa predictions differs between the new and old versions of the calculate_pkaani() function.
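The idea behind the fix can be sketched by keying titratable residues on (chain, residue number) pairs. This is only an illustration of the approach described in the comment, not the merged change itself: the function name, the set-of-tuples storage and the string stand-ins for descriptors and models are assumptions.

```python
# Hypothetical sketch of chain-aware storage: titratable residues are kept as
# (chain, resnum) keys, so a number match on another chain no longer counts.
def predict_chain_aware(residues, titratable_keys):
    """residues: iterable of (chain, resnum); titratable_keys: set of (chain, resnum)."""
    predictions = []
    for chain, resnum in residues:
        if (chain, resnum) not in titratable_keys:
            continue  # skip non-titratable residues entirely
        # Descriptors and model are computed in the same branch as the
        # prediction, so nothing carries over from a previous residue.
        descriptors = f"desc({chain}{resnum})"
        model = f"model({chain}{resnum})"
        predictions.append((chain, resnum, descriptors, model))
    return predictions


# The 3-chain example: residues 1-4 in each of chains "A", "B", "C".
residues = [(ch, n) for ch in "ABC" for n in range(1, 5)]
titratable_keys = {("A", 3), ("A", 4), ("B", 1), ("B", 2), ("B", 3), ("C", 1), ("C", 2)}
predicted = [(ch, n) for ch, n, _, _ in predict_chain_aware(residues, titratable_keys)]
print(predicted)
# [('A', 3), ('A', 4), ('B', 1), ('B', 2), ('B', 3), ('C', 1), ('C', 2)]
```

A set of tuples also makes each membership test a constant-time lookup instead of a scan over a list.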

sastrys1 commented 6 months ago

@HGokcan @isayev, has anyone been able to review this, or to test whether they get the same errors on their end?

HGokcan commented 6 months ago

@sastrys1 thank you for bringing this issue to our attention. I have merged the pull request.