stebliankin / piston

Evaluating Protein Binding Interfaces with Transformer Networks

In CAPRI-score dataset, why do different docking results have different protein sequences? #3

Open xtzhang0216 opened 1 year ago

xtzhang0216 commented 1 year ago

Hi, congratulations on this fantastic model! I'm very interested in your dataset, and I have a question.

I downloaded the CAPRI-score dataset (version 3) from https://zenodo.org/record/7948337. I randomly selected target T53, whose PDB ID is 4jw2, and downloaded the native structure from the PDB. I also took the docking results from /capri_score/piston_prepare/00-raw_pdbs/. I found that the sequence of the native protein starts with "PEKAE...", the sequence of T53-1097 starts with "HHHHT...", and the sequence of T53-147 starts with "MRGSH...". I thought the docking results for the same complex should differ only in the poses of the receptor and ligand, but in this dataset both the structure and the sequence of each individual chain differ. Am I wrong?

Looking forward to your reply! Thanks in advance.

stebliankin commented 1 year ago

Hello,

This is a great observation! Thank you for the detailed inspection. The CAPRI-score dataset consists of docking models predicted by 47 different predictor groups in the CAPRI challenge. Each group used its own protocol for side-chain refinement and pre-processing, and the groups may have run different algorithms to predict missing residues and fix clashes. However, if you visualize T53-1097 and T53-147 in PyMOL, you will notice that the two proteins of the complex look identical in both models. We downloaded the raw PDBs as-is from the reference paper.
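The sequence comparison can also be done without a viewer. The following is a minimal sketch, not part of PISTON: the helper names `chain_sequences` and `shared_core` are hypothetical, and it assumes standard fixed-column PDB `ATOM` records. It reads the residues that have coordinates in each chain and reports the longest stretch two sequences share, which shows whether two models differ only by extra terminal residues (for example an expression tag) or by more than that.

```python
from difflib import SequenceMatcher

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from ATOM records.

    Only residues with coordinates are seen; HETATM records are skipped
    and unknown residue names are written as 'X'.
    """
    residues = {}   # chain id -> list of one-letter codes, in file order
    seen = set()    # (chain id, residue number + insertion code)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()   # PDB columns 18-20
        chain = line[21]                # PDB column 22
        resid = line[22:27]             # PDB columns 23-27
        if (chain, resid) in seen:
            continue                    # another atom of the same residue
        seen.add((chain, resid))
        residues.setdefault(chain, []).append(THREE_TO_ONE.get(resname, "X"))
    return {chain: "".join(codes) for chain, codes in residues.items()}


def shared_core(seq_a, seq_b):
    """Return the longest contiguous stretch present in both sequences."""
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    match = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return seq_a[match.a:match.a + match.size]
```

To use it, read two model files with `open(path).readlines()`, call `chain_sequences` on each, and pass the sequences of corresponding chains to `shared_core`. A shared core that covers nearly the whole shorter sequence suggests the models differ mainly at the termini. Note that chain IDs are not guaranteed to match between predictor groups, so the chain pairing may need to be checked by hand.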