Failed to predict protein-peptide interactions in PPD-bench

xiaoxiao349 commented 1 year ago

Hi Twopin, I tried to use the CAMP.h5 model you uploaded here to predict the protein-peptide interactions in PPD-bench, which contains 133 PDB files. However, I got an AUC of 0.52, much lower than the 0.86 reported in the paper. Could you please give some advice on the following points?

1. SSpro can only predict peptides longer than 30 aa, so I deliberately added the sequence "AAAAAAAAAAAAAAAAAAAA" to peptides shorter than 30 aa. Would this influence the result? If so, how can I work around this problem to get the secondary structure of the peptides?
2. Why are there only 111 proteins and 110 peptides in PPD-bench, as mentioned in your paper?
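The padding workaround described in the first question can be sketched as follows. This is a minimal illustration, not the procedure used by CAMP or SCRATCH: the helper names `pad_peptide` and `trim_prediction`, the `min_len=30` threshold, and the example peptide are all hypothetical, and the predictor output is replaced by a placeholder string. The sketch pads up to a minimum length rather than adding a fixed 20-residue tail, and it trims the predicted states back to the real residues afterwards.

```python
def pad_peptide(seq: str, min_len: int = 30, pad_char: str = "A") -> str:
    """Right-pad a peptide with pad_char until it has at least min_len residues.

    min_len and pad_char are assumptions for illustration only.
    """
    return seq + pad_char * max(0, min_len - len(seq))


def trim_prediction(ss_pred: str, original_len: int) -> str:
    """Keep only the predicted states that belong to the original residues."""
    if len(ss_pred) < original_len:
        raise ValueError("prediction is shorter than the original peptide")
    return ss_pred[:original_len]


peptide = "GSHMKELLQ"  # made-up 9-residue peptide
padded = pad_peptide(peptide)

# In practice ss_pred would be the predictor's output for `padded`;
# a placeholder string of the same length stands in for it here.
ss_pred = "C" * len(padded)
ss_real = trim_prediction(ss_pred, len(peptide))
```

Even with trimming, the padded residues may still change the states predicted for the real residues, because a sequence-based predictor typically uses neighboring positions as context. Comparing predictions with and without padding on peptides that are already long enough would be one way to check this.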

twopin commented 1 year ago

Sure. I used PPDbench directly from a previous study. I don't understand the claim that SSpro can only predict peptides longer than 30 aa. Why? I didn't have any such length limitation. I think this accuracy is weird, and I'll upload my PPDbench dataset, inference script, and result log here this week. I hope this helps you solve the problem.

xiaoxiao349 commented 1 year ago

OK. Thank you so much.

twopin commented 1 year ago

Hi, I hope yesterday's discussion resolved your concerns. Several key points may cause the difference:

1. The SS features you generated come from a different version of SCRATCH; after comparing them, we noticed many differences.
2. Although we used the same PPDbench list, we used different sampling methods to generate negatives.
3. CAMP adopts UniProt sequences instead of PDB FASTA sequences.
4. The intrinsic disorder values are different.

To address these problems, I have already sent you my PPDbench data (including negatives), my inference results, and my evaluation scripts. I hope these help.
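Two of the points above, differing SS features and differing negative sets, can be checked with a small diagnostic like the sketch below. It is not the evaluation script mentioned in this thread: the function names `mismatch_fraction` and `roc_auc` are hypothetical, the scores and SS strings are made-up numbers, and the AUC is computed with a plain rank-sum (Mann-Whitney) formulation so that the example needs no external packages.

```python
def mismatch_fraction(ss_a: str, ss_b: str) -> float:
    """Fraction of positions where two secondary-structure strings disagree."""
    if len(ss_a) != len(ss_b):
        raise ValueError("strings must have the same length")
    if not ss_a:
        return 0.0
    return sum(a != b for a, b in zip(ss_a, ss_b)) / len(ss_a)


def roc_auc(labels, scores):
    """ROC AUC from the rank sum of positives; tied scores share an average rank."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Made-up scores: the same positives evaluated against two different negative sets.
pos_scores = [0.9, 0.8, 0.7]
neg_set_a = [0.1, 0.2, 0.3]
neg_set_b = [0.85, 0.75, 0.2]

labels = [1] * len(pos_scores) + [0] * len(neg_set_a)
auc_a = roc_auc(labels, pos_scores + neg_set_a)
auc_b = roc_auc(labels, pos_scores + neg_set_b)

# Made-up SS strings for one peptide, as if produced by two predictor versions.
ss_diff = mismatch_fraction("HHHCCCEE", "HHCCCCEC")
```

With identical positives, the AUC still shifts when the negatives change, which is why the two evaluations are only comparable on the same negative set. A high per-peptide mismatch fraction would likewise point at the SS features rather than the model.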

Issue #35 in twopin / CAMP