westlake-repl / SaProt

[ICLR'24 spotlight] Saprot: Protein Language Model with Structural Alphabet
MIT License

EC GO results #23

Closed: Heisenburger2020 closed this issue 2 months ago

Heisenburger2020 commented 3 months ago

Dear Sir,

Why are the EC and GO results in Saprot so much lower than those in the original paper "Enhancing Protein Language Model with Structure-based Encoder and Pre-training"? I wonder if the dataset is different.

LTEnjoy commented 3 months ago

Hello!

Yes, our dataset differs from the one used in their original paper. We mention this in our paper: [image]

This pre-processing resulted in different datasets for all tasks, including EC and GO. Therefore, the results reported in our paper should not be compared directly with those in other papers.

LTEnjoy commented 2 months ago

Hi!

After careful examination, we found a slight difference in how we evaluated EC and GO.

Specifically, we copied the evaluation function from GearNet.

[image: the evaluation function copied from GearNet]

This function requires the input shape to be (B, N), where B is the number of proteins and N is the number of labels. However, our predictions and targets were flattened before evaluation, so their shape was (1, B*N). This does not raise an error, but it makes the reported results lower. Intuitively, our flattened evaluation works at a global level, treating all proteins as one, whereas the previous evaluation is an average over proteins.
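To illustrate why the input shape matters, here is a minimal sketch in plain Python. The `f1_max` below is a simplified, hypothetical stand-in for a protein-centric F-max metric, not the GearNet function itself, and the scores and labels are made-up toy values. It only shows that the same data can score differently when passed as (B, N) rows versus one flattened (1, B*N) row.

```python
from statistics import mean


def f1_max(pred, target):
    """Simplified protein-centric F-max; NOT the GearNet implementation.

    pred and target are lists of rows, one row per protein.
    """
    best = 0.0
    # Try every distinct score as a decision threshold.
    for t in sorted({score for row in pred for score in row}):
        precisions, recalls = [], []
        for scores, labels in zip(pred, target):
            hits = [s >= t for s in scores]
            tp = sum(1 for h, y in zip(hits, labels) if h and y)
            if any(hits):    # precision only where something is predicted
                precisions.append(tp / sum(hits))
            if any(labels):  # recall only where a true label exists
                recalls.append(tp / sum(labels))
        if precisions and recalls:
            p, r = mean(precisions), mean(recalls)
            if p + r > 0:
                best = max(best, 2 * p * r / (p + r))
    return best


# Toy data (assumed for illustration): B = 2 proteins, N = 3 labels.
pred = [[0.9, 0.8, 0.1], [0.3, 0.2, 0.1]]
target = [[1, 0, 0], [1, 0, 0]]

# Intended input: one row per protein, shape (B, N).
per_protein = f1_max(pred, target)

# Flattened input: a single row, shape (1, B*N).
flat_pred = [[s for row in pred for s in row]]
flat_target = [[y for row in target for y in row]]
flattened = f1_max(flat_pred, flat_target)

print(per_protein, flattened)
```

On this toy input the (B, N) call gives about 0.857 and the flattened call gives 0.8, which matches the direction of the discrepancy described above. Other inputs may differ by other amounts.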

We have revised our evaluation code by reshaping the input tensors: [image]
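Since the screenshot is not reproduced here, the following is only a rough sketch of the idea, not the actual change in the repository. It regroups a flattened list of B*N values into B rows of N labels; `reshape_rows` and `num_labels` are hypothetical names. With PyTorch tensors, a call such as `pred.reshape(-1, num_labels)` would play the same role.

```python
def reshape_rows(flat, num_labels):
    """Regroup a flat list of B*N values into B rows of num_labels values."""
    if num_labels <= 0 or len(flat) % num_labels != 0:
        raise ValueError("length of flat must be a positive multiple of num_labels")
    return [flat[i:i + num_labels] for i in range(0, len(flat), num_labels)]


# Toy example (assumed values): 2 proteins, 3 labels each.
flat_pred = [0.9, 0.8, 0.1, 0.3, 0.2, 0.1]
rows = reshape_rows(flat_pred, num_labels=3)
print(rows)  # [[0.9, 0.8, 0.1], [0.3, 0.2, 0.1]]
```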

Please note that our key conclusion does not change: Saprot remains a SOTA model under the new setting (we will update the results soon).

Thank you again for pointing out this problem! :)

LTEnjoy commented 2 months ago

Hi! New results have been updated!