zjunlp / OntoProtein

[ICLR 2022] OntoProtein: Protein Pretraining With Gene Ontology Embedding
MIT License

Inconsistent data statistics between the downloaded dataset and the reported statistics #28

Closed acharkq closed 1 year ago

acharkq commented 1 year ago

Hi Authors,

Thanks for the great work!

While checking your dataset, I found that the statistics of the downloaded dataset differ from those reported in this link.

Specifically, the downloaded validation and test sets are much larger than the reported sizes. I also found that the validation set contains data from the training set.
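A check like the one described above could be sketched as follows. This is a minimal illustration, not the repository's own tooling: it assumes each split is stored as one triplet per line in tab-separated `head<TAB>relation<TAB>tail` form, and the helper names `parse_triples` and `overlap_report` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: a triplet is (head, relation, tail), all stored as strings.
Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str], sep: str = "\t") -> List[Triple]:
    """Parse text lines into triplets, skipping blank lines.

    The tab-separated layout is an assumption about the file format.
    """
    triples: List[Triple] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, relation, tail = line.split(sep)
        triples.append((head, relation, tail))
    return triples


def overlap_report(train: List[Triple], other: List[Triple]) -> Dict[str, int]:
    """Report split sizes and how many triplets of `other` also occur in `train`."""
    train_set = set(train)
    shared = sum(1 for t in other if t in train_set)
    return {
        "train_size": len(train),
        "other_size": len(other),
        "shared_with_train": shared,
    }


# Toy data standing in for the real files (identifiers are made up).
toy_train = parse_triples(["P1\tis_a\tGO:1", "P2\tpart_of\tGO:2"])
toy_valid = parse_triples(["P2\tpart_of\tGO:2", "P3\tis_a\tGO:3", ""])
print(overlap_report(toy_train, toy_valid))
```

With real data, the lines would come from the downloaded split files, for example `parse_triples(open(path, encoding="utf-8"))`, where `path` is wherever the split was saved.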

Alexzhuan commented 1 year ago

Hi,

The statistics on that webpage are out of date. We have since updated the dataset, so the statistics should be taken from the most recent version of the dataset that we provide.

Furthermore, because the validation and test sets are collected based on time, the final test set should be obtained by excluding any triplets that also appear in the training set.
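The exclusion step can be sketched as below. This is only an illustration under stated assumptions: triplets are represented as `(head, relation, tail)` string tuples, and `exclude_training_triples` is a hypothetical helper, not a function from the OntoProtein code base.

```python
from typing import List, Tuple

# Assumption: a triplet is (head, relation, tail), all stored as strings.
Triple = Tuple[str, str, str]


def exclude_training_triples(train: List[Triple], candidates: List[Triple]) -> List[Triple]:
    """Return the candidate triplets that do not occur in the training set.

    Order of the remaining candidates is preserved.
    """
    seen = set(train)
    return [t for t in candidates if t not in seen]


# Toy data standing in for the real splits (identifiers are made up).
train_triples = [("P1", "is_a", "GO:1"), ("P2", "part_of", "GO:2")]
raw_test_triples = [("P2", "part_of", "GO:2"), ("P4", "is_a", "GO:4")]

final_test_triples = exclude_training_triples(train_triples, raw_test_triples)
print(len(raw_test_triples), "->", len(final_test_triples))
```

The same filter could be applied to the validation split if it is meant to be disjoint from training as well.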

acharkq commented 1 year ago

Thanks for the information.