snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

Detailed sentence bert model for the MAG240M dataset #476

Open tjb-tech opened 3 months ago

tjb-tech commented 3 months ago

Hi, OGB team. Thanks for your efforts in collecting and processing these valuable datasets! You mention that a RoBERTa sentence encoder was used to generate a 768-dimensional vector for each paper node in the MAG240M dataset. May I ask which RoBERTa sentence encoder you used? Could you provide the specific Hugging Face link? Best regards!


@weihua916 @rusty1s

weihua916 commented 3 months ago

https://huggingface.co/sentence-transformers/roberta-base-nli-stsb-mean-tokens

from sentence_transformers import SentenceTransformer

# Sentence encoder used to generate the 768-dimensional paper-node features
model = SentenceTransformer('roberta-base-nli-stsb-mean-tokens')
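For context, a minimal sketch of how this encoder turns paper text into 768-dimensional vectors; the example sentences and batching parameters below are illustrative, not the exact MAG240M processing pipeline:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('roberta-base-nli-stsb-mean-tokens')

# Illustrative paper texts; the real pipeline encodes every MAG240M paper's title/abstract.
texts = [
    "Open Graph Benchmark: Datasets for Machine Learning on Graphs.",
    "Graph neural networks for large-scale heterogeneous academic graphs.",
]

# encode() returns a NumPy array of shape (len(texts), 768) for this model.
embeddings = model.encode(texts, batch_size=32, convert_to_numpy=True, show_progress_bar=True)
print(embeddings.shape)  # (2, 768)
```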
tjb-tech commented 3 months ago

Thank you very much for the quick reply! Since this model is deprecated, would you consider providing the original feature-processing script for the MAG240M paper nodes, so that we can regenerate the features with a more powerful sentence-BERT model? šŸ¤— Best regards!
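In the meantime, here is a rough sketch of what such a re-processing step could look like with a newer sentence-transformers model; the model name ('all-mpnet-base-v2'), batch size, and output path are assumptions for illustration, not the official OGB script:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any newer sentence-transformers encoder can be swapped in here.
# 'all-mpnet-base-v2' also outputs 768-dimensional vectors, matching the original feature size.
model = SentenceTransformer('all-mpnet-base-v2')

def encode_papers(texts, out_path='paper_feat.npy', batch_size=256):
    """Encode paper title/abstract strings and save the resulting feature matrix."""
    feats = model.encode(
        texts,
        batch_size=batch_size,
        convert_to_numpy=True,
        show_progress_bar=True,
    )
    # MAG240M stores node features in half precision to keep the file size manageable.
    np.save(out_path, feats.astype(np.float16))
    return feats
```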