tangjianpku / LINE

LINE: Large-scale information network embedding

How to evaluate the output embeddings? #15

Open long4glasgow opened 7 years ago

chihming commented 7 years ago

It depends on the task you want to evaluate. Try searching with keywords such as:

long4glasgow commented 7 years ago

I'm trying to reproduce the results of the LINE paper, but I can't find the code for the classification and word analogy tasks. For the YouTube dataset, I would like to try the classification experiment. For the word embedding task, I would like to try the semantic/syntactic accuracy evaluation. Do you have any idea where I can get the Wikipedia network as well? Many thanks, Long

chihming commented 7 years ago

A common way is to use libfm to do one-versus-rest classification. For word analogy, this Python toolkit could help you. As for the wiki dataset, you can get the dump data from here or find a preprocessed version here. In addition, I maintain a list related to embedding models, which may provide you with some info. Check it here.
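
For anyone who wants a quick sanity check before setting up a full toolkit, here is a minimal sketch of the word analogy evaluation in plain Python. It is not the toolkit mentioned above and not the paper's evaluation script. It assumes the embeddings were saved in text mode (a word2vec-style file: a header line with the vertex count and dimension, then one `name v1 ... vd` row per vertex) and that each analogy question is a 4-tuple `(a, b, c, expected)`. The function names and the toy vectors are made up for illustration.

```python
import io
import math


def load_embeddings(handle):
    """Parse word2vec-style text: a 'count dim' header, then 'name v1 ... vd' rows."""
    _count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        values = [float(x) for x in parts[1:]]
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        # Store unit-length vectors so a dot product ranks candidates by cosine.
        vectors[parts[0]] = [v / norm for v in values]
    return vectors


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the nearest word to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    best_word, best_score = None, float("-inf")
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # the query words themselves are excluded from the candidates
        score = sum(t * v for t, v in zip(target, vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(vectors, questions):
    """Fraction of questions answered correctly; out-of-vocabulary questions are skipped."""
    correct = total = 0
    for a, b, c, expected in questions:
        if not all(w in vectors for w in (a, b, c, expected)):
            continue
        total += 1
        correct += solve_analogy(vectors, a, b, c) == expected
    return correct / total if total else 0.0


# Toy data only (hypothetical vectors, not real LINE output).
TOY = """5 3
man 1 0 0
woman 0 1 0
king 1 0 1
queen 0 1 1
apple 1 1 0
"""

if __name__ == "__main__":
    vecs = load_embeddings(io.StringIO(TOY))
    print(analogy_accuracy(vecs, [("man", "king", "woman", "queen")]))
```

To run it on real output, replace `io.StringIO(TOY)` with an open file handle and build `questions` from your analogy file. The brute-force search is linear in the vocabulary per question, so for large vocabularies a vectorized implementation would be the better choice.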