sehsanm / embedding-benchmark

Word Embedding benchmark project By Shahid Beheshti University NLP Lab
GNU General Public License v3.0

Build or Collect Analogy test data set #11

Open sehsanm opened 5 years ago

sehsanm commented 5 years ago

See: Persian Word Embedding Evaluation Benchmarks. They have proposed an analogy task, and we can simply use their dataset. If it is not available, or if there is a copyright issue, we can build one ourselves. The dataset must include both semantic and syntactic analogies, and the analogies must be grouped into categories.

The file should be stored in the data/analogy folder.
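
As a rough illustration of what "categories" could look like on disk, here is a minimal sketch that assumes the word2vec `questions-words.txt` convention: a line starting with `:` opens a category, and each following line holds four words meaning a : b :: c : d. This layout is an assumption, not something this thread has settled on, and `parse_analogies` and the sample category names are hypothetical.

```python
from collections import defaultdict


def parse_analogies(lines):
    """Group four-word analogy questions by category.

    Assumed layout (word2vec questions-words style, not confirmed for this
    repo): a line starting with ':' opens a category; every other non-empty
    line holds four whitespace-separated words "a b c d" (a : b :: c : d).
    """
    categories = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(":"):
            current = line[1:].strip()  # start a new category
            continue
        words = line.split()
        if current is None or len(words) != 4:
            raise ValueError(f"malformed analogy line: {raw!r}")
        categories[current].append(tuple(words))
    return dict(categories)


# Hypothetical sample with one semantic and one syntactic category.
sample = """\
: capital-common-countries
Athens Greece Baghdad Iraq
: gram-plural
apple apples car cars
"""
analogies = parse_analogies(sample.splitlines())
```

With a layout like this, semantic and syntactic questions can live in the same file and still be scored per category.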

maryambiabani commented 5 years ago

Hi, here are some analogy test sets: word-analogy.zip

sehsanm commented 5 years ago

Hi, please put the file in the repo (/data/analogy) and create a pull request.