Build or find Semantic Similarity Dataset for Persian

sehsanm / embedding-benchmark

Word Embedding benchmark project By Shahid Beheshti University NLP Lab

GNU General Public License v3.0


Build or find Semantic Similarity Dataset for Persian #12

Open sehsanm opened 5 years ago

sehsanm commented 5 years ago

See "Persian Word Embedding Evaluation Benchmarks" for a semantic relatedness dataset.

Also see: J. Camacho-Collados, M. T. Pilehvar, N. Collier, and R. Navigli, "SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity," in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), Vancouver, Canada, 2017.

The data must be stored in the data/wordsim folder.

zahramajd commented 5 years ago

I just uploaded the semantic similarity dataset file to OneDrive. Columns 1 and 2 hold the word pair, and column 3 is the pair's relatedness or similarity score. (@kibamin, please consider this format.) Unfortunately, I forgot to convert it to .csv. @sehsanm, please give me access to remove and change these files on OneDrive.
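For anyone consuming this format, here is a minimal sketch of a loader, assuming the file ends up as a comma-separated, UTF-8 .csv with the three columns described above. The function name `load_word_pairs` and the two sample rows are made up for illustration; they are not taken from the uploaded dataset or from the repository code.

```python
import csv
import io


def load_word_pairs(handle):
    """Read (word1, word2, score) rows from an open CSV file object.

    Assumed layout: column 1 and column 2 are the word pair,
    column 3 is the relatedness/similarity score.
    """
    pairs = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        word1, word2, score = (cell.strip() for cell in row[:3])
        try:
            pairs.append((word1, word2, float(score)))
        except ValueError:
            continue  # skip a header row or a non-numeric score
    return pairs


# Illustrative rows only (invented values, not real dataset entries).
sample = io.StringIO("کتاب,کتابخانه,8.5\nماشین,خودرو,9.2\n")
pairs = load_word_pairs(sample)
for word1, word2, score in pairs:
    print(word1, word2, score)
```

To read a file from disk instead, pass `open(path, encoding="utf-8", newline="")` as the handle, where `path` points to wherever the .csv is finally stored.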

sehsanm commented 5 years ago

@zahramajd Please put the file in the repository (data/similarity), not on OneDrive (only the corpus and models go there, as they are large). I will delete the file from the OneDrive folder.