thunlp / OpenHowNet

Core Data of HowNet and OpenHowNet Python API
https://openhownet.thunlp.org/
MIT License

Synonym extraction based on similarity #13

jind11 closed this issue 4 years ago

jind11 commented 4 years ago

Hi, I am interested in synonym extraction based on sememe-tree similarity using HowNet. Have you, or the original authors of HowNet, benchmarked this method on a standard similarity evaluation dataset such as SimLex-999, and compared it with other popular synonym extraction methods such as counter-fitted word embeddings? It would be great to hear your thoughts on this topic. Thanks a lot!

Fanchao-Qi commented 4 years ago

Hi, we haven't evaluated the sememe-based word similarity computation method on English datasets. There are some older papers that compute Chinese word similarity with the help of sememes, e.g., Liu et al. 2002 [pdf] and Liu et al. 2013 [pdf]. But to the best of our knowledge, no previous work has experimented on English.
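
For anyone who wants to try the benchmark discussed in this thread, here is a minimal sketch of how a word similarity method is usually scored against a dataset of human-rated word pairs such as SimLex-999: compute the Spearman rank correlation between the method's scores and the human ratings. Everything named here is an assumption for illustration. `similarity_fn` is a hypothetical placeholder for whatever similarity function you plug in (sememe-based or embedding-based), not an OpenHowNet API call, and the `(word1, word2, rating)` tuple layout is simply a convenient in-memory form rather than the dataset's file format.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def evaluate(
    similarity_fn: Callable[[str, str], Optional[float]],  # hypothetical hook
    pairs: Iterable[Tuple[str, str, float]],
) -> Tuple[float, int]:
    """Score similarity_fn against human ratings.

    Pairs for which similarity_fn returns None (for example, a word that is
    missing from the lexicon) are skipped and counted, so coverage can be
    reported next to the correlation.
    """
    gold: List[float] = []
    predicted: List[float] = []
    skipped = 0
    for word1, word2, rating in pairs:
        score = similarity_fn(word1, word2)
        if score is None:
            skipped += 1
            continue
        gold.append(rating)
        predicted.append(score)
    return spearman(gold, predicted), skipped
```

Reporting the number of skipped pairs matters when comparing a lexicon-based method with an embedding-based one, since the two may not cover the same vocabulary and the correlation is only comparable on the pairs both can score.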