eg: Last weekend I was in NY.
I am traveling to new york next weekend.
If you train a word2vec model on this, or do any other sort of NLP, it will treat NY and new york as 2 different words.
Instead, if you create a synonym dictionary like:
eg: NY=>new york
new york=>new york
Then you can extract NY and new york as the same text.
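To make the quoted idea concrete, here is a minimal sketch in plain Python. It does not use the synonym-extractor API; the `synonyms` dict and the `normalize` helper are hypothetical names chosen for illustration, and matching is assumed to be case-sensitive on whole words.

```python
import re

# Hypothetical synonym dictionary: surface form -> canonical form.
synonyms = {"NY": "new york", "new york": "new york"}

def normalize(text, mapping):
    """Replace every known surface form with its canonical form."""
    # Try longer keys first so multi-word forms win over shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

print(normalize("Last weekend I was in NY.", synonyms))
print(normalize("I am traveling to new york next weekend.", synonyms))
```

Both sentences come out containing "new york", so a downstream model would see a single token sequence for the two spellings.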
There is a problem with this idea when synonym sets overlap.
For example, suppose we have:
a0 synonyms -> a2, a6, a7
a6 synonyms -> a0, a7, b23
Here a7 belongs to both sets. If a7 appears in the text, what should be substituted for it: a0 or a6?
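The ambiguity can be detected before any substitution is attempted. The sketch below is an assumption about how one might check for it, not something the library is known to do; `synonym_sets` and `find_ambiguous` are hypothetical names.

```python
from collections import defaultdict

# Synonym sets from the example: canonical term -> its synonyms.
synonym_sets = {
    "a0": ["a2", "a6", "a7"],
    "a6": ["a0", "a7", "b23"],
}

def find_ambiguous(sets):
    """Return every term that is listed under more than one canonical term."""
    owners = defaultdict(list)
    for canonical, terms in sets.items():
        for term in terms:
            owners[term].append(canonical)
    return {term: cands for term, cands in owners.items() if len(cands) > 1}

print(find_ambiguous(synonym_sets))  # {'a7': ['a0', 'a6']}
```

A flat dictionary of the form term=>canonical can hold only one target per term, so whichever set is loaded last would silently win for a7 unless a conflict like this is reported or resolved explicitly.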
This refers to the "Why" section of https://github.com/vi3k6i5/synonym-extractor, which says: "Say you have a corpus where similar words appear frequently."