@benschmidt has written an interesting blog post on the use of a method he calls 'vector rejection' to separate words with ambiguous meanings.
While experimenting with a Nepali news corpus, I found his method more useful for discarding unwanted vectors than the existing approach with most_similar.
I have recreated his method (his version is in R) in this gist and have been working with it for the last few days. In my (admittedly limited) series of experiments it seems to be quite valuable. Yoav Goldberg has a Twitter thread about the operation and the post here.
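For anyone who wants the operation spelled out, here is a minimal sketch of vector rejection as it is usually defined in linear algebra: subtract from `a` its projection onto `b`, leaving the part of `a` orthogonal to `b`. This is my own illustration with NumPy, not the code from the gist or the blog post, and the `reject` name and the toy vectors are made up for the example.

```python
import numpy as np


def reject(a, b):
    """Return the component of `a` that is orthogonal to `b`.

    Hypothetical helper: rejection = a - proj_b(a).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.dot(b, b)
    if denom == 0.0:
        # Nothing to reject against a zero vector; return `a` unchanged.
        return a.copy()
    return a - (np.dot(a, b) / denom) * b


# Toy 3-d vectors standing in for real word embeddings (assumed values).
bank = np.array([1.0, 1.0, 0.0])
river = np.array([0.0, 1.0, 0.0])

# "bank" with the "river" direction removed.
bank_without_river = reject(bank, river)
```

With real embeddings, the rejected vector could then be passed to a nearest-neighbour lookup that accepts a raw vector (in gensim I believe that is `similar_by_vector`, but treat that as an assumption on my part).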
I bring this up in case someone wants to look it over and see whether it fits the project. Please close the issue if you believe it does not.
edit: correct link.