Hey @FranciscoBorges,
Thanks for contributing!
This is a nice addition, but in practice it would be of little use: the vocab we are trying to match against the GloVe vectors almost always contains words that are not in the GloVe vocab. Since reading can only stop early once every word has been found, 99% of the time we would still have to read the entire file. Also, because GloVe files are alphabetically ordered, if your vocab contains `zoo`, for instance (a fairly likely event), you will have to read up to that line of the GloVe file (again, probably something like 98% of the words). I prefer to decline this PR to keep the code simple, since this addition would not yield a significant performance boost.
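To illustrate the point, here is a minimal sketch of an early-exit loader. It assumes the PR stops reading once every vocab word has been found. The function name `load_glove_subset` and the toy data are hypothetical, not taken from the PR. The loader also reports how many lines it consumed.

```python
from typing import Dict, Iterable, List, Tuple


def load_glove_subset(
    lines: Iterable[str], vocab: Iterable[str]
) -> Tuple[Dict[str, List[float]], int]:
    """Hypothetical early-exit loader (assumed behaviour, not the PR's code).

    Returns the vectors found for `vocab` and the number of lines read
    before stopping.
    """
    remaining = set(vocab)
    vectors: Dict[str, List[float]] = {}
    lines_read = 0
    for line in lines:
        if not remaining:
            break  # every vocab word has been found: stop reading early
        lines_read += 1
        word, *values = line.rstrip().split(" ")
        if word in remaining:
            vectors[word] = [float(v) for v in values]
            remaining.discard(word)
    return vectors, lines_read


# Toy stand-in for a GloVe file (hypothetical data).
glove = ["the 0.1 0.2", "cat 0.3 0.4", "dog 0.5 0.6", "zoo 0.7 0.8"]

_, n = load_glove_subset(glove, ["the", "cat"])
print(n)  # 2: all words found early, so reading stops

_, n = load_glove_subset(glove, ["the", "qwertyuiop"])
print(n)  # 4: one missing word forces a full scan

_, n = load_glove_subset(glove, ["the", "zoo"])
print(n)  # 4: a word near the end of the file also forces a (near) full scan
```

The early exit only pays off when every vocab word is present and appears near the start of the file. A real vocab rarely meets both conditions.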