guillaumegenthial / tf_ner

Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Apache License 2.0

Stop reading Glove file if vocabulary is complete #7

Closed: FranciscoBorges closed this PR 5 years ago

guillaumegenthial commented 5 years ago

Hey @FranciscoBorges,

Thanks for contributing!

This is a nice addition, but in practice it would be useless: the vocabularies we try to match against the GloVe vectors almost always contain some words that are not in the GloVe vocabulary, so 99% of the time we would still have to read the entire file. Also, because GloVe files are alphabetically ordered, if your vocab contains "zoo", for instance (a fairly likely event), you would have to read up to that line of the GloVe file anyway (again, probably reading something like 98% of the words). I prefer to decline this PR to keep the code simple, since this addition would not yield a significant performance boost.
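To make the trade-off concrete, here is a minimal sketch of the kind of early exit the PR proposes. It is not the repository's code: the function name `load_glove_subset` and the toy in-memory lines are hypothetical, and it assumes the usual GloVe text format of one word followed by space-separated floats on each line. The loop can only stop after the last needed word has been seen, so a single out-of-GloVe word, or a needed word near the end of the file, still forces a (near-)full scan.

```python
from typing import Dict, Iterable, List, Set, Tuple


def load_glove_subset(
    lines: Iterable[str], vocab: Set[str]
) -> Tuple[Dict[str, List[float]], int]:
    """Hypothetical helper: collect GloVe vectors for words in `vocab`.

    Stops reading as soon as every vocab word has a vector (the early exit
    proposed in the PR). Returns the vectors found and the number of lines read.
    """
    embeddings: Dict[str, List[float]] = {}
    lines_read = 0
    for line in lines:
        lines_read += 1
        # Assumed format: "word v1 v2 ... vn"
        word, *values = line.rstrip().split(" ")
        if word in vocab:
            embeddings[word] = [float(v) for v in values]
            # Early exit: only possible once *all* vocab words were found.
            if len(embeddings) == len(vocab):
                break
    return embeddings, lines_read


# Toy "GloVe file" with 4 lines (illustrative data only).
glove = ["the 0.1 0.2", "cat 0.3 0.4", "dog 0.5 0.6", "zoo 0.7 0.8"]

_, n_early = load_glove_subset(glove, {"the", "cat"})          # stops after 2 lines
_, n_late = load_glove_subset(glove, {"the", "zoo"})           # must reach "zoo": 4 lines
_, n_missing = load_glove_subset(glove, {"the", "qwertyuiop"}) # word never found: full scan
print(n_early, n_late, n_missing)
```

The early exit pays off only in the first case; the other two are the common situations described in the comment, where the whole file (or nearly all of it) is read regardless.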