koaning / embetter

just a bunch of useful embeddings
https://koaning.github.io/embetter/
MIT License

Added Word2Vec #76

Closed (x-tabdeveloping closed 1 year ago)

x-tabdeveloping commented 1 year ago

I added Gensim word embedding models to the package. They can be used in much the same way as the spaCy vectorizers: the embeddings of the words in a document are pooled together through the same API.

You can load pretrained models from gensim's repositories or pass in a custom Word2Vec or KeyedVectors instance.
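To illustrate what pooling word embeddings over a document means, here is a minimal sketch. It is not the code from this PR: the function name `pool_document`, the whitespace tokenizer, the zero-vector fallback, and the toy dictionary are all assumptions made for illustration. The pooling shown is a plain mean over the vectors of the known words.

```python
import numpy as np


def pool_document(text, vectors, vector_size):
    """Mean-pool the word vectors of the tokens in `text`.

    `vectors` is anything that supports `token in vectors` and
    `vectors[token]`. A gensim KeyedVectors instance behaves this way,
    and so does a plain dict.
    """
    # Naive lowercase + whitespace tokenizer (an assumption for this sketch).
    tokens = text.lower().split()
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        # No known words: fall back to a zero vector (also an assumption).
        return np.zeros(vector_size)
    return np.mean(found, axis=0)


# Toy stand-in for a trained model, so the sketch runs without downloads.
toy_vectors = {
    "good": np.array([1.0, 0.0]),
    "movie": np.array([0.0, 1.0]),
}

doc_embedding = pool_document("Good movie", toy_vectors, vector_size=2)
unknown_embedding = pool_document("zzz", toy_vectors, vector_size=2)
```

With gensim installed, the same function should accept the `wv` attribute of a trained `Word2Vec` model, or the `KeyedVectors` returned by `gensim.downloader.load(...)` for a pretrained vector set, in place of `toy_vectors`.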

x-tabdeveloping commented 1 year ago

I have removed the print statement, fixed the list annotations so they work with Python 3.8, and added a unit test.
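For context on the annotation fix: on Python 3.8 the built-in `list` cannot be subscripted in an annotation that is evaluated at definition time, so `list[str]` raises a `TypeError`, while `typing.List[str]` works. The sketch below shows the compatible spelling. The function `split_tokens` is hypothetical and is not taken from the PR.

```python
from typing import List


def split_tokens(text: str) -> List[str]:
    """Split text on whitespace; annotated so it also imports on Python 3.8."""
    # Writing `-> list[str]` here would fail at import time on Python 3.8
    # unless `from __future__ import annotations` is used.
    return text.split()


tokens = split_tokens("just a bunch of useful embeddings")
```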

x-tabdeveloping commented 1 year ago

Okay, model loading should work now. I also added a test for it.

koaning commented 1 year ago

Ah. My bad. I just merged the keras-nlp stuff which is causing minor merge conflicts. I think it's starting to look good, my only comment is to set the new version to 0.5.1. Going to v0.6 makes more sense if there's a bigger, maybe more breaking change.

koaning commented 1 year ago

I'll try and prep a release later today for this component and the KerasNLP stuff. It's just that there have been some changes to the docs that I'll fix personally.

Thanks for the PR!

x-tabdeveloping commented 1 year ago

Thanks for the collab, it was my pleasure :D