koaning / embetter

just a bunch of useful embeddings
https://koaning.github.io/embetter/
MIT License

Support for word embeddings #26

Closed stephantul closed 1 year ago

stephantul commented 1 year ago

Hi,

Do you think it would be a good idea to add support for static word embeddings (word2vec, glove, etc.)? The embedder would need:

The second and third parameters could easily have sensible defaults, of course. If you think it's a good idea, I can do the PR somewhere next week.

Stéphan
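The issue text doesn't list the parameters the embedder would take, but a minimal sketch of what such a static word-embedding encoder could look like (all names hypothetical; a preloaded token-to-vector table, mean pooling, and a scikit-learn-style `transform` are assumptions, not embetter's actual API):

```python
import numpy as np


class WordEmbedder:
    """Hypothetical encoder over static word vectors (word2vec, GloVe, ...).

    Pools the vectors of known tokens with a mean; sentences with no
    known tokens map to the zero vector.
    """

    def __init__(self, vectors, dim):
        self.vectors = vectors  # dict mapping token -> np.ndarray of shape (dim,)
        self.dim = dim

    def transform(self, texts):
        out = np.zeros((len(texts), self.dim))
        for i, text in enumerate(texts):
            # simple whitespace tokenization; out-of-vocabulary tokens are skipped
            vecs = [self.vectors[t] for t in text.lower().split() if t in self.vectors]
            if vecs:
                out[i] = np.mean(vecs, axis=0)
        return out
```

In practice the vector table would be loaded from a pretrained file (e.g. via gensim's `KeyedVectors`), with the tokenizer and pooling strategy as the configurable parameters that could get sensible defaults.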

koaning commented 1 year ago

The whatlies library, which I've also written, supports that. The downside of supporting everything is that many of those models are trained on dated datasets, and pooling word embeddings over longer sentences diminishes the information.
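The pooling point can be made concrete: with mean pooling, word order disappears entirely, so distinct sentences can collapse to the same vector. A toy illustration with made-up 2-d vectors (not from any real model):

```python
import numpy as np

# hypothetical 2-d word vectors, for illustration only
vecs = {
    "man": np.array([1.0, 0.0]),
    "bites": np.array([0.0, 1.0]),
    "dog": np.array([1.0, 1.0]),
}


def mean_pool(sentence):
    """Average the word vectors of a whitespace-tokenized sentence."""
    return np.mean([vecs[w] for w in sentence.split()], axis=0)


a = mean_pool("man bites dog")
b = mean_pool("dog bites man")
# a and b are identical: the pooled representation cannot tell the two apart
```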

stephantul commented 1 year ago

Ok, cool, I guess that means it's a no-go. I didn't know whatlies contained static word embedders, nice.

koaning commented 1 year ago

In a way, whatlies is the precursor to this package. But the goal for embetter is also to embed more than just text and to also keep it relatively simple by mainly focusing on sensible defaults.