Closed stephantul closed 1 year ago
Hi,

Do you think it would be a good idea to add support for static word embeddings (word2vec, GloVe, etc.)? The embedder would need:

- a path to a file of pretrained embeddings (e.g. glove.6b.100d.txt)
- a vectorizer to weight the word vectors (e.g. a TfidfVectorizer)
- a tokenizer (e.g. something that simply splits words)

The second and third parameters could easily have sensible defaults, of course. If you think it's a good idea, I can do the PR somewhere next week.

Stéphan

The whatlies library, which I also wrote, supports that. The downside of supporting everything is that many of those models are trained on dated datasets, and that pooling word embeddings over longer sentences dilutes the information.

Ok, cool, I guess that means it's a no-go. I didn't know whatlies contained static word embedders, nice.

In a way, whatlies is the precursor to this package. But the goal for embetter is also to embed more than just text, and to keep it relatively simple by mainly focusing on sensible defaults.
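For concreteness, the embedder described in the issue could be sketched roughly as below. The class name, the whitespace-splitting default tokenizer, and the mean-pooling strategy are all illustrative assumptions, not embetter's actual API.

```python
# Minimal sketch of a static word embedder with scikit-learn-style
# fit/transform methods. All names here are hypothetical.
import numpy as np


class StaticWordEmbedder:
    """Embeds sentences by mean-pooling static word vectors loaded from
    a GloVe-style text file (one word per line, then its components)."""

    def __init__(self, path, tokenizer=str.split):
        # Sensible default tokenizer: split on whitespace.
        self.tokenizer = tokenizer
        self.vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                self.vectors[word] = np.array(values, dtype=np.float32)
        self.dim = len(next(iter(self.vectors.values())))

    def fit(self, X, y=None):
        return self  # the embeddings are pretrained; nothing to learn

    def transform(self, X):
        out = np.zeros((len(X), self.dim), dtype=np.float32)
        for i, sentence in enumerate(X):
            tokens = [t for t in self.tokenizer(sentence) if t in self.vectors]
            if tokens:  # out-of-vocabulary sentences stay all-zero
                out[i] = np.mean([self.vectors[t] for t in tokens], axis=0)
        return out


# Tiny two-word demo file standing in for a real glove.6b.100d.txt.
with open("toy_vectors.txt", "w", encoding="utf-8") as f:
    f.write("hello 1.0 0.0\nworld 0.0 1.0\n")

emb = StaticWordEmbedder("toy_vectors.txt")
print(emb.transform(["hello world"]))  # mean of the two vectors: [[0.5 0.5]]
```

A TF-IDF-weighted pooling variant (the second parameter in the issue) would replace the plain mean with a weighted average, using per-word IDF weights fitted on the training corpus.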