oborchers / Fast_Sentence_Embeddings

Compute Sentence Embeddings Fast!
GNU General Public License v3.0

From the results, CBOW is best, so why use SIF? #66

Closed MrRace closed 2 years ago

MrRace commented 2 years ago

As far as I know, SIF (smooth inverse frequency) only modifies the vectors trained by Word2Vec, GloVe, or other word-vector methods. So why is CBOW best in the results? And if CBOW is best, why is SIF needed?


oborchers commented 2 years ago

@MrRace: It depends entirely on the setup. ParaNMT has a very small vocabulary, whereas fastText gives you a much larger one. So which model works best depends on your use case and data.
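
For readers unfamiliar with what SIF changes relative to a plain average, here is a minimal NumPy sketch of the idea as commonly described: each word vector is weighted by a / (a + p(w)), the weighted vectors are averaged, and the first principal component is then removed from the sentence matrix. This is not the code used in this repository. The function names, the toy vectors, and the word probabilities are all made up for illustration. Note that tokens missing from the vocabulary are simply dropped, which is one reason vocabulary size can matter as much as the weighting scheme.

```python
import numpy as np


def sif_weight(prob, a=1e-3):
    # Smooth inverse frequency weight: frequent words get small weights.
    return a / (a + prob)


def sif_embeddings(sentences, vectors, word_prob, a=1e-3):
    # Hypothetical helper: `vectors` maps word -> np.ndarray,
    # `word_prob` maps word -> unigram probability (assumed to cover `vectors`).
    dim = len(next(iter(vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        known = [t for t in tokens if t in vectors]  # out-of-vocabulary tokens are dropped
        if not known:
            continue  # sentence with no known words stays a zero vector
        weights = np.array([sif_weight(word_prob[t], a) for t in known])
        vecs = np.array([vectors[t] for t in known])
        emb[i] = weights @ vecs / len(known)
    # Remove the projection onto the first right singular vector.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)


# Toy data, invented for this example.
vectors = {
    "the": np.array([1.0, 0.0]),
    "cat": np.array([0.0, 1.0]),
    "dog": np.array([0.5, 0.5]),
}
word_prob = {"the": 0.5, "cat": 0.01, "dog": 0.01}
sentences = [["the", "cat"], ["the", "dog"], ["unknown"]]

result = sif_embeddings(sentences, vectors, word_prob)
print(result.shape)
```

With a pretrained model, `vectors` and `word_prob` would come from the model's vocabulary and corpus counts. A model with a small vocabulary leaves more sentences partially or fully empty, regardless of how the remaining words are weighted.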