Closed fros1y closed 4 years ago
Hi,
Thanks for the question. Are you referring to the following functions in the evaluation code? https://github.com/yumeng5/Spherical-Text-Embedding/blob/b0f88207189373d0500208ddacf46aa9c2bbd9da/cluster.py#L82
For baselines that produce document/sentence embeddings directly (like SIF and JoSE), we take those embeddings as features for clustering/classification. The functions above (averaged word embeddings) are meant to produce sentence embeddings only for word embedding baselines (like word2vec) that cannot naturally learn sentence representations. They are actually not called anywhere in the evaluation code (I should have deleted them to avoid confusion).
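As a rough illustration of what "averaged word embedding" means here, a minimal sketch might look like the following. This is not the code from `cluster.py`; the function name `average_word_embedding`, the toy vectors, and the zero-vector fallback for sentences with no known words are all assumptions made for the example.

```python
import numpy as np

def average_word_embedding(tokens, word_vectors, dim):
    """Mean of the vectors of in-vocabulary tokens (hypothetical helper).

    Out-of-vocabulary tokens are skipped; if no token is known,
    a zero vector of length `dim` is returned (an assumed fallback).
    """
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# Toy 2-d "word embeddings", made up for illustration only.
toy_vectors = {
    "good": np.array([1.0, 0.0]),
    "movie": np.array([0.0, 1.0]),
}

# "unknown" is not in the vocabulary, so only two vectors are averaged.
sentence_vec = average_word_embedding(["good", "movie", "unknown"], toy_vectors, dim=2)
```

The resulting vector can then be passed to a clustering or classification routine in the same way as a learned sentence embedding.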
Please let me know if you have any further questions!
Best, Yu
I noticed that you are calculating sentence embeddings as an average of the individual word vectors when performing clustering, etc. Did you happen to evaluate whether SIF or uSIF would be advantageous over plain averaging?