Closed namespace-Pt closed 2 years ago
Hi @namespace-Pt ,
Thanks for the question. You are right that each word can have multiple contextualized embeddings, and they can be mapped to different clusters during the clustering step in our algorithm. However, when deriving the final results, we take the average of the latent contextualized embeddings as the (context-free) representation for each word, which is then used for computing the topic-word distribution.
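A minimal sketch of that averaging step, using toy vectors (the variable names, dimensions, and the cosine-based topic scoring here are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

# Hypothetical contextualized embeddings: each word maps to the list of
# per-occurrence latent vectors collected from the corpus.
contextual_embs = {
    "bank": [np.array([0.9, 0.1]), np.array([0.1, 0.9])],  # two different contexts
    "river": [np.array([0.2, 0.8])],
}

# Average the contextualized embeddings to obtain a single context-free
# representation per word.
word_embs = {w: np.mean(np.stack(vs), axis=0) for w, vs in contextual_embs.items()}

# A topic-word score can then be computed, e.g., as cosine similarity between
# each averaged word vector and a (hypothetical) topic/cluster center.
topic_center = np.array([0.5, 0.5])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {w: cosine(v, topic_center) for w, v in word_embs.items()}
```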
I hope this helps. Please let me know if anything remains unclear.
Best, Yu
Ok, I got it, thank you.
So the averaging step is like in the paper "Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!"? Did you reweight the averaged token embeddings? Also, how do you deal with subwords?

So the averaging step is like in the paper "Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!"?
Yes. The difference is that TopClus uses contextualized embeddings (instead of context-free embeddings as in that paper) for clustering.
Did you reweight the averaged token embeddings?
No, we do not apply any reweighting step.
Also, how do you deal with subwords?
We remove subwords from the vocabulary when deriving the final results, so our results will not contain subwords.
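A minimal sketch of such a filtering step, assuming BERT-style WordPiece tokens where continuation subwords carry a `##` prefix (the exact filtering logic in TopClus may differ):

```python
# Hypothetical vocabulary containing both full words and WordPiece subwords.
vocab = ["topic", "model", "##ing", "embed", "##ding", "cluster"]

# Keep only full words: BERT-style continuation subwords start with "##",
# so dropping them leaves a subword-free vocabulary for the final results.
def remove_subwords(tokens):
    return [t for t in tokens if not t.startswith("##")]

filtered = remove_subwords(vocab)
```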
Thank you.
I like your paper, but I find it confusing how you handle multiple embeddings of the same word/token. Is there any chance that different embeddings of the same word are mapped to different clusters, with all of them quite close to their cluster centers in the spherical space? How do you deal with that?