Is there a specific technique used to ensure that the sizes of static target and source embeddings and XLMR target and source embeddings are equal, or is it simply a matter of trimming?
Thanks for your question. The steps we take to obtain static word embeddings and contextual representations are as follows:
1. We use WikiExtractor to extract plain text from the Wikipedia dumps and train static word embeddings on these corpora with fastText.
2. We restrict dictionary induction to the 20,000 most frequent words, i.e., the top 20,000 fastText embeddings (you may also want to have a look at https://arxiv.org/abs/2805.06297). The first sketch below illustrates steps 1 and 2.
3. For each of the 20,000 words, we randomly sample k sentences containing that word from the Wikipedia corpora and extract its contextual representation as described in our paper (the second sketch below illustrates this step). The static embeddings and contextual representations therefore cover the same 20,000 words, so their sizes are equal.
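A minimal sketch of steps 1 and 2, assuming the official fasttext Python bindings; the file names, dimensionality, and other hyperparameters are placeholders, not necessarily the settings used in the paper:

```python
# Sketch: train fastText skip-gram embeddings on the WikiExtractor output and
# keep the 20,000 most frequent words. File names and hyperparameters below
# are illustrative placeholders.
import fasttext
import numpy as np

model = fasttext.train_unsupervised("wiki.en.txt", model="skipgram", dim=300)
model.save_model("wiki.en.bin")

# fastText keeps its vocabulary in descending frequency order, so the first
# 20,000 entries are the 20,000 most frequent words.
vocab = model.get_words()[:20000]
static_emb = np.stack([model.get_word_vector(w) for w in vocab])  # (20000, 300)
```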
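And a sketch of step 3 using Hugging Face Transformers with xlm-roberta-base; the layer choice (last hidden state) and mean pooling over subword pieces and contexts are assumptions made for illustration, please see the paper for the exact extraction procedure:

```python
# Sketch: build a contextual representation for one word by averaging its
# XLM-R subword states over k sampled sentences. The layer (last hidden
# state) and mean pooling are illustrative choices, not necessarily the
# exact procedure from the paper.
import random
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base").eval()

def contextual_embedding(word, sentences, k=10):
    """`sentences` is a list of Wikipedia sentences known to contain `word`."""
    contexts = random.sample(sentences, min(k, len(sentences)))
    # Subword pieces of the word itself (matches space-preceded occurrences).
    piece_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    vectors = []
    for sent in contexts:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        ids = enc["input_ids"][0].tolist()
        # Locate the word's subword pieces in the sentence and mean-pool them.
        for i in range(len(ids) - len(piece_ids) + 1):
            if ids[i:i + len(piece_ids)] == piece_ids:
                vectors.append(hidden[i:i + len(piece_ids)].mean(dim=0))
                break
    if not vectors:  # subword segmentation did not match in any context
        return None
    # Average over the sampled contexts to get a single vector for the word.
    return torch.stack(vectors).mean(dim=0)
```

Doing this once per word in the 20,000-word vocabulary yields one contextual vector per word, which is why the static and contextual matrices end up with the same number of rows.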