Closed steveguang closed 3 years ago
Hi @steveguang, sorry for the delay. So, if you have a sequence of word embeddings and want to compute a sentence embedding, there are a few things you can do.
Generally, I would advise using the simple averaging strategy, as it costs nothing and is usually good enough. But if you have enough training data, you may try adding LSTM or CNN layers on top, keeping in mind that these would need to be trained from scratch.
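The averaging strategy can be sketched in a few lines; here is a minimal numpy version (the function name `mean_pool` is just illustrative). Note that padding positions should be excluded via the attention mask, otherwise pad tokens drag the average toward zero:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Masked average of token embeddings -> one sentence vector.

    token_embeddings: (seq_len, dim) array of per-token vectors
    attention_mask:   (seq_len,) array with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(float)      # (seq_len, 1), broadcasts over dim
    summed = (token_embeddings * mask).sum(axis=0)    # sum only the real tokens
    count = np.maximum(mask.sum(), 1e-9)              # avoid division by zero
    return summed / count

# Toy usage: 3 tokens of dimension 2, last position is padding
emb = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
sentence_vec = mean_pool(emb, np.array([1, 1, 0]))
```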
However, since you are using a variant of BERT, if you format your input as [CLS] token_1 token_2 ... [SEP],
then you can use the pooler layer's output (which transforms the final embedding of the [CLS] token) as a feature vector for your entire input sequence.
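For intuition, the BERT pooler is just a dense layer plus tanh applied to the final hidden state of the [CLS] token; a toy numpy sketch of that transformation (weights here are illustrative stand-ins for the trained pooler weights):

```python
import numpy as np

def pooler(cls_embedding, W, b):
    """BERT-style pooler: dense projection + tanh over the [CLS] hidden state.

    cls_embedding: (dim,) final hidden state of the [CLS] token
    W, b:          (dim, dim) weight matrix and (dim,) bias of the pooler layer
    """
    return np.tanh(W @ cls_embedding + b)

# Toy usage with random (untrained) weights, dim = 4
rng = np.random.default_rng(0)
dim = 4
feature_vec = pooler(rng.normal(size=dim), rng.normal(size=(dim, dim)), np.zeros(dim))
```

In the Transformers library this is what you get back as `pooler_output` from a `BertModel` forward pass, so in practice you would read it off the model outputs rather than recompute it.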
My general intuition would be that using the pooler output would work better for classification tasks and using the average of the token embeddings would work better for similarity tasks. But you'll need to check and see 😊
Hi @helboukkouri, from the example I get embeddings for each token. I would like to get a representation of a whole sentence for downstream tasks such as sentence similarity, classification, etc. My idea is to feed the word embeddings into an LSTM layer to represent the whole sentence. Do you think this is the right way? Thanks!
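For reference, the LSTM-over-embeddings idea amounts to running a recurrent cell across the token vectors and keeping the final hidden state as the sentence vector. Below is a from-scratch numpy sketch of a single LSTM layer (parameter names and shapes are illustrative; in practice you would use a trained `torch.nn.LSTM` rather than random weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_sentence_embedding(token_embeddings, params):
    """Run one LSTM layer over token embeddings; return the last hidden state.

    token_embeddings: (seq_len, dim) array of per-token vectors
    params: dict with input weights W*, recurrent weights U*, biases b*
            for the input (i), forget (f), output (o) and candidate (g) gates
    """
    hidden = params["Wi"].shape[0]
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    for x in token_embeddings:
        i = sigmoid(params["Wi"] @ x + params["Ui"] @ h + params["bi"])  # input gate
        f = sigmoid(params["Wf"] @ x + params["Uf"] @ h + params["bf"])  # forget gate
        o = sigmoid(params["Wo"] @ x + params["Uo"] @ h + params["bo"])  # output gate
        g = np.tanh(params["Wg"] @ x + params["Ug"] @ h + params["bg"])  # candidate cell
        c = f * c + i * g
        h = o * np.tanh(c)
    return h  # fixed-size sentence representation

# Random (untrained) parameters, just to show the shapes involved
rng = np.random.default_rng(0)
dim, hidden = 8, 4
params = {}
for gate in "ifog":
    params[f"W{gate}"] = rng.normal(size=(hidden, dim)) * 0.1
    params[f"U{gate}"] = rng.normal(size=(hidden, hidden)) * 0.1
    params[f"b{gate}"] = np.zeros(hidden)
sentence_vec = lstm_sentence_embedding(rng.normal(size=(5, dim)), params)
```

As the reply above notes, this only pays off when there is enough labeled data to train the recurrent weights; with little data, plain averaging is the safer baseline.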