helboukkouri / character-bert

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"
Apache License 2.0

How do I use word embeddings? #13

Closed steveguang closed 3 years ago

steveguang commented 3 years ago

Hi @helboukkouri, so from the example I get embeddings for each token. I am thinking of getting a representation of the whole sentence for downstream tasks such as sentence similarity, classification, etc. My idea is to use the word embeddings directly with an LSTM layer to represent the whole sentence. Do you think this is the right way? Thanks!

helboukkouri commented 3 years ago

Hi @steveguang, sorry for the delay. So, if you have a sequence of word embeddings and want to compute a sentence embedding, there are a few things you can do.

Generally, I would advise using the simple averaging strategy, as it costs nothing and is generally good enough. But if you have enough training data, you may try adding LSTM or CNN layers on top, as these would require training from scratch.
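
For reference, here is a minimal sketch of the averaging approach, following the usage example from the repository's README (the model path, the example sentence, and the choice to drop [CLS]/[SEP] before averaging are illustrative assumptions):

```python
import torch
from transformers import BertTokenizer
from modeling.character_bert import CharacterBertModel
from utils.character_cnn import CharacterIndexer

# Tokenize the sentence and wrap it with [CLS]/[SEP], as in the README example
tokenizer = BertTokenizer.from_pretrained('./pretrained-models/bert-base-uncased/')
tokens = ['[CLS]', *tokenizer.basic_tokenizer.tokenize('this is an example sentence'), '[SEP]']

# Convert the token sequence into character indices (a batch of size 1)
indexer = CharacterIndexer()
batch_ids = indexer.as_padded_tensor([tokens])

# Load a pre-trained CharacterBERT (path is illustrative)
model = CharacterBertModel.from_pretrained('./pretrained-models/general_character_bert/')
model.eval()

with torch.no_grad():
    sequence_output, _ = model(batch_ids)  # shape: (1, sequence_length, hidden_size)

# Average the token embeddings, skipping [CLS] and [SEP]
sentence_embedding = sequence_output[0, 1:-1].mean(dim=0)  # shape: (hidden_size,)
```

An LSTM/CNN variant would simply replace the final averaging step with a trainable encoder (e.g. `torch.nn.LSTM`) fed with `sequence_output` and trained on your downstream task.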

However, since you are using a variant of BERT, if you format your input as [CLS] token_1 token_2 ... [SEP], then you can use the pooler layer output (which transforms the output embedding of the [CLS] token) as a feature vector for your entire input sequence.
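
As a sketch of that option, reusing `model` and `batch_ids` from the snippet above (treating the second output as the pooled [CLS] vector follows the standard BERT convention and is an assumption here, so double-check against the code):

```python
import torch

# The second element of the output is assumed to be the pooler output:
# a dense layer + tanh applied to the final hidden state of [CLS]
with torch.no_grad():
    _, pooled_output = model(batch_ids)  # shape: (1, hidden_size)

# Example classification head on top of the pooled output
num_labels = 2  # hypothetical number of classes
classifier = torch.nn.Linear(pooled_output.size(-1), num_labels)
logits = classifier(pooled_output)  # shape: (1, num_labels)
```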

My general intuition is that the pooler output would work better for classification tasks, while averaging the token embeddings would work better for similarity tasks. But you'll need to check and see 😊
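
For the similarity case, a common choice is cosine similarity between the averaged sentence embeddings, e.g.:

```python
import torch.nn.functional as F

# sentence_embedding_1 / sentence_embedding_2: averaged embeddings computed as above
similarity = F.cosine_similarity(
    sentence_embedding_1.unsqueeze(0), sentence_embedding_2.unsqueeze(0)
).item()
```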