Abdelrahmanrezk opened this issue 2 years ago (status: Open)
Good question, I had the same one in my mind. As stated in the book, the [CLS] token captures the context of the entire input, so we can use its embedding to represent the input. If you aren't getting any answers to your question, try comparing the model's performance when using the first and the last token embeddings as features :)
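To make that comparison concrete, here is a minimal sketch of how the two feature choices differ. It does not load a real model; a random array stands in for an encoder's `last_hidden_state` output (shape `(batch, seq_len, hidden_dim)`), and `attention_mask` marks real tokens vs. padding, as in the Hugging Face API. A mean-pooling variant is included since it is another common choice:

```python
import numpy as np

# Random data standing in for an encoder's output over a batch of tweets.
rng = np.random.default_rng(0)
batch, seq_len, hidden_dim = 4, 10, 8
last_hidden_state = rng.standard_normal((batch, seq_len, hidden_dim))
attention_mask = np.ones((batch, seq_len), dtype=int)
attention_mask[2, 7:] = 0  # pretend the third tweet is shorter and padded

# Option 1: the [CLS] embedding, i.e. the first token of each sequence.
cls_features = last_hidden_state[:, 0, :]

# Option 2: the embedding of the last *real* (non-padding) token.
last_idx = attention_mask.sum(axis=1) - 1
last_token_features = last_hidden_state[np.arange(batch), last_idx, :]

# Option 3: mean-pool over real tokens (a common alternative to both).
mask = attention_mask[:, :, None]
mean_features = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)

print(cls_features.shape, last_token_features.shape, mean_features.shape)
```

Each option yields one fixed-size vector per tweet, so any of them can be fed to a downstream classifier and the resulting accuracies compared, as suggested above. Note that with padding, "last token" must skip the pad positions, which is why the attention mask is needed.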
Information
The question or comment is about chapter:
Question or comment
I would just like to know why we use only the [CLS] token to represent each tweet, and why we take it from the last hidden state. Does it make sense as an embedding representing the whole tweet?