Abdelrahmanrezk opened this issue 2 years ago (status: Open)
Good question, I had the same one in my mind. As stated in the book, the [CLS] token captures the context of the entire input, so we can use its embedding to represent the input. If you aren't getting any answers to your question, try comparing the model's performance when using the first and the last token embeddings as features :)
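To make that comparison concrete, here is a minimal sketch of how the two feature choices differ. It does not load a real model; a random array stands in for an encoder's `last_hidden_state` output (shape `(batch, seq_len, hidden_dim)`), and `attention_mask` marks real tokens vs. padding, as in the Hugging Face API. A mean-pooling variant is included since it is another common choice:

```python
import numpy as np

# Random data standing in for an encoder's output over a batch of tweets.
rng = np.random.default_rng(0)
batch, seq_len, hidden_dim = 4, 10, 8
last_hidden_state = rng.standard_normal((batch, seq_len, hidden_dim))
attention_mask = np.ones((batch, seq_len), dtype=int)
attention_mask[2, 7:] = 0  # pretend the third tweet is shorter and padded

# Option 1: the [CLS] embedding, i.e. the first token of each sequence.
cls_features = last_hidden_state[:, 0, :]

# Option 2: the embedding of the last *real* (non-padding) token.
last_idx = attention_mask.sum(axis=1) - 1
last_token_features = last_hidden_state[np.arange(batch), last_idx, :]

# Option 3: mean-pool over real tokens (a common alternative to both).
mask = attention_mask[:, :, None]
mean_features = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)

print(cls_features.shape, last_token_features.shape, mean_features.shape)
```

Each option yields one fixed-size vector per tweet, so any of them can be fed to a downstream classifier and the resulting accuracies compared, as suggested above. Note that with padding, "last token" must skip the pad positions, which is why the attention mask is needed.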
Information
The question or comment is about chapter:
Question or comment
I would just like to know why we use only the [CLS] token to represent each tweet, and why we take it from the last hidden state. Does it make sense as an embedding representing the whole tweet?