nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

Why use [CLS] in chapter 2 to represent the whole tweet? #38

Open Abdelrahmanrezk opened 2 years ago

Abdelrahmanrezk commented 2 years ago

Information

The question or comment is about chapter:

Question or comment

I would just like to know why we use only the [CLS] token to represent each tweet. Also, why do we use the hidden state from the last layer — does a single token's embedding really make sense as a representation of the whole tweet?

Kirushikesh commented 1 year ago

Good question — I had the same one in mind. As stated in the book, the [CLS] token captures the context of the entire input, so we can use its hidden state as an embedding that represents the input. Since you aren't getting any answers to your question, try comparing the model's performance when using the first ([CLS]) versus the last token's embedding as features :)
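A minimal sketch of that comparison, assuming the `transformers` library and the `distilbert-base-uncased` checkpoint used in chapter 2 (the text and variable names here are illustrative, not from the book): it extracts the hidden state of the first ([CLS]) token and of the last token, which could then be fed to a classifier as alternative features.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint used for feature extraction in chapter 2
ckpt = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

text = "This movie was great!"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # last_hidden_state has shape [batch_size, seq_len, hidden_dim]
    hidden = model(**inputs).last_hidden_state

cls_feature = hidden[:, 0]    # first position: the [CLS] token
last_feature = hidden[:, -1]  # last position: the final [SEP] token here

print(cls_feature.shape, last_feature.shape)  # torch.Size([1, 768]) for both
```

Training the same classifier (e.g. logistic regression) once on `cls_feature` and once on `last_feature`, then comparing validation accuracy, would answer the question empirically for your dataset.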

carlos-aguayo commented 1 year ago

See here: https://discuss.huggingface.co/t/common-practice-using-the-hidden-state-associated-with-cls-as-an-input-feature-for-a-classification-task/14003