tahmid0007 / VisualTransformers

A PyTorch implementation of the paper "Visual Transformers: Token-based Image Representation and Processing for Computer Vision"

Classification tokens #9

Open LexaNagiBator228 opened 3 years ago

LexaNagiBator228 commented 3 years ago

[Image: Screenshot from 2021-03-09 13-25-34]

In 228, why do you use only the first token for classification?

ssram50 commented 3 years ago

Only the first token is used for classification. Please refer to the paper.

LexaNagiBator228 commented 3 years ago

Only the first token is used for classification. Please refer to the paper.

[Image: Screenshot_2021-03-15 2006 03677 pdf]

I believe the paper uses average pooling. Of course, using only the first token might still give you great results, but using information from all 16 tokens should be better.
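
To make the two options concrete, here is a minimal sketch of both classification readouts. It is not the repository's code: the tensor shapes, the `tokens` stand-in, and the `head` layer are all assumptions chosen for illustration (16 tokens as discussed above, with a hypothetical channel size of 64 and 10 classes).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: batch of 2, 16 visual tokens, 64 channels, 10 classes.
batch, num_tokens, dim, num_classes = 2, 16, 64, 10

# Stand-in for the transformer output, shape (batch, num_tokens, dim).
tokens = torch.randn(batch, num_tokens, dim)

# Hypothetical classification head shared by both readouts.
head = nn.Linear(dim, num_classes)

# Option A: classify from the first token only.
logits_first = head(tokens[:, 0])

# Option B: average-pool over all tokens, then classify.
logits_mean = head(tokens.mean(dim=1))

print(logits_first.shape, logits_mean.shape)
```

Both readouts produce logits of the same shape, so switching between them is a one-line change. Which one trains better would have to be checked experimentally.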