hunto / image_classification_sota

Training ImageNet / CIFAR models with sota strategies and fancy techniques such as ViT, KD, Rep, etc.
Apache License 2.0

split global token and image token #9

Closed: kunsaram01 closed this issue 2 years ago

kunsaram01 commented 2 years ago

https://github.com/hunto/image_classification_sota/blob/36539b63cc8b851bd3fc93251bba60528813bb36/lib/models/lightvit.py#L245

Hello, when you split the global tokens and image tokens from the input x, shouldn't x be split along the token dimension, into [B, :NT, C] and [B, NT:, C]? In the forward_feature function, however, x_glb is sliced along the channel dimension.

So, assuming x has the shape [1, 3144, 64] with NT = 8, the global tokens should have the shape [1, 8, 64] and the image tokens the shape [1, 3136, 64]. Please let me know if I am wrong.
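To make the difference concrete, here is a minimal sketch of the two slicing choices. It is not the code from lightvit.py: it uses NumPy arrays in place of PyTorch tensors (basic slicing behaves the same way for this purpose), and the sizes (8 global tokens followed by 56 × 56 = 3136 image tokens, 64 channels) and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sizes: batch B, NT global tokens, 56 * 56 image tokens, C channels.
B, NT, N_IMG, C = 1, 8, 56 * 56, 64

# x stands in for the concatenated [global tokens | image tokens] input.
x = np.zeros((B, NT + N_IMG, C), dtype=np.float32)

# Split along the token dimension (axis 1): first NT tokens vs. the rest.
x_glb = x[:, :NT, :]
x_img = x[:, NT:, :]
print(x_glb.shape, x_img.shape)  # (1, 8, 64) (1, 3136, 64)

# Slicing along the channel dimension (axis 2) instead keeps every token
# and only the first NT channels of each one.
x_channel_slice = x[:, :, :NT]
print(x_channel_slice.shape)  # (1, 3144, 8)
```

The channel-dimension slice still runs without error because NT is smaller than C, which is why this kind of slip would be easy to overlook.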

hunto commented 2 years ago

Hi @kunsaram01 ,

Sorry for the confusion. The x_glb assigned at line 245 is never used; a new x_glb is generated at line 253, so there is no mistake in the computation of the global tokens.

We have removed this line in a new commit.

Thanks for your detailed check.

kunsaram01 commented 2 years ago

@hunto, thanks for the quick response!