kunsaram01 closed this issue 2 years ago
Hi @kunsaram01 ,
Sorry for the confusion we caused. The x_glb assigned at line 245 is actually unused, and a new x_glb is generated at line 253, so there is no mistake in the computation of the global tokens.
We have removed the unused line in a new commit.
Thanks for your detailed check.
@hunto, thanks for the quick response!
https://github.com/hunto/image_classification_sota/blob/36539b63cc8b851bd3fc93251bba60528813bb36/lib/models/lightvit.py#L245
Hello. When you split the global tokens and image tokens from the input x, shouldn't it be split into [B, :NT, C] and [B, NT:, C]? In the forward_feature function, however, x_glb is split along the channel dim. So, assuming x has the shape [1,3144,64], the global token shape will be [1,8,64] and the image token shape will be [1,3136,64]. Please let me know if I am wrong.
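To make the shapes concrete, here is a minimal NumPy sketch of the split I have in mind. It is not the repository's code: `split_tokens` is a hypothetical helper, and it assumes the NT global tokens sit before the image tokens along dim 1, which is how I read the layout.

```python
import numpy as np


def split_tokens(x, num_global_tokens):
    """Split x of shape [B, NT + N, C] along the token axis (dim 1).

    Hypothetical helper for illustration only; assumes the global
    tokens are stored before the image tokens.
    """
    x_glb = x[:, :num_global_tokens, :]   # [B, NT, C]
    x_img = x[:, num_global_tokens:, :]   # [B, N, C]
    return x_glb, x_img


# Example sizes: 8 global tokens, 56 * 56 = 3136 image tokens, 64 channels.
B, NT, N, C = 1, 8, 3136, 64
x = np.zeros((B, NT + N, C), dtype=np.float32)

x_glb, x_img = split_tokens(x, NT)
print(x_glb.shape, x_img.shape)

# For contrast, slicing the last axis cuts channels, not tokens.
x_channel_slice = x[:, :, :NT]
print(x_channel_slice.shape)
```

Slicing the last axis keeps all 3144 tokens and only truncates the channels, which is why the channel-dim split looked wrong to me.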