Closed XpracticeYSKM closed 7 months ago
Also, can you provide specific scripts for reproduction? I can't find the class token position, the number of context tokens, or CSC in the implementation details of the paper.
For 1-shot ImageNet you should actually obtain 1000 samples for training. The image encoder extracts the same augmented image 10 times, resulting in 10000. But the tuple you show is 9920, which is strange. The class token position, number of context tokens, and CSC are not required; they are not used in our work. The code is not clean yet: you can remove them from the scripts and train.py, and I will remove them when I am free.
I find that `_get_base_image_features` obtains the images from `train_loader_x`, and runs 10 epochs over `train_loader_x` to build the image feature list. In your config, the batch size is set to 256, so `len(train_loader_x)` is 3. That means you obtain 10*3*256 image features.
Please note that, for the data loader, if the dataset length is 1000 and the batch size is 256, the last batch loads only the remaining 1000 - 3*256 = 232 images, because we do not enable dropping or padding of the incomplete last batch.
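To make the batch arithmetic concrete, here is a minimal pure-Python sketch of DataLoader-style batching (`batch_sizes` is a hypothetical helper for illustration, not GraphAdapter or PyTorch code):

```python
def batch_sizes(n_samples, batch_size, drop_last):
    """Sizes of the batches a PyTorch-DataLoader-style loader yields."""
    sizes = []
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        if size < batch_size and drop_last:
            break  # drop_last=True silently discards the partial batch
        sizes.append(size)
    return sizes

print(batch_sizes(1000, 256, drop_last=False))  # [256, 256, 256, 232] -> all 1000 samples
print(batch_sizes(1000, 256, drop_last=True))   # [256, 256, 256]      -> only 768 samples
```

With `drop_last=True`, 10 epochs over the loader would yield 10 * 768 = 7680 features rather than 10000, which shows how feature counts can silently drift from the expected total.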
So it may be caused by the data loader. However, when I step into `build_data_loader` in Dassl, `drop_last` for `train_loader_x` is True, which is not consistent with your statement. Can you share the Dassl source code you used?
I manually set `drop_last=False` and ran the script; that solved this problem, but I hit a new issue:
Original Traceback (most recent call last):
File "/anaconda3/envs/dassl/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/anaconda3/envs/dassl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/GraphAdapter/trainers/baseclip_graph_v1.py", line 326, in forward
text_features, image_features = self.graph_learner(image_features)
File "/anaconda3/envs/dassl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/GraphAdapter/trainers/baseclip_graph_v1.py", line 224, in forward
graph_o_tt = self.GCN_tt(feat_tt, edge_tt)
File "/anaconda3/envs/dassl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/GraphAdapter/trainers/baseclip_graph_v1.py", line 144, in forward
pre_sup = torch.matmul(x, self.gcn_weights) # [m+1, 1000, 1024]
RuntimeError: mat1 dim 1 must match mat2 dim
x shape: torch.Size([1000, 251, 1024]), self.gcn_weights: torch.Size([512, 512])
I didn't modify any code except for `drop_last`. This seems strange.
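The shape mismatch can be checked mechanically: `torch.matmul` with a 2-D right operand requires the last dimension of the left operand to equal the first dimension of the right. A small sketch (`matmul_ok` is a hypothetical helper, not project code):

```python
def matmul_ok(a_shape, b_shape):
    """torch.matmul compatibility for a batched left operand and a 2-D weight."""
    return a_shape[-1] == b_shape[0]

print(matmul_ok((1000, 251, 1024), (512, 512)))    # False -> the RuntimeError above
print(matmul_ok((1000, 251, 1024), (1024, 1024)))  # True  -> what 1024-dim features need
```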
The reason is that the feature dimensions of ResNet-50 and the other backbones differ: ResNet-50 (RN50) uses 1024, while ViT-B uses 512. You can adjust it manually by revising `gcn_weights`. This follows the CoOp/CoCoOp setting. Thanks.
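A minimal sketch of that adjustment, assuming CLIP's joint-embedding widths per backbone (`EMBED_DIM` and `gcn_weight_shape` are hypothetical names for illustration, not identifiers from the repo):

```python
# CLIP projects RN50 features to 1024 dims, ViT-B/16 and ViT-B/32 to 512;
# the gcn_weights matrix must be square in that same width.
EMBED_DIM = {"RN50": 1024, "ViT-B/16": 512, "ViT-B/32": 512}

def gcn_weight_shape(backbone):
    dim = EMBED_DIM[backbone]
    return (dim, dim)

print(gcn_weight_shape("RN50"))      # (1024, 1024)
print(gcn_weight_shape("ViT-B/16"))  # (512, 512)
```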
Thanks! I reproduced the ImageNet result in the 16-shot setting, but I only get 63.7% accuracy, which is not consistent with the 65.7% in your paper. I set the hyper-parameters according to the paper. Any ideas?
These are our reproduced results:
You can use the other "adamw" config we left in configs, "rn50_ep20_b256_lr_0_001_adamw.yaml"; it is more stable. We guess the gap is caused by unstable training with the Adam optimizer, plus randomness from the machine and environment.
Thanks! And can you provide the YAML config for ViT?
Of course. You can substitute the 'RN50' in the config with 'ViT-B/16' or 'ViT-B/32' directly, and change the dimension of GraphAdapter to 512; that is all it takes.
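For reference, the substitution would look something like this in a CoOp/Dassl-style config (a sketch: the key names assume the CoOp config layout, so check them against the repo's existing YAML files):

```yaml
MODEL:
  BACKBONE:
    NAME: "ViT-B/16"   # was "RN50"; also set GraphAdapter's feature dim to 512
```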
Thanks for your patience! I will close this issue!
When I tried to reproduce the 1-shot result on ImageNet, I hit what seems to be a bug. I didn't modify any code; can you give some advice? Thanks.
issue: RuntimeError: shape '[1102, 9]' is invalid for input of size 9920
scripts: