raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Can't reproduce the accuracy of pre-trained models #17

Closed xiyiyia closed 2 years ago

xiyiyia commented 2 years ago

Tried arch: deit_small, deit_256
Dataset: ImageNet-1k val
File structure:

ILSVRC2012_val/
├──val/
│  ├── 1(image label)
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── 2(image label)
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......

When I ran

python3 infer.py --data-path /home/ubuntu/datasets/ILSVRC2012_val/ --arch deit_small --model-path /home/ubuntu/models/dynamic-vit_384_r0.7.pth --base_rate 0.7

the result was Acc@1 0.080 Acc@5 0.582.

The folder names (image labels) are taken from ILSVRC2012_validation_ground_truth.txt in the development kit. Is the problem caused by the wrong folder names, so that the label the model predicts differs from the real one? Should I rename the folders to WNIDs? The val dataset has no WNIDs, though, so how can I confirm the mapping?

Thanks

raoyongming commented 2 years ago

Hi, thanks for your interest in our work. Your accuracy is close to random prediction (1/1000 for top-1 and 5/1000 for top-5), so you are likely using incorrect code or an incorrectly prepared dataset. The folder names in the ImageNet val set should be WordNet IDs such as n01440764. The class IDs predicted by our networks follow the sorted order of these WordNet IDs. I think you can follow the instructions here to prepare the ImageNet dataset.
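
To make the point about sorted WordNet IDs concrete, here is a minimal sketch of how a folder-based loader would turn folder names into class indices, and how numeric label folders could be renamed. It assumes infer.py assigns labels by sorted folder name (as torchvision's ImageFolder does). The helpers class_to_idx and rename_to_wnids, and the id_to_wnid dict, are hypothetical names introduced for illustration; the mapping itself has to come from the development kit's metadata (its meta.mat file is where this mapping is usually read from, but check your copy). The toy WNIDs below are placeholders, not the real ID assignment.

```python
import os
import tempfile


def class_to_idx(val_dir):
    """Map each class folder name to an index by sorted name order.

    Assumption: the evaluation script assigns labels this way, like
    torchvision's ImageFolder.
    """
    names = sorted(
        entry.name for entry in os.scandir(val_dir) if entry.is_dir()
    )
    return {name: idx for idx, name in enumerate(names)}


def rename_to_wnids(val_dir, id_to_wnid):
    """Rename numeric label folders (e.g. "1") to WordNet IDs.

    id_to_wnid is a hypothetical dict {ILSVRC2012_ID: WNID} that you
    build yourself from the development kit's metadata.
    """
    # Materialize the listing first so renames do not disturb iteration.
    for entry in list(os.scandir(val_dir)):
        if entry.is_dir() and entry.name.isdigit():
            wnid = id_to_wnid[int(entry.name)]
            os.rename(entry.path, os.path.join(val_dir, wnid))


def demo():
    # Toy mapping for illustration only: NOT the real devkit assignment.
    toy_map = {1: "n01530575", 2: "n01440764", 10: "n01443537"}
    with tempfile.TemporaryDirectory() as val_dir:
        for label in toy_map:
            os.mkdir(os.path.join(val_dir, str(label)))
        # Numeric names sort as strings: "1" < "10" < "2".
        before = class_to_idx(val_dir)
        rename_to_wnids(val_dir, toy_map)
        # After renaming, indices follow the sorted WNIDs.
        after = class_to_idx(val_dir)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(before)
    print(after)
```

With numeric folder names, the index a folder receives has no relation to the index the network was trained to output for that class, which would explain near-random accuracy. After renaming, the indices line up with the sorted WordNet IDs.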

xiyiyia commented 2 years ago

Thanks a lot!