raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

ViT on CIFAR-100 #39

Closed King4819 closed 9 months ago

King4819 commented 10 months ago

Excuse me, I want to ask whether it is possible to use ViT on CIFAR-100 with your repo, since ImageNet training takes too long.

I saw your code has a CIFAR part, but it seems like all of your ViTs have nb_classes=1000.

Thank you!

raoyongming commented 10 months ago

Hi @King4819, we didn't try our model on CIFAR. Our model is designed to accelerate ViT inference by reducing redundant patches. Since the images in CIFAR are too small, I would still recommend trying our method on ImageNet. Maybe you can choose a subset of ImageNet (e.g., a 100-class subset, 10% data) for faster experiments.
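(For reference, a minimal sketch of one way such a subset could be built with torchvision; the directory path, seed, class count, and sampling fraction are illustrative assumptions, not code from this repo. Because `ImageFolder` assigns labels from the sorted class folders, a `Subset` built this way keeps the original 0-999 ImageNet label ids.)

```python
import random
from torch.utils.data import Subset
from torchvision.datasets import ImageFolder

# Hypothetical sketch: a 100-class, ~10%-image subset of ImageNet.
# The path, seed, and fractions below are assumptions for illustration.
random.seed(0)
full = ImageFolder("path/to/imagenet/train")
chosen_classes = set(random.sample(range(1000), 100))

# Keep roughly 10% of the images from the chosen classes; labels are
# untouched, so they remain the original ImageNet indices (0..999).
subset_indices = [
    i for i, (_, label) in enumerate(full.samples)
    if label in chosen_classes and random.random() < 0.1
]
train_subset = Subset(full, subset_indices)
```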

King4819 commented 10 months ago

Thanks for your reply, but is it possible to change the output nb_classes to 100?

I have tried changing it via the args, but the number of output classes is still 1000.

raoyongming commented 10 months ago

To use the pre-trained classifier, you should not change the ImageNet class ids but instead use a subset of them.

King4819 commented 10 months ago

@raoyongming Sorry, my question is: if I want to use the ImageNet-100 dataset (100 classes), where can I change the classifier output nb_classes? All of your ViT models seem to have nb_classes=1000.

raoyongming commented 9 months ago

@King4819, you should keep the class ids from ImageNet unchanged. For example, if you select the 200th of the original 1k classes for your new ImageNet-100, its class id should also be 200-1=199. To use the pretrained classifiers, you should strictly align your class ids with the original ones. You'd better not change nb_classes or the other model configurations, so that the pretrained weights load correctly.
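(A short sketch of the evaluation side of this advice, assuming a `chosen_classes` set of original ImageNet ids like the hypothetical one above; `model`, `images`, and `targets` are placeholder names, not repo code. Targets stay in the original 0-999 space, the 1000-way head is kept, and predictions can optionally be restricted to the classes present in the subset.)

```python
import torch

# `model`, `images`, `targets`, and `chosen_classes` are assumed to come
# from the (hypothetical) subset setup above; targets keep their original
# ImageNet ids in 0..999 and are never remapped to 0..99.
class_ids = torch.tensor(sorted(chosen_classes))

logits = model(images)                  # [batch, 1000] from the pretrained head
# Optionally restrict predictions to the 100 classes in the subset,
# then map the argmax back to original ImageNet ids:
pred = class_ids[logits[:, class_ids].argmax(dim=1)]
accuracy = (pred == targets).float().mean()
```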

King4819 commented 9 months ago

@raoyongming Thanks for your reply. My original thought was to change the classifier output nb_classes and then fine-tune the classifier part, but I understand your point.

Sorry, I have another question: what does the name "VisionTransformerDiffPruning" mean? I can't understand the meaning of "DiffPruning".

Thanks!

raoyongming commented 9 months ago

DiffPruning means differentiable pruning here.
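(For intuition, not the repo's exact code: "differentiable pruning" here usually means scoring each token with a small head and sampling a keep/drop decision with Gumbel-softmax, so the hard mask still passes gradients via the straight-through estimator. Below is a minimal sketch under that assumption; all module names and shapes are illustrative, and the real model additionally masks attention rather than simply zeroing tokens.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenScorer(nn.Module):
    """Illustrative scorer: predicts a keep/drop decision per token."""
    def __init__(self, dim):
        super().__init__()
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 2))

    def forward(self, tokens, prev_mask):
        # tokens: [B, N, C]; prev_mask: [B, N, 1] keep decisions so far.
        logits = self.head(tokens)                        # [B, N, 2]
        # Gumbel-softmax: hard 0/1 samples in the forward pass, but
        # differentiable via the straight-through estimator.
        keep = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., :1]
        return keep * prev_mask                           # dropped tokens stay dropped

# Usage sketch: refine the mask between transformer blocks.
B, N, C = 2, 197, 384
tokens, mask = torch.randn(B, N, C), torch.ones(B, N, 1)
scorer = TokenScorer(C)
mask = scorer(tokens, mask)     # differentiable keep mask for this stage
tokens = tokens * mask          # simplified: zero out the dropped tokens
```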