microsoft / CvT

This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.
MIT License

About the pretrained model #14

Closed by Y1YU 2 years ago

Y1YU commented 2 years ago

I used the pretrained model CvT-13-224x224-IN-1k.pth and tested it on ImageNet following the guide, but the result is terrible: "TEST: Loss 8.5690 Error@1 98.926% Error@5 97.844% Accuracy@1 1.074% Accuracy@5 2.156%"

Has anyone else tested this? Why does it happen?

Y1YU commented 2 years ago

I use JPEG images, which differ from the expected data type, but I don't think that matters.

Y1YU commented 2 years ago

Well, I found the problem. It was caused by the order of the class folders.
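
For readers hitting the same symptom, here is a minimal sketch of why class-folder order matters. It assumes the data loader assigns label indices the way torchvision's `ImageFolder` does, by sorting the subdirectory names and enumerating them; whether this repo's loader does exactly that is an assumption. The helper name `class_to_idx_from_dirs` and the temporary folders are illustrative only.

```python
import os
import tempfile


def class_to_idx_from_dirs(root):
    """Map each immediate subdirectory of `root` to an index, in sorted name order.

    Hypothetical helper: mirrors the sorted-folder-name convention assumed above.
    """
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


# Build a tiny fake dataset; creation order is deliberately not alphabetical.
with tempfile.TemporaryDirectory() as root:
    for name in ["n01824575", "n01440764", "n01443537"]:
        os.makedirs(os.path.join(root, name))
    mapping = class_to_idx_from_dirs(root)

# The indices follow the sorted folder names, not the creation order.
print(mapping)
```

Under that assumption, if the folders are named or ordered differently from the layout the checkpoint was trained with, a given index no longer refers to the same class, and accuracy can collapse even though the weights are fine.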

hyh807 commented 1 year ago

Hi Y1YU,

I have the same issue. My ImageNet folder hierarchy is as follows:

  |-imagenet
    |-train
    | |-n01440764
    | | |-ILSVRC2012_val_00000341.JPEG
    | | |-ILSVRC2012_val_00009603.JPEG
    | | |-...
    | |-n01824575
    | | |-ILSVRC2012_val_00000341.JPEG
    | | |-ILSVRC2012_val_00009603.JPEG
    | | |-...
    | |-...
    |-val
    | |-n01440764
    | | |-ILSVRC2012_val_00000341.JPEG
    | | |-ILSVRC2012_val_00009603.JPEG
    | | |-...
    | |-n01824575
    | | |-ILSVRC2012_val_00000341.JPEG
    | | |-ILSVRC2012_val_00009603.JPEG
    | | |-...
    | |-...

which is different from the hierarchy shown in this repo.

Would you please let me know how you solved this issue? Thank you.

hyh807 commented 1 year ago

Problem solved: I changed the ImageNet path to 'imagenet/val', and inference now works correctly.
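
A quick check along these lines might catch a wrong dataset root before running a full evaluation. This is a sketch under two assumptions: class folders use ImageNet synset IDs such as `n01440764`, and the loader treats each immediate subdirectory of the given path as one class. `describe_root` is a hypothetical helper, not part of this repo.

```python
import os
import re
import tempfile

# Assumed naming scheme for class folders: 'n' followed by eight digits.
WNID = re.compile(r"^n\d{8}$")


def describe_root(root):
    """Report what kind of subdirectories sit directly under `root`."""
    subdirs = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    if subdirs and all(WNID.match(name) for name in subdirs):
        return "class folders"
    if "train" in subdirs or "val" in subdirs:
        return "split folders"
    return "unknown"


# Recreate a small version of the hierarchy from the comment above.
with tempfile.TemporaryDirectory() as tmp:
    imagenet = os.path.join(tmp, "imagenet")
    for split in ["train", "val"]:
        for wnid in ["n01440764", "n01824575"]:
            os.makedirs(os.path.join(imagenet, split, wnid))
    result_root = describe_root(imagenet)
    result_val = describe_root(os.path.join(imagenet, "val"))

print(result_root)
print(result_val)
```

If the path passed to the loader reports split folders where class folders are expected, the labels would be derived from `train` and `val` rather than from the synset folders.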