xxxnell / how-do-vits-work

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
https://arxiv.org/abs/2202.06709
Apache License 2.0

Image #18

Closed 123456789-qwer closed 2 years ago

123456789-qwer commented 2 years ago

https://user-images.githubusercontent.com/930317/158025258-e9a5a454-99de-4d22-bc93-b217cdf06abb.jpeg

Where can I find other pictures?

xxxnell commented 2 years ago

Hi @123456789-qwer! The file name of this image is ILSVRC2012_val_00046145, and it comes from the kit fox class of the ImageNet validation set. I found this image in MAE. If you want more rigorous results from the Fourier analysis experiment, I recommend repeating it on multiple images (e.g., from the ImageNet validation set) and averaging the results; see the sketch after the list below. Or you can simply borrow some example images from knowyourdata, e.g.,

# example input images for the Fourier analysis (ImageNet validation images)
xs = [
    "https://user-images.githubusercontent.com/930317/158025258-e9a5a454-99de-4d22-bc93-b217cdf06abb.jpeg",
    "https://knowyourdata-tfds.withgoogle.com/serve_image?&id=ILSVRC2012_val_00042330.JPEG&dataset=imagenet2012",
    "https://knowyourdata-tfds.withgoogle.com/serve_image?&id=ILSVRC2012_val_00020406.JPEG&dataset=imagenet2012",
]
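For the averaging step, here is a minimal sketch (not the code in this repo) that downloads the images in xs above, computes each image's centered 2D FFT log-amplitude spectrum with NumPy, and averages them. The 224x224 size, the grayscale conversion, and the assumption that the URLs are directly downloadable are all illustrative choices, not something this repo prescribes.

import io
import requests
import numpy as np
from PIL import Image

def log_amplitude(img, size=224):
    # Grayscale, resize, then take the centered 2D FFT log-amplitude spectrum.
    x = np.asarray(img.convert("L").resize((size, size)), dtype=np.float32) / 255.0
    f = np.fft.fftshift(np.fft.fft2(x))
    return np.log(np.abs(f) + 1e-6)

spectra = []
for url in xs:
    img = Image.open(io.BytesIO(requests.get(url, timeout=30).content))
    spectra.append(log_amplitude(img))

# Averaging over several images gives a more stable spectrum than a single image.
avg_spectrum = np.mean(spectra, axis=0)
print(avg_spectrum.shape)  # (224, 224)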

To prepare the ImageNet dataset, you can refer to, for example, the following docs: "Download, pre-process, and upload the ImageNet dataset" and "Preparation of ImageNet (ILSVRC2012)"; a minimal loading sketch is shown below. Please feel free to leave a comment if you have any problems.
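Once ImageNet is downloaded and extracted following those docs, loading the validation split with torchvision might look like the sketch below. The ./data/imagenet root path and the standard 224x224 center-crop preprocessing are assumptions for illustration, not settings taken from this repo.

import torch
from torchvision import datasets, transforms

# Standard evaluation preprocessing: resize, center-crop, convert to tensor.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Assumes the ILSVRC2012 archives/devkit are already placed under ./data/imagenet.
val_set = datasets.ImageNet("./data/imagenet", split="val", transform=transform)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False, num_workers=4)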