xxxnell / how-do-vits-work

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
https://arxiv.org/abs/2202.06709
Apache License 2.0

relative log magnitude #39

Closed: zhenyuan1234 closed this issue 1 year ago

zhenyuan1234 commented 1 year ago

Hello! How is the relative log magnitude calculated? Is the feature map of the first layer subtracted from the feature map of each layer?

xxxnell commented 1 year ago

Hello @zhenyuan1234,

Thank you for your comment. The relative log magnitude is the log amplitude at a given normalized frequency, such as 1.0π (the boundary, i.e., high frequency), minus the log amplitude at 0.0π (the center, i.e., low frequency). Because each feature map is compared only with its own low-frequency amplitude, we do not use information from any other layer when we calculate the relative amplitude. Please also refer to fourier_analysis.ipynb (Colab notebook):

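# `fourier` (log amplitude of the 2D FFT) and `shift` (moves the zero frequency
# to the center) are helpers defined in fourier_analysis.ipynb
latent = fourier(latent)  # latent.shape is (b, c, h, w)
latent = shift(latent).mean(dim=(0, 1))  # average over the batch and channel dims
latent = latent.diag()[int(h/2):]  # only use the half-diagonal components,
                                   # from the center (0.0π) toward the corner
latent = latent - latent[0]  # visualize 'relative' log amplitudes
                             # (i.e., log amp at each frequency - log amp at 0.0π)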
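For anyone who wants to reproduce the curve outside the notebook, here is a minimal self-contained sketch in NumPy rather than PyTorch. It assumes that the notebook's `fourier` returns the log amplitude of the 2D FFT and that `shift` centers the zero frequency the way `np.fft.fftshift` does; `relative_log_amplitude` is a hypothetical name, not a function from the repository. It also assumes square feature maps (h == w).

```python
import numpy as np


def relative_log_amplitude(latent: np.ndarray) -> np.ndarray:
    """Relative log amplitude along the half-diagonal of the centered spectrum.

    `latent` is a feature map of shape (b, c, h, w). Entry k of the result is
    the log amplitude at the k-th diagonal frequency step minus the log
    amplitude at 0.0π, so entry 0 is always 0. No other layer is involved.
    """
    b, c, h, w = latent.shape
    if h != w:
        raise ValueError("this sketch assumes square feature maps (h == w)")
    # Assumed stand-in for `fourier`: log amplitude of the 2D FFT over the
    # spatial axes; the small epsilon avoids log(0).
    amp = np.log(np.abs(np.fft.fft2(latent, axes=(2, 3))) + 1e-6)
    # Assumed stand-in for `shift`: move the zero frequency to the center,
    # then average over the batch and channel dimensions.
    amp = np.fft.fftshift(amp, axes=(2, 3)).mean(axis=(0, 1))
    # Half-diagonal from the center (0.0π) toward the corner (close to 1.0π).
    diag = np.diag(amp)[h // 2:]
    # Relative log amplitude: subtract this map's own amplitude at 0.0π.
    return diag - diag[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A map dominated by a constant offset concentrates energy at 0.0π,
    # so every entry after the first should be strongly negative.
    smooth = 10.0 + 0.1 * rng.standard_normal((2, 3, 14, 14))
    print(relative_log_amplitude(smooth))
```

For a 14×14 map, the result has 7 entries (the diagonal from the center to the corner), and its first entry is exactly 0 by construction.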