ra1ph2 / Vision-Transformer

Implementation of a Vision Transformer from scratch, with performance compared against standard CNNs (ResNets) and a pre-trained ViT on CIFAR-10 and CIFAR-100.

Mean attention distance #1

Open nakashima-kodai opened 3 years ago

nakashima-kodai commented 3 years ago

Hi,

Thanks for sharing the code! The code for visualization is especially helpful. The authors of ViT also compute the mean attention distance. Are there any plans to support mean attention distance in this repository?

Thanks.

ra1ph2 commented 3 years ago

Hi,

Really sorry for the late reply. I had tried to incorporate mean attention distance but could not come up with a methodology to do so at the time. I am currently preoccupied with other work from my internship and university, so I won't be able to add this in the near future. If you have implemented it and can create a pull request, that would be great.
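For anyone picking this up: below is a minimal NumPy sketch of how mean attention distance could be computed, following the idea in the ViT paper (for each query patch, average the spatial distance to every key patch weighted by the attention it receives, then average over queries). The function name, the assumed attention-matrix shape (a single head, no class token, rows summing to 1), and the pixel-distance convention are my assumptions, not part of this repository's code.

```python
import numpy as np

def mean_attention_distance(attn, patches_per_row, patch_size):
    """Mean attention distance in pixels for one attention head.

    attn: (N, N) array of attention weights, rows sum to 1,
          N = patches_per_row ** 2 (class token assumed stripped).
    """
    n = attn.shape[0]
    # (row, col) grid position of each patch, scaled to pixel units
    coords = np.array(
        [(i // patches_per_row, i % patches_per_row) for i in range(n)],
        dtype=float,
    ) * patch_size
    # pairwise Euclidean distances between patch centers
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # attention-weighted distance per query, averaged over all queries
    return float((attn * dist).sum(axis=-1).mean())
```

To get the per-layer, per-head numbers from the ViT paper's plot, one would run this over the attention maps of each head in each block and average over a batch of images.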