PyTorch implementation of the Vision Transformer [Dosovitskiy, A., ICLR 2021], modified to reach over 90% accuracy on CIFAR-10 when trained FROM SCRATCH, with a small number of parameters (6.3M, versus 86M for the original ViT-B). (Yes, I know 90% is easily reached with CNN-based architectures.) If you run into any problems, please let me know. Any suggestions are welcome!
Install packages

```bash
$ git clone https://github.com/omihub777/ViT-CIFAR.git
$ cd ViT-CIFAR/
$ bash setup.sh
```
Train ViT on CIFAR-10

```bash
$ python main.py --dataset c10 --label-smoothing --autoaugment
```
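For reference, here is a minimal sketch of what those flags and the hyperparameter table below correspond to, assuming standard PyTorch/torchvision APIs. The model and data pipeline here are placeholders, not the repo's actual code (see `main.py` for that), and the CIFAR crop/flip augmentations are my assumption:

```python
import math
import torch
import torch.nn as nn
import torchvision.transforms as T

# Sketch of the recipe: label smoothing 0.1, AutoAugment for CIFAR-10,
# Adam with weight decay 5e-5, and a 5-epoch linear warmup into a cosine
# decay from lr=1e-3 down to lr=1e-5 over 200 epochs.
EPOCHS, WARMUP, INIT_LR, LAST_LR = 200, 5, 1e-3, 1e-5

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),                 # assumed standard CIFAR crop
    T.RandomHorizontalFlip(),
    T.AutoAugment(T.AutoAugmentPolicy.CIFAR10),  # --autoaugment
    T.ToTensor(),
])

model = nn.Linear(3 * 32 * 32, 10)  # placeholder for the ViT built in main.py
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # --label-smoothing
optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR, weight_decay=5e-5)

def lr_factor(epoch: int) -> float:
    """Per-epoch multiplier on INIT_LR: linear warmup, then cosine decay."""
    if epoch < WARMUP:
        return (epoch + 1) / WARMUP
    t = (epoch - WARMUP) / max(1, EPOCHS - WARMUP)
    lr = LAST_LR + 0.5 * (INIT_LR - LAST_LR) * (1 + math.cos(math.pi * t))
    return lr / INIT_LR

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
```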
If you pass a Comet.ml API key, experiments are logged to Comet.ml; without one, metrics fall back to a CSVLogger:

```bash
$ python main.py --api-key [YOUR COMET API KEY] --dataset c10
```
| Dataset | Acc. (%) | Time (hh:mm:ss) |
|---|---|---|
| CIFAR-10 | 90.92 | 02:14:22 |
| CIFAR-100 | 66.54 | 02:14:17 |
| SVHN | 97.31 | 03:24:23 |
(Plots: accuracy and loss curves for each dataset above.)
| Param | Value |
|---|---|
| Epoch | 200 |
| Batch Size | 128 |
| Optimizer | Adam |
| Weight Decay | 5e-5 |
| LR Scheduler | Cosine |
| (Init LR, Last LR) | (1e-3, 1e-5) |
| Warmup | 5 epochs |
| Dropout | 0.0 |
| AutoAugment | True |
| Label Smoothing | 0.1 |
| Heads | 12 |
| Layers | 7 |
| Hidden | 384 |
| MLP Hidden | 384 |
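The table above pins down the architecture. As a rough sanity check on the 6.3M figure, here is a hedged sketch built from stock PyTorch modules at that scale (heads=12, layers=7, hidden=384, MLP hidden=384). The 4×4 patch size is my assumption, and `main.py` defines the repo's real model:

```python
import torch
import torch.nn as nn

class SmallViT(nn.Module):
    """Sketch of a small ViT for 32x32 inputs at the scale in the table above."""
    def __init__(self, img_size=32, patch=4, num_classes=10,
                 hidden=384, heads=12, layers=7, mlp_hidden=384):
        super().__init__()
        num_patches = (img_size // patch) ** 2
        # Patchify via a strided conv: (B, 3, 32, 32) -> (B, hidden, 8, 8)
        self.patch_embed = nn.Conv2d(3, hidden, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, hidden))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, hidden))
        block = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=mlp_hidden,
            dropout=0.0, activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Sequential(nn.LayerNorm(hidden),
                                  nn.Linear(hidden, num_classes))

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, hidden)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])  # classify from the [CLS] token

model = SmallViT()
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")  # ~6.3M
```

Counting the parameters of this sketch lands at roughly 6.3M, i.e. the seven 384-wide encoder layers account for essentially the whole budget.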
"TransGAN: Two Transformers Can Make One Strong GAN", Jiang, Y., et. al, (2021)