
ViT-CIFAR

PyTorch implementation of the Vision Transformer [Dosovitskiy, A. (ICLR'21)], modified to reach over 90% accuracy (which, I know, is easily reached with CNN-based architectures) FROM SCRATCH on CIFAR-10 with a small number of parameters (6.3M; the original ViT-B has 86M). If you run into any problem, kindly let me know :) Any suggestions are welcome!

1. Quick Start

  1. Install packages

    $git clone https://github.com/omihub777/ViT-CIFAR.git
    $cd ViT-CIFAR/
    $bash setup.sh
  2. Train ViT on CIFAR-10

    $python main.py --dataset c10 --label-smoothing --autoaugment

To log metrics to Comet.ml, pass your API key:

    $python main.py --api-key [YOUR COMET API KEY] --dataset c10
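
The `--autoaugment` and `--label-smoothing` flags correspond to the AutoAugment and Label Smoothing entries in the hyperparameter table below. As a rough sketch of what they enable, here is an equivalent data pipeline and loss using torchvision's built-in `AutoAugment` and PyTorch's `CrossEntropyLoss(label_smoothing=...)`; this may differ from this repo's own implementation, and the crop/flip steps are assumed standard CIFAR-10 baseline augmentation:

    import torch
    import torch.nn as nn
    import torchvision
    import torchvision.transforms as T

    train_transform = T.Compose([
        T.RandomCrop(32, padding=4),        # standard CIFAR-10 baseline augmentation (assumed)
        T.RandomHorizontalFlip(),
        T.AutoAugment(T.AutoAugmentPolicy.CIFAR10),  # CIFAR-10 AutoAugment policy
        T.ToTensor(),
        T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])

    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=train_transform)
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=128, shuffle=True, num_workers=4)

    # Label smoothing of 0.1, matching the hyperparameter table below
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)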

2. Results

|Dataset|Acc. (%)|Time (hh:mm:ss)|
|--|--|--|
|CIFAR-10|90.92|02:14:22|
|CIFAR-100|66.54|02:14:17|
|SVHN|97.31|03:24:23|

2.1 CIFAR-10

2.2 CIFAR-100

2.3 SVHN

3. Hyperparams

|Param|Value|
|--|--|
|Epoch|200|
|Batch Size|128|
|Optimizer|Adam|
|Weight Decay|5e-5|
|LR Scheduler|Cosine|
|(Init LR, Last LR)|(1e-3, 1e-5)|
|Warmup|5 epochs|
|Dropout|0.0|
|AutoAugment|True|
|Label Smoothing|0.1|
|Heads|12|
|Layers|7|
|Hidden|384|
|MLP Hidden|384|
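
For reference, here is a minimal sketch (not this repo's exact code) of a ViT with the settings above, built on `torch.nn.TransformerEncoder`, together with the Adam optimizer and the 5-epoch warmup + cosine schedule. The patch size is an assumption (it is not listed in the table); with `patch_size=4`, the parameter count lands near the reported 6.3M, since the 7 encoder layers dominate it:

    import torch
    import torch.nn as nn

    class SmallViT(nn.Module):
        """ViT per the table: 7 layers, 12 heads, hidden 384, MLP hidden 384."""
        def __init__(self, image_size=32, patch_size=4, num_classes=10,
                     hidden=384, mlp_hidden=384, heads=12, layers=7, dropout=0.0):
            super().__init__()
            num_patches = (image_size // patch_size) ** 2
            # Patch embedding as a strided convolution
            self.patch_embed = nn.Conv2d(3, hidden, kernel_size=patch_size, stride=patch_size)
            self.cls_token = nn.Parameter(torch.zeros(1, 1, hidden))
            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, hidden))
            layer = nn.TransformerEncoderLayer(
                d_model=hidden, nhead=heads, dim_feedforward=mlp_hidden,
                dropout=dropout, activation="gelu", batch_first=True, norm_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, x):
            x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, hidden)
            cls = self.cls_token.expand(x.size(0), -1, -1)
            x = torch.cat([cls, x], dim=1) + self.pos_embed
            x = self.encoder(x)
            return self.head(x[:, 0])                            # classify from the [CLS] token

    model = SmallViT()
    print(sum(p.numel() for p in model.parameters()) / 1e6, "M params")  # ~6.3M

    # Adam + 5-epoch linear warmup, then cosine decay from 1e-3 to 1e-5
    # (call scheduler.step() once per epoch)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-5)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer,
        schedulers=[
            torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=1e-2, total_iters=5),
            torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=195, eta_min=1e-5),
        ],
        milestones=[5],
    )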

4. Further improvements

5. Ref.

Dosovitskiy, A., et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." ICLR 2021. https://arxiv.org/abs/2010.11929