This is the code for the paper "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup", accepted at ICML'20 (paper, talk, blog). Parts of the code are borrowed from manifold mixup (link).
@inproceedings{kimICML20,
title= {Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup},
author = {Kim, Jang-Hyun and Choo, Wonho and Song, Hyun Oh},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2020}
}
./imagenet). --mp [n_procs] in the command.
This code has been tested with:
python 3.6.8
pytorch 1.1.0
torchvision 0.3.0
gco-wrapper (https://github.com/Borda/pyGCO)
matplotlib 2.1.0
numpy 1.13.3
six 1.12.0
We provide a checkpoint of adversarial Puzzle Mix with PreActResNet18 trained on CIFAR-100. The model has 80.34% clean test accuracy and 42.89% accuracy against FGSM with 8/255 l-infinity epsilon-ball.
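For readers unfamiliar with the threat model, the FGSM figure above refers to a single signed-gradient step bounded by an l-infinity radius of 8/255. The sketch below illustrates that step on a flat list of pixel values in [0, 1]. The function name `fgsm_step` and the hand-written gradient are illustrative assumptions and are not taken from `test_robust.py`.

```python
def fgsm_step(pixels, grad, eps=8 / 255):
    """Move each pixel by eps in the sign direction of its gradient, then clip to [0, 1]."""
    def sign(g):
        return (g > 0) - (g < 0)

    return [min(1.0, max(0.0, p + eps * sign(g))) for p, g in zip(pixels, grad)]


pixels = [0.0, 0.5, 1.0, 0.25]
grad = [-0.3, 0.7, 0.2, 0.0]  # hypothetical loss gradient w.r.t. each pixel
adv = fgsm_step(pixels, grad)

# Every perturbed pixel stays within the 8/255 l-infinity ball around the original.
max_shift = max(abs(a - p) for a, p in zip(adv, pixels))
```

Clipping to the valid pixel range can only shrink the perturbation, so the l-infinity bound still holds after it.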
The CIFAR-100 dataset will automatically be downloaded to [data_path]. To test corruption robustness, download the dataset here. Note that the corruption dataset should be placed in [data_path] with the folder name Cifar100-C (for CIFAR-100) or tiny-imagenet-200-C (for Tiny-ImageNet).
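Before running the robustness test, it can help to confirm that the corruption data sits where the note above says it should. The following is a minimal sketch using only the standard library. The helper names `corruption_dir` and `has_corruption_data` are hypothetical and not part of this repository; the dictionary keys are the `--dataset` values used in the training commands.

```python
from pathlib import Path

# Folder names as stated above, keyed by the --dataset value.
CORRUPTION_DIRS = {
    "cifar100": "Cifar100-C",
    "tiny-imagenet-200": "tiny-imagenet-200-C",
}


def corruption_dir(data_path, dataset):
    """Return the expected location of the corruption dataset under data_path."""
    return Path(data_path) / CORRUPTION_DIRS[dataset]


def has_corruption_data(data_path, dataset):
    """Return True if the expected corruption folder exists."""
    return corruption_dir(data_path, dataset).is_dir()
```

For example, `has_corruption_data("[data_path]", "cifar100")` returns `False` until a `Cifar100-C` folder exists inside the data directory.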
To test the model, run:
cd checkpoint
python test_robust.py --ckpt preactresnet18 --datapath [data_path]
Other models trained with Puzzle Mix can also be downloaded:
Dataset | Model | Method | Description | Model file |
---|---|---|---|---|
CIFAR-100 | WRN-28-10 | Puzzle Mix [Table 2] | 84.0% (Top-1) | drive |
CIFAR-100 | WRN-28-10 | Puzzle Mix + Adv training [Table 2] | 84.0% (Top-1) / 52.8% (FGSM) | drive |
CIFAR-100 | WRN-28-10 | Puzzle Mix + Augmentation [Table 7] | 83.7% (Top-1) / 71.1% (CIFAR100-C) | drive |
CIFAR-100 | PreActResNet-18 | Puzzle Mix [Table 3] | 80.4% (Top-1) | drive |
CIFAR-100 | PreActResNet-18 | Puzzle Mix + Adv training [Table 3] | 80.2% (Top-1) / 42.9% (FGSM) | drive |
Tiny-ImageNet | PreActResNet-18 | Puzzle Mix [Table 4] | 63.9% (Top-1) | drive |
We also provide a Jupyter notebook, Visualization.ipynb, for visualizing Puzzle Mix results on image samples.
Detailed descriptions of the arguments are provided in main.py. Below are some examples for reproducing the experimental results.
To test with ImageNet, please refer to ./imagenet_fast or ./imagenet (for 300 epochs training). ./imagenet contains the most concise version of the Puzzle Mix training code.
The dataset will be downloaded to [data_path] and the results will be saved to [save_path]. To run the code without saving results, set --log_off True.
To reproduce Puzzle Mix with PreActResNet18 for 1200 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.1 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 400 800 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_size 4 --t_eps 0.8
To reproduce Puzzle Mix with PreActResNet18 for 600 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 600 --schedule 350 500 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_size 4 --t_eps 0.8
To reproduce adversarial Puzzle Mix with PreActResNet18 for 1200 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.1 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 400 800 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_size 4 --t_eps 0.8 --adv_p 0.1 --adv_eps 10.0
Below are commands to reproduce baselines.
To reproduce Vanilla with PreActResNet18 for 1200 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.1 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 400 800 --gammas 0.1 0.1 --train vanilla
To reproduce input mixup with PreActResNet18 for 1200 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.1 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 400 800 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0
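As background for the input mixup baseline, the sketch below follows the standard mixup formulation: a mixing ratio is drawn from Beta(alpha, alpha) and applied to both inputs and one-hot labels. It assumes that `--mixup_alpha` plays the role of alpha. The function `input_mixup` and the list-based inputs are illustrative and do not reproduce the batched PyTorch code in `main.py`.

```python
import random


def input_mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Convexly combine two examples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


rng = random.Random(0)  # seeded generator so the example is repeatable
x, y, lam = input_mixup([1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0], alpha=1.0, rng=rng)
```

With alpha = 1.0 the ratio is uniform on [0, 1]; smaller values such as 0.2 concentrate the ratio near 0 or 1, so most mixed samples stay close to one of the two originals.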
To reproduce manifold mixup with PreActResNet18 for 1200 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.1 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 400 800 --gammas 0.1 0.1 --train mixup_hidden --mixup_alpha 2.0
To reproduce CutMix with PreActResNet18 for 1200 epochs, run:
python main.py --dataset cifar100 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.1 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 400 800 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --box True
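The CutMix baseline differs from input mixup in that it pastes a rectangular patch instead of blending whole images. The sketch below follows the box sampling described in the CutMix paper: the patch covers roughly a (1 - lam) fraction of the image, and lam is recomputed from the clipped box. Whether `--box True` matches this procedure exactly is an assumption, and `cutmix_box` is a hypothetical helper name.

```python
import math
import random


def cutmix_box(height, width, lam, rng=random):
    """Sample a box covering about (1 - lam) of the image; return the box and the adjusted lam."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)

    # Uniformly sampled box centre; the box is clipped at the image border.
    cy, cx = rng.randrange(height), rng.randrange(width)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)

    # Clipping can shrink the box, so recompute the mixing ratio from its true area.
    lam_adjusted = 1.0 - (bottom - top) * (right - left) / (height * width)
    return (top, left, bottom, right), lam_adjusted


box, lam_adjusted = cutmix_box(32, 32, lam=0.5, rng=random.Random(0))
```

The adjusted ratio, not the sampled one, is the weight to use when mixing the two labels.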
For WRN28_10 with 400 epochs, set --arch wrn28_10, --epochs 400, and --schedule 200 300. For WRN28_10 with 200 epochs, set --epochs 200, --schedule 120 170, and --learning_rate 0.2.
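The `--schedule` and `--gammas` flags in the commands above read as a step learning-rate schedule. The sketch below assumes that the learning rate is multiplied by each gamma once training reaches the corresponding epoch; this behavior is an assumption, so check `main.py` for the authoritative logic. The function `lr_at_epoch` is a hypothetical helper.

```python
def lr_at_epoch(base_lr, epoch, schedule, gammas):
    """Return the learning rate at a given epoch under a step decay schedule."""
    if len(schedule) != len(gammas):
        raise ValueError("schedule and gammas must have the same length")
    lr = base_lr
    for milestone, gamma in zip(schedule, gammas):
        if epoch >= milestone:
            lr *= gamma
    return lr


# WRN28_10, 400 epochs: --learning_rate 0.1 --schedule 200 300 --gammas 0.1 0.1
rates = [lr_at_epoch(0.1, e, [200, 300], [0.1, 0.1]) for e in (0, 199, 200, 300, 399)]
```

Under this reading, the 400-epoch WRN28_10 run trains at 0.1 for the first 200 epochs, then at roughly 0.01 until epoch 300, and at roughly 0.001 for the remainder.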
The following process is forked from (link).
python load_data.py
To reproduce Puzzle Mix with PreActResNet18 for 1200 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_eps 0.8 --clean_lam 1
To reproduce Puzzle Mix with PreActResNet18 for 600 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 600 --schedule 300 450 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_eps 0.8 --clean_lam 1
To reproduce adversarial Puzzle Mix with PreActResNet18 for 1200 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_eps 0.8 --adv_p 0.15 --adv_eps 10.0 --clean_lam 1
To reproduce adversarial Puzzle Mix with PreActResNet18 for 600 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 600 --schedule 300 450 --gammas 0.1 0.1 --train mixup --mixup_alpha 1.0 --graph True --n_labels 3 --eta 0.2 --beta 1.2 --gamma 0.5 --neigh_size 4 --transport True --t_eps 0.8 --adv_p 0.15 --adv_eps 10.0 --clean_lam 1
Below are commands to reproduce baselines.
To reproduce Vanilla with PreActResNet18 for 1200 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --train vanilla
To reproduce input mixup with PreActResNet18 for 1200 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --train mixup --mixup_alpha 0.2
To reproduce manifold mixup with PreActResNet18 for 1200 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --train mixup_hidden --mixup_alpha 0.2
To reproduce CutMix with PreActResNet18 for 1200 epochs, run:
python main.py --dataset tiny-imagenet-200 --data_dir [data_path] --root_dir [save_path] --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --momentum 0.9 --decay 0.0001 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --train mixup --mixup_alpha 0.2 --box True
MIT License