kaist-dmlab / Prune4Rel

MIT License

Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023, PDF)

by Dongmin Park¹, Seola Choi¹, Doyoung Kim¹, Hwanjun Song¹,², Jae-Gil Lee¹.

¹ KAIST, ² Amazon AWS AI

Brief Summary

How to run

Prune4ReL

Please follow Table 7 of the paper for hyperparameters. For example, for the CIFAR-10N dataset with SOP+ as the re-labeling model, run:

python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
          --dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
          --fraction $fraction --selection Prune4Rel --save-log True \
          --metric cossim --uncertainty LeastConfidence --tau 0.975 --eta 1 --balance True

More detailed scripts for other datasets can be found in scripts/ folder.
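
To run the command above over several settings, a small wrapper can assemble the argument list for each run. The following is a minimal sketch, not part of the repository: the helper name `build_command` is hypothetical, the flags are copied from the command above, and the noise type and fractions in the usage loop are placeholders to be replaced with values the repository accepts (see the scripts/ folder).

```python
import shlex


def build_command(noise_type, fraction, selection="Prune4Rel", gpu=0):
    """Return the argument list for one run of main_label_noise.py.

    Hypothetical helper: the flags mirror the README commands; only
    noise_type, fraction, selection, and gpu are varied.
    """
    cmd = [
        "python3", "main_label_noise.py", "--gpu", str(gpu),
        "--model", "PreActResNet18", "--robust-learner", "SOP",
        "-rc", "0.9", "-rb", "0.1",
        "--dataset", "CIFAR10", "--noise-type", noise_type, "--n-class", "10",
        "--lr-u", "10", "-se", "10", "--epochs", "300",
        "--fraction", str(fraction), "--selection", selection,
        "--save-log", "True",
    ]
    if selection == "Prune4Rel":
        # Options that appear only in the Prune4ReL command of this README.
        cmd += ["--metric", "cossim", "--uncertainty", "LeastConfidence",
                "--tau", "0.975", "--eta", "1", "--balance", "True"]
    return cmd


if __name__ == "__main__":
    # Placeholder settings (assumptions): substitute real values before running.
    for noise_type in ["<noise_type>"]:
        for fraction in [0.2, 0.4]:
            # Dry run: print the command instead of executing it.
            print(shlex.join(build_command(noise_type, fraction)))
```

Printing the commands first makes it easy to check them against the scripts in scripts/ before launching long training runs; passing each list to `subprocess.run(cmd, check=True)` would execute them.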

Data Pruning Baselines: Uniform, SmallLoss, Margin, Forgetting, GraNd, Moderate, etc

The command mirrors the one for Prune4ReL, with a different --selection value. For example,

python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
          --dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
          --fraction $fraction --selection $pruning_algorithm --save-log True

where $pruning_algorithm must be one of [Uniform, SmallLoss, Uncertainty, Forgetting, GraNd, ...], each of which is a class name in deep_core/methods/~~.py.
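
Because the --selection value has to match a class name exactly, a misspelled name is an easy mistake. The sketch below illustrates one common way a name could be resolved to a class and rejected early if unknown. It is an assumption for illustration, not the repository's actual code: the stub classes, the `METHODS` registry, and `resolve_selection` are all hypothetical.

```python
class Uniform:
    """Hypothetical stand-in for a pruning-method class."""


class SmallLoss:
    """Hypothetical stand-in for a pruning-method class."""


# Hypothetical registry keyed by class name, so the command-line string
# must match the class name exactly (case-sensitive).
METHODS = {cls.__name__: cls for cls in (Uniform, SmallLoss)}


def resolve_selection(name):
    """Return the class registered under `name`, or raise a clear error."""
    try:
        return METHODS[name]
    except KeyError:
        raise ValueError(
            f"Unknown --selection {name!r}; choose from {sorted(METHODS)}"
        ) from None


if __name__ == "__main__":
    print(resolve_selection("Uniform").__name__)
```

With this kind of lookup, `smallloss` would be rejected while `SmallLoss` resolves, which is why the value passed on the command line should be copied from the class definition.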

Citation

@article{park2023robust,
  title={Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy},
  author={Park, Dongmin and Choi, Seola and Kim, Doyoung and Song, Hwanjun and Lee, Jae-Gil},
  journal={NeurIPS 2023},
  year={2023}
}

References

We thank the authors of the DeepCore library, on which we built most of our repo. We hope our project helps extend the open-source library of data pruning.