howardyclo opened 3 years ago
Motivation. ImageNet labels are noisy: an image may contain multiple objects, yet it is annotated with a single image-level class label.
Intuition. A model trained with the single-label cross-entropy loss tends to produce multi-label outputs anyway when the training labels are noisy.
Relabel. They propose to use a strong image classifier, trained on extra data at super-ImageNet scale (JFT-300M, InstagramNet-1B) and fine-tuned on ImageNet, to generate multi-labels for ImageNet images. They obtain pixel-wise multi-label predictions from the feature map before the final global pooling layer (offline preprocessing, done once).
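The pixel-wise prediction step can be sketched as follows: since the classifier head is linear, applying the final FC weights at every spatial location of the pre-pooling feature map (equivalently, as a 1x1 convolution) yields a (C, H, W) score map instead of a single C-vector. This is a minimal numpy sketch; `dense_scores`, `fc_weight`, and `fc_bias` are hypothetical names, not the paper's code.

```python
import numpy as np

def dense_scores(feature_map, fc_weight, fc_bias):
    """Apply a classifier's final FC layer at every spatial location.

    feature_map: (D, H, W) pre-pooling features
    fc_weight:   (C, D) final FC weights
    fc_bias:     (C,)  final FC bias
    Returns a (C, H, W) per-location class-score map.
    """
    D, H, W = feature_map.shape
    flat = feature_map.reshape(D, H * W)           # (D, H*W)
    scores = fc_weight @ flat + fc_bias[:, None]   # (C, H*W)
    return scores.reshape(-1, H, W)                # (C, H, W)

# Because the head is linear, globally averaging the score map recovers
# exactly the logits the classifier would output after global average pooling.
rng = np.random.default_rng(0)
fm = rng.normal(size=(8, 4, 4))      # toy feature map (D=8, H=W=4)
w = rng.normal(size=(3, 8))          # toy head for C=3 classes
b = rng.normal(size=(3,))
score_map = dense_scores(fm, w, b)
gap_logits = w @ fm.mean(axis=(1, 2)) + b
```

Averaging `score_map` over its spatial axes matches `gap_logits`, which is why the per-location scores can be read off "for free" from a trained classifier.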
Novel training scheme -- LabelPooling. Given a random crop during training, pool the multi-labels and their corresponding probability scores from the crop region of the precomputed label map.
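LabelPooling can be sketched as: average the stored label-map scores over the crop region, then softmax to get a soft multi-label target for that crop. This is an assumption-laden simplification (the paper uses RoIAlign on the label map; plain average pooling is used here for clarity), and `label_pooling` is a hypothetical helper name.

```python
import numpy as np

def label_pooling(label_map, crop_box):
    """Pool a soft multi-label target from a precomputed label map.

    label_map: (C, H, W) per-location class scores (precomputed once, offline)
    crop_box:  (x1, y1, x2, y2) crop coordinates on the label-map grid
    Simplification: simple average pooling stands in for the paper's RoIAlign.
    """
    x1, y1, x2, y2 = crop_box
    region = label_map[:, y1:y2, x1:x2]     # scores inside the crop
    pooled = region.mean(axis=(1, 2))       # average-pool over the region
    exp = np.exp(pooled - pooled.max())     # softmax -> soft label
    return exp / exp.sum()

# Toy 3-class, 4x4 label map: class 0 dominates the top-left,
# class 1 dominates the bottom-right.
lm = np.zeros((3, 4, 4))
lm[0, :2, :2] = 5.0
lm[1, 2:, 2:] = 5.0
target = label_pooling(lm, (0, 0, 2, 2))    # crop the top-left region
```

A crop over the top-left region yields a target concentrated on class 0, so different random crops of the same image can receive different (localized) supervision.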
Results. A ResNet-50 trained on the relabeled images, with multi- and localized labels, obtains 78.9% accuracy (+1.4% over the baseline trained with the original labels), and can be boosted to 80.2% with CutMix, a new SoTA on ImageNet for ResNet-50.
They also tried diverse architectures.
Once the label map is pre-computed, a new network can be trained by the following procedure:
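The training step above can be sketched as: (1) sample a random crop, (2) pool a soft multi-label target from the precomputed label map at the crop's coordinates, (3) forward the crop, (4) minimize cross-entropy against the soft target. A minimal numpy sketch of the soft-label cross-entropy, with illustrative (made-up) target and logit values:

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D logit vector."""
    m = x.max()
    return x - m - np.log(np.exp(x - m).sum())

def soft_cross_entropy(logits, soft_target):
    """Cross-entropy against a soft (multi-label) target distribution."""
    return -(soft_target * log_softmax(logits)).sum()

# Schematic training step:
# 1) sample a random crop of the image
# 2) pool the soft target from the label map at the crop coords (LabelPooling)
# 3) forward pass through the network -> logits
# 4) minimize the soft cross-entropy below
soft_target = np.array([0.7, 0.3, 0.0])   # e.g. crop covers two objects
logits = np.array([2.0, 1.0, -1.0])       # toy network output
loss = soft_cross_entropy(logits, soft_target)
```

With a soft target, the loss rewards spreading probability mass across all classes present in the crop, rather than forcing a single hard label.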
Metadata