howardyclo / papernotes

My personal notes and surveys on DL, CV and NLP papers.

Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels #76

Open howardyclo opened 3 years ago

howardyclo commented 3 years ago

Metadata

howardyclo commented 3 years ago

Highlights

Related work: Better evaluation protocol for ImageNet

The above works identify 3 categories of erroneous single labels:

  1. An image contains multiple objects.
  2. Multiple labels exist that are synonymous, or one hierarchically includes the other.
  3. Inherent ambiguity in an image makes multiple labels plausible.

Difference from this work

  1. This work also refines the training set, while previous works only refine the validation set.
  2. This work corrects labels, while previous works remove erroneous labels.

Related work: Distillation (I hand-picked some for their practical usefulness, in my opinion)

Difference from this work

  1. Previous work did not consider a strong, SoTA network as a teacher.
  2. Distillation approach requires forwarding teacher on-the-fly, leading to heavy computation.
howardyclo commented 3 years ago

Relabeling Details

Network architecture modification for generating label map

Generating label map using different architectures

They tried diverse architectures:

  1. SoTA EfficientNet-{B1,B3,B5,B7,B8}
  2. EfficientNet-L2 trained with JFT-300M
  3. ResNeXt-101_32x{32d,48d} trained with InstagramNet-1B

They then train ResNet-50 with the label maps from these diverse classifiers. The label map generated by EfficientNet-L2 is finally chosen, since its quality yields the best final accuracy. (Can we ensemble these label maps?)
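To generate a spatial label map instead of a single prediction, the classifier's global pooling is removed and its final FC layer is reinterpreted as a 1x1 convolution over the last feature map. A minimal numpy sketch of that reinterpretation (toy dimensions D=8 features and C=5 classes are hypothetical; the real setup uses the teacher's feature width and C=1000):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_map_from_features(features, fc_weight, fc_bias):
    """Apply the classifier's FC layer as a 1x1 conv so spatial
    resolution is kept: features (15, 15, D) -> label map (15, 15, C).
    The matmul broadcasts over the 15x15 spatial grid."""
    logits = features @ fc_weight + fc_bias
    return softmax(logits)  # per-location class distribution

# Toy example (hypothetical sizes)
rng = np.random.default_rng(0)
feat = rng.normal(size=(15, 15, 8))          # teacher's last feature map
W = rng.normal(size=(8, 5))                  # FC weights, reused as 1x1 conv
b = np.zeros(5)
lmap = label_map_from_features(feat, W, b)
print(lmap.shape)  # (15, 15, 5)
```

Each spatial cell of `lmap` now holds a full class distribution, which is what LabelPooling later aggregates over a crop.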

Important Findings

howardyclo commented 3 years ago

Training with LabelPooling

Once the label map is pre-computed, we can train a new network with the following procedure:

  1. Load image & label map (15x15xC)
  2. Augmented image = Random crop image (with a bounding box [x, y, w, h]) and resize to (224x224)
  3. New target = ROIAlign(label map, bounding box) → [h, w, C], then global average pooling → [1, 1, C], then softmax
  4. Train model with <Augmented image, New target> with cross-entropy loss
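The pooling in step 3 can be sketched in numpy. This is a simplification: the crop region is snapped to label-map cells and averaged, whereas the actual method uses ROIAlign with bilinear sampling (the box format `(x, y, w, h)` in normalized [0, 1] coordinates is an assumption for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def label_pooling(label_map, box):
    """Simplified LabelPooling: average the label-map cells covered by
    the random-crop box, then softmax into a soft multi-label target.
    label_map: (H, W, C); box: (x, y, w, h) in normalized coordinates."""
    H, W, _ = label_map.shape
    x, y, w, h = box
    x0, y0 = int(x * W), int(y * H)
    x1 = max(x0 + 1, int(np.ceil((x + w) * W)))
    y1 = max(y0 + 1, int(np.ceil((y + h) * H)))
    pooled = label_map[y0:y1, x0:x1].mean(axis=(0, 1))  # global average pool
    return softmax(pooled)

rng = np.random.default_rng(0)
lm = rng.normal(size=(15, 15, 1000))              # pre-computed label map
target = label_pooling(lm, (0.2, 0.3, 0.5, 0.5))  # random-crop box
print(target.shape)  # (1000,)
```

The resulting `target` replaces the one-hot label in the cross-entropy loss of step 4, so the supervision reflects only the classes visible inside the crop.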

Discussion on Design Choices

  1. Isn't a 15x15 label map too small? Higher-resolution maps would be too expensive to store for all of ImageNet.
  2. Why not use knowledge distillation? Forwarding the teacher on-the-fly makes training on ImageNet too expensive in compute time.
  3. Can new network also be trained with local labels instead of global ones (same as FCN in relabeling)?
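A rough back-of-envelope estimate of the storage concern in point 1, assuming dense float32 maps over the full ImageNet-1k training set (my own calculation, not from the paper; the paper further compresses by keeping only top-scoring classes):

```python
# Even at 15x15 resolution, dense per-image label maps are sizable,
# which motivates both the small spatial size and top-k storage.
n_images = 1_281_167          # ImageNet-1k training set size
H = W = 15                    # label-map spatial resolution
C = 1000                      # number of classes
bytes_per_float = 4           # float32

per_image = H * W * C * bytes_per_float       # bytes per image
total_tb = n_images * per_image / 1e12        # dataset total in TB
print(f"{per_image / 1e3:.0f} KB per image, {total_tb:.2f} TB total")
```

Doubling the resolution to 30x30 would quadruple this, which is why larger maps are impractical.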