pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Normalization for object detection #2397

Open pmeier opened 4 years ago

pmeier commented 4 years ago

Migrated from discuss.pytorch.org. Requests were made by @mattans.

📚 Documentation

The reference implementations for classification, segmentation, and video classification all use a normalization transform. In contrast, object detection does not use any normalization.

  1. Consider explaining why the pretrained detection models are the only ones that don’t require image normalization. (I understand that the training set was not normalized, but why?)
  2. It is worth mentioning that the detection models need no normalization. The classification, segmentation, and detection pretrained models are trained on ImageNet, so one might assume that all of them require ImageNet normalization, when in fact only the classification and segmentation models do. Perhaps it’s best to put this information in a table, since the pretrained video models are also normalized, but with different values.

fmassa commented 4 years ago

Hey,

So, the issue is that we embed the normalization (and other transforms) inside the model itself, see https://github.com/pytorch/vision/blob/e212cc86b80baf1a46681442db1312ebce5a21bb/torchvision/models/detection/transform.py#L104-L105

This inconsistency is unfortunate, but it was more or less necessary to make the detection models easier to use. My thinking was that at some point we might want all models to carry their data transformations inside them, since the way you normalize the inputs is tied to the pre-trained weights that we provide.

For now, I think we might want to improve the documentation to clear up any confusion.
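
For readers following along, the embedded step mentioned above is per-channel normalization, `(pixel - mean) / std`. Below is a minimal plain-Python sketch of that arithmetic. It is a stand-in for illustration, not the torchvision code linked above; the function name `normalize_image` and the nested-list image layout (channels, rows, columns) are assumptions.

```python
def normalize_image(image, mean, std):
    """Normalize a C x H x W image (nested lists) with one mean/std per channel."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("image, mean and std must have the same number of channels")
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# A 3-channel, 1x2 image whose values are already scaled to [0, 1].
image = [
    [[0.0, 1.0]],
    [[0.5, 0.5]],
    [[1.0, 0.0]],
]

# Illustrative statistics only; they are not the values used by any pretrained model.
normalized = normalize_image(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(normalized)
```

Because the detection models run this step internally, callers pass images scaled to [0, 1] and should not normalize them a second time.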

mattans commented 4 years ago

OK, thanks. I also think it's worth updating the docs.

pmeier commented 4 years ago

@mattans We are happy to accept a PR for that. Would you like to send one?

mattans commented 4 years ago

> @mattans We are happy to accept a PR for that. Would you like to send one?

Yes, I will do it in the following days. Thank you very much.

mattans commented 4 years ago

Just to make sure: @fmassa, what will happen if I use the object detection models without pretraining? Will they still auto-normalize the inputs? Also, does this auto-normalization apply to both training and inference?

pmeier commented 4 years ago

> [W]hat will happen if I use the object detection models without pretraining? Will it still auto-normalize the inputs?

Yes. The normalization transform is "hard coded" into the models:

https://github.com/pytorch/vision/blob/131ba1320b8208f10eb58d5feb7416c90ed839bb/torchvision/models/detection/faster_rcnn.py#L227-L233

KeypointRCNN and MaskRCNN inherit from FasterRCNN (shown above) and thus also have this behavior.
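
The fallback pattern described here can be sketched in plain Python. The class names `DetectorSketch` and `MaskDetectorSketch` are hypothetical, and the defaults are the commonly used ImageNet channel statistics, which should match the linked lines but are best checked against the source.

```python
# Commonly used ImageNet channel statistics (assumed to match the linked defaults).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


class DetectorSketch:
    """Hypothetical stand-in for a detection model constructor."""

    def __init__(self, pretrained=False, image_mean=None, image_std=None):
        self.pretrained = pretrained
        # The fallback never consults `pretrained`: a model built from scratch
        # receives the same default statistics as a pretrained one.
        self.image_mean = list(IMAGENET_MEAN) if image_mean is None else list(image_mean)
        self.image_std = list(IMAGENET_STD) if image_std is None else list(image_std)


class MaskDetectorSketch(DetectorSketch):
    """A subclass inherits the constructor, and with it the default normalization."""


scratch = MaskDetectorSketch(pretrained=False)
print(scratch.image_mean, scratch.image_std)

# Passing explicit statistics overrides the defaults; mean 0 and std 1 make the
# normalization step an identity.
custom = DetectorSketch(image_mean=[0.0, 0.0, 0.0], image_std=[1.0, 1.0, 1.0])
print(custom.image_mean, custom.image_std)
```

In torchvision, `FasterRCNN` accepts `image_mean` and `image_std` arguments, so a model trained from scratch on differently distributed data can supply its own statistics instead of relying on the defaults.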


> Also, does this auto-normalization apply for both training and inference?

Yes. The model is constructed the same way for training and inference:

https://github.com/pytorch/vision/blob/131ba1320b8208f10eb58d5feb7416c90ed839bb/references/detection/train.py#L95-L98

and the transform is also applied unconditionally:

https://github.com/pytorch/vision/blob/131ba1320b8208f10eb58d5feb7416c90ed839bb/torchvision/models/detection/generalized_rcnn.py#L79
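
What "applied unconditionally" means can be illustrated with a small plain-Python sketch. The `TinyDetector` class is hypothetical and uses a single channel with made-up statistics; it only mirrors the control flow, in which the forward pass calls the transform without checking the training flag.

```python
class TinyDetector:
    """Hypothetical single-channel model that normalizes inside forward()."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def transform(self, pixels):
        # Normalization lives inside the model rather than in the data pipeline.
        return [(p - self.mean) / self.std for p in pixels]

    def forward(self, pixels):
        # No `if self.training:` guard, so both modes take the same path.
        pixels = self.transform(pixels)
        # A real detector would continue with the backbone and heads here.
        return pixels


model = TinyDetector(mean=0.5, std=0.25)
pixels = [0.0, 0.5, 1.0]

train_out = model.train().forward(pixels)
eval_out = model.eval().forward(pixels)
print(train_out == eval_out)
```

The practical consequence is that the same unnormalized inputs can be fed to the model during both training and evaluation.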