Open crowsonkb opened 3 years ago
Any update on this issue?
I guess it makes sense to use ImageNet statistics here, since AVA and ImageNet don't differ much in terms of domain. I did observe some improvement.
You should use ImageNet statistics for any input, because that's what VGG-16 was trained on. You should only use different statistics if you trained or fine-tuned VGG-16 on a dataset that you normalized with those statistics during training. If you are training a model on VGG's outputs, the inputs to VGG still need to use the statistics VGG was trained with.
As per the torchvision documentation:
Skipping this normalization will cause VGG-16 to output the wrong feature maps, and you will probably get worse results. If you add this transform, you will have to retrain, though.
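For reference, here is a rough sketch of the normalization being discussed, written in plain Python so the arithmetic is visible. The mean and std values are the ImageNet statistics that torchvision documents for its pretrained models. The helper `normalize_pixel` is a made-up name for illustration and is not part of this repo or of torchvision. In a real pipeline you would most likely get the same effect by adding `transforms.Normalize(mean, std)` after `transforms.ToTensor()`.

```python
# ImageNet per-channel statistics (RGB order) that torchvision documents
# for its pretrained models. Inputs are assumed to be scaled to [0, 1] first.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Normalize one (R, G, B) pixel given as 0-255 integers.

    Hypothetical helper for illustration only. It mirrors, for a single
    pixel, the per-channel (x - mean) / std step that
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD) applies to a tensor
    already scaled to [0, 1].
    """
    scaled = [c / 255.0 for c in rgb_uint8]  # 0-255 -> [0, 1]
    return [(c - m) / s for c, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


# A pixel close to the ImageNet mean color maps to values near zero.
print(normalize_pixel((124, 116, 104)))
```

If the model downstream of VGG-16 was trained on features from unnormalized inputs, switching to this changes the feature distribution, which is why retraining is needed.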