yunxiaoshi / Neural-IMage-Assessment

A PyTorch Implementation of Neural IMage Assessment

The torchvision pretrained VGG-16 requires normalization of inputs and you do not do this #26

Open crowsonkb opened 3 years ago

crowsonkb commented 3 years ago

As per the torchvision documentation:

The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

```python
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
```

Not doing this will cause VGG-16 to output the wrong feature maps and you will probably get worse results. If you add this transform you will have to retrain though.
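As a rough illustration of what the quoted transform computes, here is a stdlib-only Python sketch (no torch required) of the two steps applied to a single RGB pixel. First, 8-bit values are scaled into [0, 1], as `transforms.ToTensor` does for uint8 images. Then `(x - mean) / std` is applied per channel, which is the formula `transforms.Normalize` documents. The helper names `to_unit_range` and `normalize_pixel` are hypothetical and are not part of this repository or of torchvision.

```python
# ImageNet channel statistics quoted in the torchvision documentation.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_unit_range(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], as ToTensor does for uint8 images."""
    return tuple(c / 255.0 for c in rgb_uint8)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (x - mean) / std channel by channel, the formula Normalize uses."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # A pixel close to the ImageNet mean colour.
    pixel = to_unit_range((124, 116, 104))
    print(normalize_pixel(pixel))  # every channel lands close to 0
```

A pixel near the ImageNet mean colour maps to roughly zero in every channel, i.e. to the kind of centred values VGG-16 saw during training. Skipping the transform feeds it raw [0, 1] values instead, so every channel arrives offset by its mean and scaled differently.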

sanchit88 commented 3 years ago

Any update on this issue?

yunxiaoshi commented 3 years ago

I guess it makes sense to use ImageNet statistics here, since AVA and ImageNet don't differ much in terms of domain. I did observe some improvement.

crowsonkb commented 3 years ago

You should be using ImageNet statistics for any input, because that's what VGG-16 was trained on. You should only use different statistics if you trained or fine-tuned VGG-16 on a dataset that you normalized with those different statistics during training. If you are training a model on VGG's outputs, the inputs to VGG still need to use the statistics VGG was trained with.
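One way to make this rule hard to get wrong in code, offered as an assumption rather than anything the repository actually does, is to attach the normalization statistics to the pretrained backbone instead of to the dataset. The `Backbone` record and the `preprocess` function below are hypothetical; only the VGG-16 mean and std values come from the torchvision documentation quoted in the original report. The second set of statistics in the demo is made up and is not real AVA data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backbone:
    """Hypothetical record tying a pretrained network to its training statistics."""
    name: str
    mean: tuple
    std: tuple


# Statistics the torchvision ImageNet-pretrained VGG-16 was trained with.
VGG16_IMAGENET = Backbone("vgg16", (0.485, 0.456, 0.406), (0.229, 0.224, 0.225))


def preprocess(rgb, backbone):
    """Normalize a [0, 1] RGB pixel with the backbone's own training statistics.

    The dataset being trained on (e.g. AVA) deliberately plays no part here:
    its statistics only matter if the backbone itself was trained with them.
    """
    return tuple((x - m) / s for x, m, s in zip(rgb, backbone.mean, backbone.std))


if __name__ == "__main__":
    pixel = (0.5, 0.5, 0.5)
    print(preprocess(pixel, VGG16_IMAGENET))  # what VGG-16 was trained to see
    # Made-up statistics standing in for "some other dataset"; NOT real AVA values.
    wrong = Backbone("vgg16-with-other-stats", (0.5, 0.5, 0.5), (0.25, 0.25, 0.25))
    print(preprocess(pixel, wrong))  # a different tensor for the same image
```

The same image normalized with other statistics reaches VGG-16 as different numbers, so its feature maps change even though the image itself has not, and any head trained on those features inherits the mismatch.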