According to the docs, all the ResNet models were trained on normalised RGB images:
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
As far as I can tell, the DOTA dataset loader uses cv2.imread, which means the images are fed into the pretrained network as non-normalised BGR data.
Checking the code, I did not see any channel re-ordering or normalisation, but I might have missed something. Could you please confirm whether the DataLoader for DOTA should be fixed?
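For reference, here is a minimal sketch of the preprocessing the docs describe, assuming the image arrives as an H x W x 3 uint8 BGR array, which is what cv2.imread returns. The helper name preprocess_bgr is hypothetical and not part of the repository, and where it would hook into the DOTA loader is left open.

```python
import numpy as np

# Mean and std quoted from the docs above, given in RGB order.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_bgr(image_bgr):
    """Hypothetical helper: convert an H x W x 3 uint8 BGR array
    (the layout cv2.imread returns) into a normalised 3 x H x W
    float32 RGB array."""
    rgb = image_bgr[:, :, ::-1]                  # reverse channels: BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0      # [0, 255] -> [0, 1]
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first (HWC -> CHW), as the pretrained models expect.
    return np.ascontiguousarray(normalised.transpose(2, 0, 1))
```

The channel reversal is done with plain slicing here, so the sketch depends only on NumPy. The result still needs to be converted to a tensor and batched before it reaches the network.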
Hi, I think I skipped the re-ordering and normalization because it would be fine with Batch Normalization. But your point is great: it would make training more stable, and it may reduce the NaN loss problem. Thanks!