pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

What's the input format of the fasterrcnn_resnet50_fpn? I mean RGB or BGR. #1608

Closed kangkang59812 closed 5 years ago

kangkang59812 commented 5 years ago

pytorch>=1.1

I notice that both RGB and BGR inputs of shape [n, c, h, w] give good results (BGR is slightly better).

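import numpy as np
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# RGB (PIL loads images in RGB order)
img1 = Image.open('image1.jpg')
# BGR (same image with the channel order reversed)
img2 = np.array(img1)[:, :, [2, 1, 0]].copy()

# ToTensor converts HWC uint8 input to a CHW float tensor in the 0-1 range
x1 = [transforms.ToTensor()(img1)]
x2 = [transforms.ToTensor()(img2)]

predictions1 = model(x1)
predictions2 = model(x2)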

It seems that predictions2 is better. So should I use the BGR format for fine-tuning and evaluation? I can't find this information in the code; I only know that the input shape is [n, c, h, w]. The config of Facebook's detectron2 says:

# Values to be used for image normalization (BGR order).
# To train on images of different number of channels, just set different mean & std.
# Default values are the mean pixel value from ImageNet: [103.53, 116.28, 123.675]
_C.MODEL.PIXEL_MEAN = [103.530, 116.280, 123.675]

So BGR is the one we should choose?

fmassa commented 5 years ago

The code expects images in RGB format, in the 0-1 range, like the other models in the torchvision model zoo. This is the setup the models were trained with, and I would expect it to give the best performance in general.
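
For readers who load images with a BGR library such as OpenCV, here is a minimal sketch of the conversion the answer above implies: reverse the channel order, move channels first, and scale to 0-1. The helper name `bgr_uint8_to_rgb_chw` and the synthetic one-pixel image are hypothetical, introduced only for illustration. The last lines check that the detectron2 `PIXEL_MEAN` quoted in the question is the usual ImageNet mean (0.485, 0.456, 0.406 in RGB, which torchvision's detection models use by default for their internal normalization) written in BGR order on a 0-255 scale, so the two configs describe the same statistics under different conventions.

```python
import numpy as np


def bgr_uint8_to_rgb_chw(img_bgr: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR image to a 3xHxW float32 RGB array in [0, 1].

    Hypothetical helper for illustration; not part of torchvision.
    """
    if img_bgr.ndim != 3 or img_bgr.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    rgb = img_bgr[:, :, ::-1]      # reverse the channel axis: BGR -> RGB
    chw = rgb.transpose(2, 0, 1)   # HWC -> CHW
    # Make the array contiguous (the reversed view has a negative stride),
    # cast to float32, and scale from 0-255 to 0-1.
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0


# Synthetic 1x1 "image": a pure-blue pixel stored in BGR order.
demo_bgr = np.zeros((1, 1, 3), dtype=np.uint8)
demo_bgr[0, 0] = [255, 0, 0]       # B=255, G=0, R=0
demo_rgb = bgr_uint8_to_rgb_chw(demo_bgr)

# ImageNet mean in RGB order on a 0-1 scale.
imagenet_mean_rgb = np.array([0.485, 0.456, 0.406])
# detectron2 PIXEL_MEAN from the question: BGR order, 0-255 scale.
detectron2_mean_bgr = np.array([103.530, 116.280, 123.675])
same_statistics = np.allclose(imagenet_mean_rgb[::-1] * 255.0, detectron2_mean_bgr)
```

The resulting array can be wrapped with `torch.from_numpy(...)` and passed to the model inside a list, in the same way as `x1` in the question.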