leeyeehoo / CSRNet-pytorch

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Bug in val code #25

Open aditya-malte opened 5 years ago

aditya-malte commented 5 years ago

When running the val code, the crowd count comes out wrong (seemingly random). The probable cause is that incorrect normalization values are subtracted. Using the same normalization as in training (via the torch transform) seems to solve the problem.

Code along these lines should solve the issue:

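import numpy as np
from PIL import Image

im = Image.open(path).convert('RGB')
im = np.array(im)
im = im / 255.0  # scale [0, 255] to [0, 1], as transforms.ToTensor() does
# Per-channel mean/std, the same values passed to transforms.Normalize
im[:, :, 0] = (im[:, :, 0] - 0.485) / 0.229
im[:, :, 1] = (im[:, :, 1] - 0.456) / 0.224
im[:, :, 2] = (im[:, :, 2] - 0.406) / 0.225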
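
For anyone who wants the same normalization without the per-channel assignments, here is a minimal sketch using NumPy broadcasting. The helper name `normalize_image` and the synthetic 2x2 input are assumptions made for illustration and are not part of the repository. The transpose to channels-first is also an assumption, based on the layout PyTorch convolution layers expect.

```python
import numpy as np

# Mean/std values quoted in this thread (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_image(rgb_uint8):
    """Hypothetical helper: HxWx3 uint8 RGB array -> 3xHxW float32 array."""
    scaled = rgb_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    normalized = (scaled - MEAN) / STD             # broadcasts over the channel axis
    # Reorder HWC -> CHW and make the result contiguous in memory.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))

# Synthetic 2x2 image standing in for np.array(Image.open(path).convert('RGB')).
demo = np.array([[[0, 0, 0], [255, 255, 255]],
                 [[128, 64, 32], [10, 20, 30]]], dtype=np.uint8)
out = normalize_image(demo)
print(out.shape)  # (3, 2, 2)
```

The result is still a NumPy array, so it would need converting to a tensor and a batch dimension added before being passed to the model.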

Thank you!

leeyeehoo commented 5 years ago

Thanks

linqiaozhou commented 5 years ago

img = transform(Image.open(img_paths[i]).convert('RGB')).cuda()

When I just used the commented-out line above, I got the right results, and it does not divide by 255. Why do you think dividing by 255 is needed?

aditya-malte commented 5 years ago

Hello, the transform is defined in the val code as follows:

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

transforms.ToTensor() actually scales the values from [0, 255] (the RGB range) to [0, 1] before transforms.Normalize is applied. To reduce dependencies (for another project), I performed these operations directly myself, which is why I uploaded this code:


im = Image.open(path).convert('RGB')
im = np.array(im)
im = im / 255.0
im[:, :, 0] = (im[:, :, 0] - 0.485) / 0.229
im[:, :, 1] = (im[:, :, 1] - 0.456) / 0.224
im[:, :, 2] = (im[:, :, 2] - 0.406) / 0.225
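
To make the effect of the division by 255 concrete, here is a small worked example on a single red-channel value. The helper `normalize` and the pixel value 128 are assumptions chosen for illustration. With scaling the result is roughly 0.07, and without it roughly 557, which is far outside the range the network saw during training.

```python
MEAN_R, STD_R = 0.485, 0.229  # red-channel statistics quoted in this thread

def normalize(value, scale_first):
    """Hypothetical helper: normalize one red-channel value,
    optionally scaling it from [0, 255] to [0, 1] first."""
    if scale_first:
        value = value / 255.0
    return (value - MEAN_R) / STD_R

pixel = 128  # arbitrary mid-range example value
with_scaling = normalize(pixel, scale_first=True)
without_scaling = normalize(pixel, scale_first=False)

print(with_scaling)     # small value near zero
print(without_scaling)  # several hundred
```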

Thank you

linqiaozhou commented 5 years ago

OK, I got it, thanks~

aditya-malte commented 5 years ago

You are welcome :)

amiltonwong commented 5 years ago

@leeyeehoo, could you commit the updated snippet to the repo?