leeyeehoo / CSRNet-pytorch

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
656 stars 259 forks

What does the following code mean? img[0,:,:]=img[0,:,:]-92.8207477031 #56

Open CheungYooo opened 5 years ago

CheungYooo commented 5 years ago

img[0,:,:]=img[0,:,:]-92.8207477031
img[1,:,:]=img[1,:,:]-95.2757037428
img[2,:,:]=img[2,:,:]-104.877445883

@leeyeehoo Thank you for sharing. When you validate CSRNet in 'val.ipynb', you use the code above. So my question is: why do you subtract these specific values (92.8207477031, 95.2757037428, 104.877445883)? What do the values mean? Why can't they be other values?

wzhings commented 5 years ago

Do you understand it now? I am also confused about these values.

Cli98 commented 4 years ago

You should not use those values, because the author uses pretrained weights instead of training from scratch. However, if you train from scratch, you do need those values.

Muhibullah1 commented 3 years ago

It seems he is taking the mean pixel value of each channel (R, G, B) and then subtracting it for the sake of normalization.
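To make the idea of per-channel mean subtraction concrete, here is a minimal NumPy sketch. The tiny random image and the helper name `subtract_channel_means` are made up for illustration. The three constants are the ones quoted in the question; whether they are really dataset means, and in which channel order, is not confirmed anywhere in this thread.

```python
import numpy as np

def subtract_channel_means(img, means):
    """Subtract one constant per channel from a (C, H, W) image array."""
    # Reshape to (C, 1, 1) so each constant broadcasts over height and width.
    means = np.asarray(means, dtype=np.float64).reshape(-1, 1, 1)
    return img.astype(np.float64) - means

# Hypothetical 3x4x4 image with values in [0, 255], standing in for a real photo.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(3, 4, 4)).astype(np.float64)

# Constants quoted in the question (assumed here to be per-channel offsets).
quoted = [92.8207477031, 95.2757037428, 104.877445883]
shifted = subtract_channel_means(img, quoted)

# If the constants were this image's own channel means,
# every channel of the result would be centered at zero.
own_means = img.mean(axis=(1, 2))
centered = subtract_channel_means(img, own_means)
print(centered.mean(axis=(1, 2)))  # values very close to 0
```

The loop-free version behaves the same as the three `img[c,:,:] = img[c,:,:] - value` lines in the question; it only relies on broadcasting instead of indexing each channel by hand.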

baosongliang commented 2 years ago

@CheungYooo I think you should use the following code instead:

`import h5py
import numpy as np
from PIL import Image
from torchvision import datasets, transforms

# Scale to [0, 1], then normalize with the ImageNet mean and std
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# model and img_paths are assumed to be defined earlier, as in val.ipynb
mae = 0
for i in range(len(img_paths)):
    img = transform(Image.open(img_paths[i]).convert('RGB')).cuda()
    gt_file = h5py.File(img_paths[i].replace('.jpg', '.h5').replace('images', 'ground_truth'), 'r')
    groundtruth = np.asarray(gt_file['density'])
    output = model(img.unsqueeze(0))
    # Accumulate the absolute error between predicted and ground-truth counts
    mae += abs(output.detach().cpu().sum().numpy() - np.sum(groundtruth))
    print(i, mae)
print(mae / len(img_paths))`
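For comparison with the constants in the original question, the sketch below shows what `transforms.Normalize` computes per channel, `(x - mean) / std` on inputs in [0, 1], and the equivalent form on a 0 to 255 scale. It uses NumPy only and a made-up random image; the function names are hypothetical. Unlike the plain subtraction in the question, this form also divides by the standard deviation, so the two preprocessing schemes are not interchangeable. Which one suits a given checkpoint presumably depends on how that checkpoint was trained.

```python
import numpy as np

# ImageNet statistics from the snippet above, shaped to broadcast over (C, H, W).
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def normalize_unit_range(img01):
    """Per-channel (x - mean) / std for an image scaled to [0, 1]."""
    return (img01 - MEAN) / STD

def normalize_byte_range(img255):
    """Same normalization, written for an image on the 0..255 scale."""
    return (img255 - 255.0 * MEAN) / (255.0 * STD)

# Hypothetical 3x4x4 image with values in [0, 255].
rng = np.random.default_rng(0)
img255 = rng.integers(0, 256, size=(3, 4, 4)).astype(np.float64)

a = normalize_unit_range(img255 / 255.0)
b = normalize_byte_range(img255)
print(np.allclose(a, b))        # both forms agree
print((255.0 * MEAN).ravel())   # ImageNet channel means on the 0..255 scale
```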