leeyeehoo / CSRNet-pytorch

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Why is the resolution of the density map reduced to 1/8 of the input image? #54

Closed. ttpro1995 closed this issue 5 years ago.

ttpro1995 commented 5 years ago

I ran the train_loader, which is created in train.py:

# the loader is built the same way as in train.py
train_loader = torch.utils.data.DataLoader(
    ListDataset(train_list,
                shuffle=True,
                transform=transforms.Compose([
                    transforms.ToTensor(),
                    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225]),
                ]),
                train=True,
                seen=model.seen,
                batch_size=args.batch_size,
                num_workers=args.workers),
    batch_size=args.batch_size)

# inspect the shapes of one batch
for i, (img, target) in enumerate(train_loader):
    print(img.shape)
    print(target.shape)
    break

The result shows that the resolution of the ground-truth density map is reduced to 1/8 of the input image (per dimension):

torch.Size([1, 3, 768, 1024])
torch.Size([1, 96, 128])

As the authors mention in the paper:

Since the output (density maps) of CSRNet is smaller (1/8 of input size), we choose bilinear interpolation with the factor of 8 for scaling and make sure the output shares the same resolution as the input image.
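
For reference, the ground truth itself is downsampled in the data pipeline so it matches the 1/8-resolution output; below is a minimal sketch of such a count-preserving downsampling (an assumed helper for illustration, not a quote of the repo's loader):

import numpy as np
import cv2

def downsample_density(target):
    # resize a full-resolution density map to 1/8 per dimension;
    # multiplying by 64 (= 8 * 8) keeps the map's sum, i.e. the crowd
    # count, approximately unchanged after resizing
    h, w = target.shape
    small = cv2.resize(target, (w // 8, h // 8), interpolation=cv2.INTER_CUBIC)
    return small * 64

# e.g. a 768 x 1024 ground-truth map becomes 96 x 128, matching the target shape above
dense = np.random.rand(768, 1024).astype(np.float32)
print(downsample_density(dense).shape)   # (96, 128)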

visionxyz commented 5 years ago

Because there are three max-pooling layers in the VGG-16 front end, each halving the spatial resolution, so the output is 1/2^3 = 1/8 of the input.
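
A minimal sketch (not the model's actual layers) showing the effect on a 768 x 1024 input:

import torch
import torch.nn as nn

# three stride-2 max-pooling layers, as in a VGG-16-style front end
pools = nn.Sequential(*[nn.MaxPool2d(kernel_size=2, stride=2) for _ in range(3)])

x = torch.zeros(1, 3, 768, 1024)
print(pools(x).shape)   # torch.Size([1, 3, 96, 128]) -> each dimension divided by 8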

ttpro1995 commented 5 years ago

I mean, the authors said "we choose bilinear interpolation". Does that mean bilinear interpolation is a layer in the model, so that the output of the last layer has the same size as the input image?

Pipoderoso commented 5 years ago

The bilinear interpolation is not part of the model; the last trainable layers are the dilated convolutions. You can add it as a post-processing step, however the count has to be calculated before upsampling.
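
For example, a rough sketch of that post-processing (hypothetical code, not from the repo): take the count from the raw output first, then upsample only for visualization; dividing by 8 * 8 = 64 keeps the sum of the upsampled map approximately equal to the count.

import torch
import torch.nn.functional as F

# density: raw model output of shape (1, 1, 96, 128), random here as a stand-in
density = torch.rand(1, 1, 96, 128)

# the predicted count must be taken from the raw output
count = density.sum().item()

# upsample by 8 with bilinear interpolation; dividing by 64 keeps the
# sum of the full-resolution map approximately equal to the count
full = F.interpolate(density, scale_factor=8, mode='bilinear', align_corners=False) / 64

print(full.shape)                 # torch.Size([1, 1, 768, 1024])
print(count, full.sum().item())   # approximately equal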

ttpro1995 commented 5 years ago

Yeah, I think @Pipoderoso is right. I coded it in Keras, added an UpSampling layer, and wondered why it did not improve (it would not train).

Only when I carefully checked the PyTorch code did I notice that the output is 1/8 of the input.