leeyeehoo / CSRNet-pytorch

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Why is there no ReLU activation at the output_layer? #53

Open · ttpro1995 opened this issue 5 years ago

ttpro1995 commented 5 years ago

In model.py (https://github.com/leeyeehoo/CSRNet-pytorch/blob/master/model.py)

I noted that:

self.output_layer = nn.Conv2d(64, 1, kernel_size=1) is not followed by a ReLU.

The other layers are built as:

[conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]

Can someone help me understand why a ReLU is not required at the output_layer? It seems useful to have, since the density map should always be greater than 0.
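
For concreteness, here is a minimal sketch (an illustration, not the actual CSRNet class from this repo) of the 1x1 output layer and where a ReLU could be appended:

```python
import torch
import torch.nn as nn

class OutputHead(nn.Module):
    """Sketch of the backend tail only: 64 channels -> 1-channel density map."""
    def __init__(self, apply_relu=False):
        super().__init__()
        # As in model.py: a 1x1 convolution with no activation afterwards.
        self.output_layer = nn.Conv2d(64, 1, kernel_size=1)
        # Hypothetical variant discussed in this issue: clip negatives with a ReLU.
        self.relu = nn.ReLU(inplace=True) if apply_relu else nn.Identity()

    def forward(self, x):
        return self.relu(self.output_layer(x))

head = OutputHead(apply_relu=True)
density_map = head(torch.randn(1, 64, 96, 128))
print(bool((density_map >= 0).all()))   # True when the ReLU is appended
```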

doubbblek commented 4 years ago

In my opinion, the ReLU function is used to filter out redundant information in the features, which is why it is widely used in the intermediate layers, where useless features can be eliminated. But for the output layer, after the 1x1 convolution, the network gives us the final density map instead of a feature map, and there is no useless information left to remove.

ttpro1995 commented 4 years ago

But the density map must be greater than 0. A ReLU would ensure that there are no values below 0 in the density map.

> In my opinion, the ReLU function is used to filter out redundant information in the features, which is why it is widely used in the intermediate layers, where useless features can be eliminated. But for the output layer, after the 1x1 convolution, the network gives us the final density map instead of a feature map, and there is no useless information left to remove.

doubbblek commented 4 years ago

> But the density map must be greater than 0. A ReLU would ensure that there are no values below 0 in the density map.

Yes, you're right. You can in fact try adding a ReLU to the output layer; the network can still work, but it may take more time to converge.

For example, suppose a pixel in the density map has a target value of 0.5, and at some point during training the network predicts -0.01 for that pixel. If a ReLU is added, it sets the pixel to 0, and since ReLU has zero gradient for negative inputs, backpropagation to the related parameters is stopped.
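
A minimal sketch of that effect (my own illustration, not code from this repo), using a single pixel and an MSE loss:

```python
import torch
import torch.nn.functional as F

# Hypothetical single-pixel case from the comment above:
# target density is 0.5, the raw network prediction is -0.01.
target = torch.tensor([0.5])

# With a ReLU on the output, the negative prediction is clipped to 0,
# and ReLU's zero gradient for negative inputs blocks any learning
# signal to the parameters that produced the -0.01.
pred = torch.tensor([-0.01], requires_grad=True)
F.mse_loss(F.relu(pred), target).backward()
print(pred.grad)   # tensor([0.])

# Without the ReLU, the same pixel still yields a useful gradient.
pred = torch.tensor([-0.01], requires_grad=True)
F.mse_loss(pred, target).backward()
print(pred.grad)   # tensor([-1.0200])
```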

But if the size of your training data is large enough, I think it can still work.
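
A related workaround (my own suggestion, not necessarily what this repo does) is to keep the output layer linear during training, so no gradients are blocked, and clip negative values only when the density map is summed into a count:

```python
import torch

def count_from_density(raw_map: torch.Tensor) -> float:
    """Turn a raw (possibly slightly negative) density map into a crowd count."""
    density = raw_map.clamp(min=0)   # clip negatives at evaluation time only
    return density.sum().item()      # the count is the integral of the density map

# Example with a stand-in for a network output:
raw = torch.randn(1, 1, 96, 128) * 0.01
print(count_from_density(raw))
```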