jianzhuwang opened 2 years ago
On a forward pass all ReLU does is clip negative values, so I'm not sure I've seen a true comparison of using the output of the conv layer vs the clipped output of the conv layer. Either way, feel free to submit a PR that updates this.
Thanks for your reply.
The crucial code is as follows:
```python
if conv_index == '22':
    self.vgg = nn.Sequential(*modules[:8])
elif conv_index == '54':
    self.vgg = nn.Sequential(*modules[:35])
```
According to explanations such as https://paperswithcode.com/method/vgg-loss, perceptual similarity is usually computed on feature maps after the activation function (e.g. ReLU). More specifically, a conv_index of 'i,j' is usually taken to mean the j-th convolution (after activation) before the i-th max-pooling layer. If so, then in the code above, '22' should correspond to modules[:9] (which ends after the ReLU activation), and similarly '54' should correspond to modules[:36] (which also ends after the ReLU activation).
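For reference, here is a minimal sketch (assuming torchvision's `vgg19`, which is what this loss wraps) that prints the layer indices so the slice boundaries can be checked directly; the variable names below are mine, not from the repo:

```python
import torch.nn as nn
from torchvision import models

# Weights are irrelevant for checking indices, so skip the download.
modules = list(models.vgg19(pretrained=False).features)
for i, m in enumerate(modules):
    print(i, m.__class__.__name__)

# In VGG19, indices 7/8 are conv2_2/relu2_2 and 34/35 are conv5_4/relu5_4,
# so modules[:8] and modules[:35] stop at the conv outputs (pre-activation),
# while the slices below also include the ReLU:
vgg22_post_relu = nn.Sequential(*modules[:9])   # up to and including relu2_2
vgg54_post_relu = nn.Sequential(*modules[:36])  # up to and including relu5_4
```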
I am not sure whether my understanding is correct or not. Your help would be highly appreciated.
Best.