The variable y is the output of the style model and x_c is a copy of the image. Both are run through the VGG network, which outputs 4 arrays with 64, 128, 512 and 512 maps. To get the content loss, only the second array is used (features_y[1], features_xc[1].data).
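For reference, here is a minimal sketch of the computation I mean. Random tensors stand in for the four VGG outputs, so nothing is downloaded. The spatial sizes, the choice of `mse_loss`, and the use of `.detach()` in place of `.data` are my assumptions for illustration, not taken from the original code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Map counts as quoted above; spatial sizes are hypothetical placeholders.
channels = [64, 128, 512, 512]
sizes = [64, 32, 16, 8]

# Stand-ins for the VGG features of y (style model output) and x_c (image copy).
features_y = [torch.randn(1, c, s, s, requires_grad=True)
              for c, s in zip(channels, sizes)]
features_xc = [torch.randn(1, c, s, s) for c, s in zip(channels, sizes)]

# Content loss: compare only the second feature array (index 1).
# .detach() keeps the target out of the autograd graph, like .data did.
content_loss = F.mse_loss(features_y[1], features_xc[1].detach())
content_loss.backward()
```

In this sketch only `features_y[1]` receives a gradient from the content loss; the other three arrays are untouched by it.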
Why was this specific layer selected for the content loss computation, rather than the one(s) with more maps?