Number of out channels

I am trying to use the pretrained LSUN models to write my own sampling algorithm. When I call model.forward on an image of size 1x3x256x256 together with a timestep, I get an output of size 1x6x256x256. Shouldn't the network output the noise epsilon associated with the image, and therefore have the same size as the input image? What am I missing here?
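
One possible explanation, stated as an assumption rather than something confirmed for these checkpoints: if the model was trained to predict a learned variance in addition to the noise, the output would carry twice the input channels, with the first 3 channels being epsilon and the last 3 parameterizing the variance. The sketch below only illustrates that split on a stand-in array with the shapes from my case; it does not load or call the real model, and the names `eps` and `var_values` are hypothetical.

```python
import numpy as np

# Shapes from the question: input is 1x3x256x256, output is 1x6x256x256.
batch, channels, height, width = 1, 3, 256, 256

# Stand-in for the network output (the real one would come from model.forward).
model_output = np.zeros((batch, 2 * channels, height, width), dtype=np.float32)

# Assumption: the channel axis holds [epsilon | variance parameters],
# so splitting it into two equal halves recovers a tensor shaped like the input.
# With PyTorch tensors the analogous call would be torch.split(out, channels, dim=1).
eps, var_values = np.split(model_output, 2, axis=1)

print(eps.shape)         # (1, 3, 256, 256)
print(var_values.shape)  # (1, 3, 256, 256)
```

If that assumption holds, a custom sampler would use only `eps` as the noise prediction and either ignore `var_values` (falling back to a fixed variance) or use it to compute the per-pixel variance.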