huggingface / pytorch-pretrained-BigGAN

🦋 A PyTorch implementation of BigGAN with pretrained weights and conversion scripts.

Why does the model use only the first three channels of the last layer output? #9

Open apple2373 opened 5 years ago

apple2373 commented 5 years ago

https://github.com/huggingface/pytorch-pretrained-BigGAN/blob/1e18aed2dff75db51428f13b940c38b923eb4a3d/pytorch_pretrained_biggan/model.py#L245-L246

I'm trying to understand the model by reading the code. I noticed that conv_to_rgb actually has 128 output channels, but only the first three are used for the final RGB image. Why is this done? What are the other 125 channels for?
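If I read the linked lines correctly, the pattern is roughly this (a paraphrase with a made-up input shape, not a verbatim copy of model.py):

```python
import torch
import torch.nn as nn

# Stand-in for the model's final layer: 128 channels in, 128 channels out
conv_to_rgb = nn.Conv2d(128, 128, kernel_size=3, padding=1)

z = torch.randn(1, 128, 8, 8)
z = conv_to_rgb(z)   # shape (1, 128, 8, 8)
z = z[:, :3, ...]    # only the first 3 channels are kept as the RGB image
print(z.shape)       # torch.Size([1, 3, 8, 8])
```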

thomwolf commented 5 years ago

They are dropped. This is actually done several times in the model, for example also here: https://github.com/huggingface/pytorch-pretrained-BigGAN/blob/1e18aed2dff75db51428f13b940c38b923eb4a3d/pytorch_pretrained_biggan/model.py#L192-L194

If you read the latest version of the BigGAN paper, you will see it is part of the changes in the new "deep" versions of BigGAN.

apple2373 commented 5 years ago

Thanks for the reply! I think I am confused. If you are simply going to drop channels, why not use the smaller number of channels at training time? In the last layer, for example, why not use nn.Conv2d(128, 3) instead of training nn.Conv2d(128, 128) and dropping 125 channels at inference time?

Could you point to the specific page and line where the authors explain this part? I tried to find it in 1809.11096v2, but I could not. Table 9(a) just says BN, ReLU, 3 × 3 Conv ch → 3.
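Just to quantify the difference I'm asking about (kernel size taken from that table row; the padding is only an assumption for illustration):

```python
import torch.nn as nn

# What the checkpoint stores: a 128 -> 128 conv whose last 125 filters are never read
wide = nn.Conv2d(128, 128, kernel_size=3, padding=1)
# What would suffice to produce the 3-channel image
narrow = nn.Conv2d(128, 3, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(wide), count(narrow))  # 147584 vs. 3459 parameters
```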

apple2373 commented 5 years ago

I still can't understand why this repository uses this strange channel-dropping trick. Is training with more channels and dropping them at inference time a trick invented by the owner of this repository?

I checked the BigGAN author's implementation, but he does not seem to use channel dropping... https://github.com/ajbrock/BigGAN-PyTorch/blob/ba3d05754120e9d3b68313ec7b0f9833fc5ee8bc/BigGANdeep.py#L68-L93

thomwolf commented 5 years ago

Well, I'm not very familiar with Andy's implementation, but I see a channel-dropping part here: https://github.com/ajbrock/BigGAN-PyTorch/blob/ba3d05754120e9d3b68313ec7b0f9833fc5ee8bc/BigGANdeep.py#L54-L56
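From memory, the linked block does something along these lines inside the residual block's forward pass (a paraphrased, self-contained sketch, not a verbatim copy of BigGANdeep.py):

```python
import torch

in_channels, out_channels = 128, 64
x = torch.randn(2, in_channels, 8, 8)

# When the block narrows the width, the skip/residual tensor is narrowed by
# simply keeping its first out_channels channels -- the rest are dropped
if in_channels != out_channels:
    x = x[:, :out_channels]

print(x.shape)  # torch.Size([2, 64, 8, 8])
```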

I'm not sure Andy's implementation can load the --deep models, which is what the present repo is based on (see https://github.com/ajbrock/BigGAN-PyTorch/issues/10).

Maybe you would be better off asking in the issues of https://github.com/ajbrock/BigGAN-PyTorch ?

apple2373 commented 5 years ago

Oh, I missed that part. If the original implementation uses channel dropping, it makes sense to use it here. Thanks! I'll ask the authors directly.

apple2373 commented 5 years ago

The original author answered. It's because TensorFlow is faster when the numbers of input and output channels are the same. I think it's okay to delete the unused channels from this repository, since in PyTorch they just waste computational resources.
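For anyone who wants to do that, a minimal sketch of how the final layer could be slimmed after loading the pretrained weights (`conv_to_rgb` is the attribute name this issue is about; the `slim_conv_to_rgb` helper and the usage path below are hypothetical). The output should be unchanged, since the forward pass only ever reads the first 3 channels anyway:

```python
import torch
import torch.nn as nn

def slim_conv_to_rgb(conv):
    """Copy only the first 3 output filters of a pretrained conv into a smaller layer."""
    slim = nn.Conv2d(conv.in_channels, 3,
                     kernel_size=conv.kernel_size,
                     stride=conv.stride,
                     padding=conv.padding,
                     bias=conv.bias is not None)
    with torch.no_grad():
        slim.weight.copy_(conv.weight[:3])   # weight shape: (out, in, kH, kW)
        if conv.bias is not None:
            slim.bias.copy_(conv.bias[:3])
    return slim

# Hypothetical usage -- the attribute path is an assumption, check model.py:
# model.generator.conv_to_rgb = slim_conv_to_rgb(model.generator.conv_to_rgb)
```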