Hello, you bring up a valid point! I've noticed that a lot of places use concatenation, but I have seen some that use multiplication. With multiplication you keep the same shape: when the embedding vector and the noise vector are "merged" (i.e. multiplied element-wise), the output shape is unchanged. For example:
>>> l = torch.randint(0, 10, (64,))
>>> l.shape
torch.Size([64])
>>> em = torch.nn.Embedding(10, 100)
>>> em(l).shape
torch.Size([64, 100])
>>> r = torch.randn(size=(64, 100))
>>> r.shape
torch.Size([64, 100])
>>> torch.mul(em(l), r).shape
torch.Size([64, 100])
However, when you concatenate, the sizes along the concatenation dimension add up, so the merged vector becomes wider. For example:
>>> l = torch.randint(0, 10, (64,))
>>> em = torch.nn.Embedding(10, 50)
>>> em(l).shape
torch.Size([64, 50])
>>> r = torch.randn(size=(64, 100))
>>> r.shape
torch.Size([64, 100])
>>> torch.cat((em(l), r), 1).shape
torch.Size([64, 150])
From my experience with concatenation, whenever I wanted to generate a particular class after training, the images I obtained were not very varied. Every image looked very similar within its class (if I wanted to generate a 1, all the generated 1s would look the same). With multiplication, however, I would get different images for the same class (if I wanted to generate a 1, each generated 1 had a different style while still representing a 1).
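To make the difference concrete, here is a minimal sketch of how the two fusion styles would look inside a conditional generator's input block. The class name, layer sizes, and embedding width are hypothetical, chosen only to illustrate the shapes:

import torch
import torch.nn as nn

class CondGeneratorInput(nn.Module):
    # Illustrative input block for a conditional generator (hypothetical sizes).
    def __init__(self, n_classes=10, noise_dim=100, mode="mul"):
        super().__init__()
        self.mode = mode
        if mode == "mul":
            # The embedding width must match the noise dimension.
            self.label_embedding = nn.Embedding(n_classes, noise_dim)
            in_features = noise_dim
        else:
            # With concatenation the embedding can have any width;
            # the widths simply add up.
            self.label_embedding = nn.Embedding(n_classes, 50)
            in_features = noise_dim + 50
        self.linear = nn.Linear(in_features, 128)

    def forward(self, noise, labels):
        emb = self.label_embedding(labels)
        if self.mode == "mul":
            x = torch.mul(emb, noise)           # (batch, noise_dim)
        else:
            x = torch.cat((emb, noise), dim=1)  # (batch, noise_dim + 50)
        return self.linear(x)

z = torch.randn(64, 100)
y = torch.randint(0, 10, (64,))
print(CondGeneratorInput(mode="mul")(z, y).shape)  # torch.Size([64, 128])
print(CondGeneratorInput(mode="cat")(z, y).shape)  # torch.Size([64, 128])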
I hope this answers the question! Thank you!
Thank you very much for your fast reply, that makes sense! I have implemented the multiplication and it seems to work.
I have one more question regarding your code: why do you use dropout in the discriminator and not in the generator? In a repository of useful GAN practices (https://github.com/soumith/ganhacks), they advise using dropout in the generator. What is the reason for using dropout in the discriminator? (My intuition would be to avoid overfitting, but I am not sure.) Also, do you happen to know the reason for, or advantage of, using dropout in the generator?
Thank you again!
That is an interesting observation! As they've mentioned, GANs are pretty unstable. While I was building the model, I found that adding Dropout in the Discriminator helped stabilise my training! You could try swapping it out of the Discriminator and adding it to the Generator, then see if that stabilises training. When I was training GANs, I noticed that one architecture did well on certain data but collapsed on other data. Sometimes using noisy labels helped me, and sometimes it did not. And you're right about Dropout: it is used to improve generalisation and prevent overfitting.
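For reference, this is roughly what I mean by Dropout in the Discriminator; the layer sizes below are made up purely for illustration:

import torch.nn as nn

# Hypothetical fully-connected discriminator: Dropout after each activation
# regularises the Discriminator so it does not overpower the Generator.
discriminator = nn.Sequential(
    nn.Linear(784, 512),
    nn.LeakyReLU(0.2),
    nn.Dropout(0.3),  # randomly zeroes 30% of activations during training
    nn.Linear(512, 256),
    nn.LeakyReLU(0.2),
    nn.Dropout(0.3),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)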
I hope this answers your question. Do let me know if you have any more questions!
Thanks once again for your explanation, these are very helpful to me!
At the moment I am trying out different architectures for around 15 epochs each (training still takes up to 1 hour per epoch) to see which one works best. Would you mind if I shared some of my intermediate results with you to discuss?
Also, how exactly do you determine that training has stabilised? Do you do so by checking the generated images, the losses of D and G, and the values of D(x) and D(G(z))?
Thank you very much, I really appreciate your help!
Yes, I am happy to discuss your results if you'd like! Regarding stabilisation, you've got that right: I usually check the losses and the images that have been generated. To keep track of that I use a tool called Weights and Biases (it's a brilliant tool, and I use it for most of my projects). If it would help, you could also check out my final year project, FGTD. I performed some visualisations there using Weights and Biases, which could give you a better idea of how I track my training runs.
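As a rough sketch, the kind of logging I do with Weights and Biases looks like the following. It assumes the wandb package is installed; the loss and output variables here are dummy stand-ins for whatever your training loop actually produces:

import torch
import wandb

wandb.init(project="gan-training")  # hypothetical project name

# Dummy stand-ins for what the training loop would produce:
d_loss, g_loss = torch.tensor(0.7), torch.tensor(1.2)
d_real_out = torch.rand(64, 1)            # D(x) for a batch of real images
d_fake_out = torch.rand(64, 1)            # D(G(z)) for a batch of fakes
fake_images = torch.rand(16, 1, 28, 28)   # a grid of generated samples

# Logged once per batch or epoch:
wandb.log({
    "loss/discriminator": d_loss.item(),
    "loss/generator": g_loss.item(),
    "D(x)": d_real_out.mean().item(),
    "D(G(z))": d_fake_out.mean().item(),
    "samples": [wandb.Image(img) for img in fake_images],
})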
Thank you very much! Once I have tried out different architectures I will post them here. I will take a look at that tool, it indeed looks very useful.
Hi, hope you are doing great!
Sorry it took so long; running the models to try out the different architectures took me quite some time. I selected the best models and ended up with 5 different architectures that all produce results that look relatively good to the eye. For all these models, I added Gaussian noise to layers in the discriminator of the GAN. The architectures differ in the number of layers to which I add the noise, and in how much decay I applied to the standard deviation of the Gaussian noise. For each model I made a small folder that contains a grid of generated images to show the output, along with graphs of the losses of G and D and the values of D(x) and D(G(z)) over the iterations.
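For context, the noise layer I add is roughly the following sketch; GaussianNoise is my own module, and the decay schedule shown is just one of the variants I tried:

import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    # Adds zero-mean Gaussian noise to its input during training only.
    def __init__(self, sigma=0.1, decay=0.99):
        super().__init__()
        self.sigma = sigma
        self.decay = decay

    def forward(self, x):
        if self.training and self.sigma > 0:
            x = x + torch.randn_like(x) * self.sigma
        return x

    def step(self):
        # Called once per epoch to decay the noise standard deviation.
        self.sigma *= self.decay

noise_layer = GaussianNoise(sigma=0.1, decay=0.95)
out = noise_layer(torch.randn(64, 256))  # same shape as the input, with noise added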
If you find the time and don't mind discussing these results, would you be able to advise me which model to pick to train for around 500 epochs, based on my output so far? (Luckily, for the 500-epoch training I have access to a compute cluster.) Here is the link to my output: https://drive.google.com/drive/folders/1GIiOzWdSmfcEQiOQ0doWawtAwDfmUr2W?usp=sharing
I am hoping you can provide me with some of your insights, which have greatly helped me so far!
Thank you very much in advance!
Hello! I had a look at the images and they look pretty good! One way to actually evaluate your images is by using scoring methods like Inception Score and FID Score. People usually prefer using FID Score over Inception (https://datascience.stackexchange.com/questions/69506/inception-score-is-and-fr%C3%A9chet-inception-distance-fid-which-one-is-better-f), but I suggest you evaluate using both.
Here are some links that should be helpful:
The links above are not the original implementations, but they are pretty close to the originals. Using these scores as a reference should give you a better idea of the quality of your generated images with respect to the original ones. I hope this helps!
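As a rough illustration, this is one way both scores could be computed with the torchmetrics package (my assumption here, not the linked implementations; it needs the torchmetrics image extras installed and expects uint8 image tensors of shape (N, 3, H, W)):

import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Dummy stand-ins: replace these with your real and generated batches.
real_images = torch.randint(0, 256, (64, 3, 64, 64), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 64, 64), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute())  # lower is better

inception = InceptionScore()
inception.update(fake_images)
is_mean, is_std = inception.compute()
print("IS:", is_mean, "+/-", is_std)  # higher is better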
Thank you very much, those definitely helped! I computed the scores and was able to choose one model!
That's great! Glad I was able to help! All the best with your project 😄
Hi, first of all, thanks for your amazing work!
I am not sure if this is the right place to ask this question, but here it goes: I have a question regarding a difference between your implementation and other implementations of ACGAN I have seen. In almost all implementations, I noticed the use of an embedding layer for including labels in the generator. However, I have often seen this in combination with a concatenation of the embedding and the noise vectors. In your code you use a multiplication, as below:
x = torch.mul(self.label_embedding(labels), inputs)
x = self.linear(x)
x = x.view(x.shape[0], self.feature_size * 2, 4, 4)
Is this the same as concatenation, and if not, what is the difference? Would you prefer one over the other?
Thank you in advance!