val-iisc / deligan

This project is an implementation of the Generative Adversarial Network proposed in our CVPR 2017 paper - DeLiGAN : Generative Adversarial Networks for Diverse and Limited Data. DeLiGAN is a simple but effective modification of the GAN framework and aims to improve performance on datasets which are diverse yet small in size.
MIT License

Latent Space #8

Open Bob-RUC opened 5 years ago

Bob-RUC commented 5 years ago

Hi, I noticed the paper says the latent space is modeled as a mixture of Gaussians with trainable means and variances:

In particular, we propose a reparameterization of the latent space as a Mixture-of-Gaussians model.

However, in the script the latent variable appears to be sampled from a uniform distribution instead: `display_z = np.random.uniform(-1.0, 1.0, [batchsize, z_dim]).astype(np.float32)`. I don't quite understand this inconsistency.
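For reference, my understanding of the paper's reparameterization is roughly the following sketch (all names and constants here are mine, not from the repository):

```python
import numpy as np

# Sketch of a Mixture-of-Gaussians latent reparameterization: each latent
# sample is mu_k + sigma_k * eps, where mu_k and sigma_k are trainable
# per-component parameters and eps ~ N(0, I). Illustrative only.
rng = np.random.default_rng(0)
n_components, z_dim, batchsize = 10, 100, 64

mu = rng.uniform(-1.0, 1.0, (n_components, z_dim))  # trainable means
sigma = np.full((n_components, z_dim), 0.2)         # trainable std-devs

k = rng.integers(0, n_components, batchsize)        # one component per sample
eps = rng.standard_normal((batchsize, z_dim))       # standard normal noise
z = mu[k] + sigma[k] * eps                          # reparameterized latent batch

print(z.shape)  # (64, 100)
```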

swami1995 commented 5 years ago

Hi Bob,

Thanks for pointing out the issue. The distribution used for training on MNIST is actually defined in the following line, and it is indeed the standard normal distribution: https://github.com/val-iisc/deligan/blob/68451c8923650b9239a87efb3b88f04b6969e54b/src/mnist/dg_mnist.py#L190

The line you pointed out initializes the variable for evaluation, and that is most probably a bug in our code. I think it crept in from some experiments we were doing after submission; the results in the paper correspond to the case where display_z was sampled from the standard normal distribution as well. We will correct this bug in the repository soon.

However, the code for the other datasets (CIFAR-10 and sketches) doesn't have this bug, so feel free to use it as is.
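Concretely, the fix is to sample display_z from the standard normal instead of the uniform distribution, e.g. (a sketch, with `batchsize` and `z_dim` standing in for the values defined in the script):

```python
import numpy as np

batchsize, z_dim = 64, 100  # placeholders for the values used in dg_mnist.py

# Buggy line: evaluation noise drawn from a uniform distribution
# display_z = np.random.uniform(-1.0, 1.0, [batchsize, z_dim]).astype(np.float32)

# Fix: draw from the standard normal, matching the training-time
# distribution defined at dg_mnist.py#L190
display_z = np.random.normal(0.0, 1.0, [batchsize, z_dim]).astype(np.float32)

print(display_z.shape)  # (64, 100)
```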

Thanks for your interest in our paper.

Bob-RUC commented 5 years ago

Thank you very much for your reply. I've fixed it as you said, and it now works as well as the paper presents. However, I have one more question about the optimization code. I noticed that two parameters, t1 and thres, are used to control the generator loss: t1 is used to control thres, and thres directly controls the generator updates. It seems like a particularly delicate way of controlling GAN training, but I can't figure out how it was developed to fit the model. Could you please give me some intuition on this?

swami1995 commented 5 years ago

Hi Bob,

I essentially used those variables to provide a curriculum during training. thres was used to decide whether to update the generator or the discriminator, based on the generator loss. At the same time, the value of thres was increased or decreased after each generator/discriminator iteration to ensure that neither network gets overtrained. t1 was just a heuristically chosen constant that provided a lower bound for thres.
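Roughly, the scheduling looked like the following toy sketch (the exact update rules and constants here are illustrative, not the code we shipped):

```python
# Toy sketch of a thres/t1 curriculum: pick which network to update from
# the generator loss, then nudge thres so neither side trains forever.
# All values and the update rule are illustrative placeholders.
t1 = 0.5    # heuristic lower bound for thres
step = 0.01 # how fast thres adapts per iteration

def schedule(g_loss, thres):
    """Return which network to update and the adjusted thres."""
    if g_loss > thres:
        who = "generator"
        thres = thres + step            # raising thres ends the G streak sooner
    else:
        who = "discriminator"
        thres = max(t1, thres - step)   # lowering thres, floored at t1
    return who, thres

# Toy run: a decaying generator loss alternates the two update modes.
g_loss, thres = 2.0, 1.0
for _ in range(5):
    who, thres = schedule(g_loss, thres)
    g_loss *= 0.8 if who == "generator" else 1.1
```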

Hope that helped with some of the intuition. However, I would not recommend using these heuristics; you'd be better off using more modern GAN frameworks to stabilize training rather than relying on them.

Bob-RUC commented 5 years ago

I think I've generally grasped your intuition now. Thank you very much for helping me figure out what's happening here!

TanmDL commented 4 years ago

I am new to TensorFlow. While running the toy dataset code, I got this error: "ValueError: Variable g_z already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?" How do I fix it?