openai / InfoGAN

Code for reproducing key results in the paper "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
https://arxiv.org/abs/1606.03657

How to express 3 discrete latent codes (each with dimension 20) so that the visualization code still works? #10

Open zdx3578 opened 8 years ago

zdx3578 commented 8 years ago

How do I express 10-dimensional categorical variables?

The code for MNIST is:

latent_spec = [
    (Uniform(62), False),
    (Categorical(10), True),
    (Uniform(1, fix_std=True), True),
    (Uniform(1, fix_std=True), True),
]

but this is not enough for the other datasets. From the paper:

> For MNIST, we choose to model the latent codes with one categorical code, c1 ∼ Cat(K = 10, p = 0.1), which can model discontinuous variation in data, and two continuous codes that can capture variations that are continuous in nature: c2, c3 ∼ Unif(−1, 1).

But how do I express the following? For Street View House Numbers (SVHN):

> we make use of four 10-dimensional categorical variables and two uniform continuous variables as latent codes.

For CelebA:

> In this dataset, we model the latent variation as 10 uniform categorical variables, each of dimension 10.

Also, Appendix C.3 lists the generator G input as ∈ R^228. How do we get 228?

| discriminator D / recognition network Q | generator G |
| --- | --- |
| Input 32 × 32 color image | Input ∈ R^228 |
| 4 × 4 conv. 64 lRELU. stride 2 | FC. 2 × 2 × 448 RELU. batchnorm |
| 4 × 4 conv. 128 lRELU. stride 2. batchnorm | 4 × 4 upconv. 256 RELU. stride 2. batchnorm |
| 4 × 4 conv. 256 lRELU. stride 2. batchnorm | 4 × 4 upconv. 128 RELU. stride 2. |
| FC. output layer for D, FC.128-batchnorm-lRELU-FC.output for Q | 4 × 4 upconv. 64 RELU. stride 2. |
| | 4 × 4 upconv. 3 Tanh. stride 2. |

Can anyone help? Thanks very much!

zdx3578 commented 8 years ago

if isinstance(dist, Gaussian):
    assert dist.dim == 1, "Only dim=1 is currently supported"
    c_vals = []
    for idx in xrange(10):
        c_vals.extend([-1.0 + idx * 2.0 / 9] * 10)
    c_vals.extend([0.] * (self.batch_size - 100))
    vary_cat = np.asarray(c_vals, dtype=np.float32).reshape((-1, 1))
    cur_cat = np.copy(fixed_cat)
    cur_cat[:, offset:offset+1] = vary_cat
    offset += 1
elif isinstance(dist, Categorical):
    lookup = np.eye(dist.dim, dtype=np.float32)
    cat_ids = []
    for idx in xrange(10):
        cat_ids.extend([idx] * 10)
    cat_ids.extend([0] * (self.batch_size - 100))
    cur_cat = np.copy(fixed_cat)
    cur_cat[:, offset:offset+dist.dim] = lookup[cat_ids]
    offset += dist.dim
elif isinstance(dist, Bernoulli):
    assert dist.dim == 1, "Only dim=1 is currently supported"
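Editor's note: the Gaussian branch above sweeps a continuous code over 10 evenly spaced values in [-1, 1], repeating each value 10 times so the batch renders as a 10 × 10 grid of images. A quick runnable check of the sweep values:

print([-1.0 + idx * 2.0 / 9 for idx in range(10)])
# -> [-1.0, -0.777..., -0.555..., ..., 0.777..., 1.0]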

zdx3578 commented 8 years ago

embedding_dim = 100

latent_spec = [
    (Uniform(64), False),
    (Categorical(32), True),
]
con_latent_spec = [
    (LatentGaussian(embedding_dim), True)
]

https://github.com/RutgersHan/InfoGAN/blob/dev_auto/launchers/generate_images.py

zdx3578 commented 8 years ago

From the paper, C.3 CelebA:

> The network architectures are shown in Table 3. The discriminator D and the recognition network Q share most of the network. For this task, we use 10 ten-dimensional categorical codes and 128 noise variables, resulting in a concatenated dimension of 228.

So the config is:

latent_spec = [
    (Uniform(128), False),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
]
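As a quick sanity check (editor's sketch, plain Python), the generator input size is just the sum of the dimensions in latent_spec, noise included:

noise_dim = 128           # Uniform(128)
code_dims = [10] * 10     # ten Categorical(10) codes
print(noise_dim + sum(code_dims))  # -> 228, matching "Input in R^228" in Table 3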

But how do I configure C.5 Chairs, described below?

> The network architectures are shown in Table 6. The discriminator D and the recognition network Q share the same network, and only have separate output units at the last layer. For this task, we use 1 continuous latent code, 3 discrete latent codes (each with dimension 20), and 128 noise variables, so the input to the generator has dimension 189.

But the visualization code has:

elif isinstance(dist, Bernoulli):
    assert dist.dim == 1, "Only dim=1 is currently supported"

NHDaly commented 8 years ago

The above latent_spec worked okay for me.

c3_celebA_latent_spec = [
    (Uniform(128), False),  # Noise
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
    (Categorical(10), True),
]
c3_celebA_image_size = 32

Can you elaborate a bit more in words what you're having problems with? I'm not sure I understand what's not working for you.

zdx3578 commented 8 years ago

Thanks @NHDaly! Can you share your code?

I think the CelebA config is OK. The question now is how to configure C.5 Chairs, quoted below:

> The network architectures are shown in Table 6. The discriminator D and the recognition network Q share the same network, and only have separate output units at the last layer. For this task, we use 1 continuous latent code, 3 discrete latent codes (each with dimension 20), and 128 noise variables, so the input to the generator has dimension 189.

The paper asks for 3 discrete latent codes (each with dimension 20), but the visualization code is:

elif isinstance(dist, Bernoulli):
    assert dist.dim == 1, "Only dim=1 is currently supported"

Also from the paper:

> We used separate configurations for each learned variation, shown in Table 7. For this task, we found it necessary to use different regularization coefficients for the continuous and discrete latent codes.

[screenshot of Table 7]

So how do I configure C.5 Chairs, and how do I change the visualization code? Or does the visualization code not need to change?
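Editor's sketch of one way to generalize the visualization: the Categorical branch quoted earlier hard-codes 10 categories and 10 samples per category. The helper below (the function name and samples_per_cat parameter are hypothetical, not repo code) varies over all dist.dim categories instead; note the batch size must be at least dim * samples_per_cat, so Categorical(20) with 10 samples per category needs a batch of at least 200:

import numpy as np

def vary_categorical(fixed_cat, offset, dim, batch_size, samples_per_cat=10):
    # One block of samples_per_cat rows per category, padded with category 0
    # to fill the batch (generalizes the hard-coded xrange(10) in the trainer).
    lookup = np.eye(dim, dtype=np.float32)  # one-hot row per category
    cat_ids = []
    for idx in range(dim):
        cat_ids.extend([idx] * samples_per_cat)
    cat_ids.extend([0] * (batch_size - dim * samples_per_cat))
    cur_cat = np.copy(fixed_cat)
    cur_cat[:, offset:offset + dim] = lookup[cat_ids]
    return cur_cat, offset + dim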

My second question is about C.4 Faces:

> The network architectures are shown in Table 4. The discriminator D and the recognition network Q share the same network, and only have separate output units at the last layer. For this task, we use 5 continuous latent codes and 128 noise variables, so the input to the generator has dimension 133. We used separate configurations for each learned variation, shown in Table 5.

How do I configure "separate configurations for each learned variation"?

[screenshot of Table 5]

NHDaly commented 8 years ago

> how to configure C.5 Chairs as below

I might be misunderstanding, but it seems like

> For this task, we use 1 continuous latent code, 3 discrete latent codes (each with dimension 20), and 128 noise variables, so the input to the generator has dimension 189.

would translate to the following latent_spec. That is, the continuous code is represented by Uniform and the discrete codes are represented by Categorical:

c5_chairs_latent_spec = [
    (Uniform(128), False),  # Noise
    (Uniform(1, fix_std=True), True),
    (Categorical(20), True),
    (Categorical(20), True),
    (Categorical(20), True),
]
c5_chairs_image_size = 32

I copied the (Uniform(1, fix_std=True), True) line from the two continuous variables defined in run_mnist_exp.py, which I believe represent the "2 continuous latent codes" referenced in the MNIST section of the paper.

I'm not sure where you got the LatentGaussian from... I don't know if it's necessary? I haven't tried running the Chairs model at all.
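To check the spec against the paper's dimension of 189, summing the dims works (editor's sketch; it assumes each distribution object exposes a .dim attribute, as the classes in infogan.misc.distributions do):

def generator_input_dim(latent_spec):
    # total generator input = noise dims + all latent code dims
    return sum(dist.dim for dist, _ in latent_spec)

print(generator_input_dim(c5_chairs_latent_spec))  # -> 128 + 1 + 3*20 = 189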

NHDaly commented 8 years ago

That said, I am also very curious about the answer to this question:

> how to configure 'separate configurations for each learned variation'?

Does this mean that you ran the experiment multiple times with the same number of codes, but each of the codes tends to perform best for each of the provided settings?

neocxi commented 8 years ago

> That is, the continuous code is represented by Uniform and the discrete codes are represented by Categorical:

This is correct. Thanks @NHDaly !

> Does this mean that you ran the experiment multiple times with the same number of codes, but each of the codes tends to perform best for each of the provided settings?

Yes. To better compare with previous supervised results, we selected codes from multiple runs that are most similar to the categories that the previous method (DC-IGN) produces.

zdx3578 commented 8 years ago

For @NHDaly: see https://github.com/RutgersHan/InfoGAN/blob/dev_auto/launchers/run_flower_exp.py#L49

Is your CelebA training result OK?

For @neocxi:

1. In the trainer there is:

self.reg_cont_latent_dist = Product([x for x in self.reg_latent_dist.dists if isinstance(x, Gaussian)])
self.reg_disc_latent_dist = Product([x for x in self.reg_latent_dist.dists if isinstance(x, (Categorical, Bernoulli))])

Bernoulli is also discrete. Where would Bernoulli be used?
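Editor's note on where Bernoulli could be used: the isinstance check above routes it to the discrete branch, so in principle it can appear in a latent_spec as a binary code, although none of the bundled launchers appear to use it. A hypothetical sketch, assuming Bernoulli takes a dimension argument like the other distributions:

from infogan.misc.distributions import Uniform, Bernoulli

latent_spec = [
    (Uniform(62), False),  # noise
    (Bernoulli(1), True),  # hypothetical binary latent code
]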

2. Can @neocxi give an example showing which parameter corresponds to the table in the image? Is it the info_reg_coeff=1.0 parameter?
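Editor's note on question 2: info_reg_coeff is the lambda that weights the mutual-information term subtracted from both players' losses. The released trainer applies one shared coefficient, while Table 7 of the paper uses different coefficients for continuous and discrete codes. A toy, runnable illustration with made-up numbers:

info_reg_coeff = 1.0                 # shared lambda, as passed in run_mnist_exp.py
disc_mi_est, cont_mi_est = 2.3, 0.7  # toy mutual-information estimates
shared = info_reg_coeff * (disc_mi_est + cont_mi_est)

# Hypothetical split mimicking Table 7 (coefficient values are made up):
disc_coeff, cont_coeff = 2.0, 0.1
split = disc_coeff * disc_mi_est + cont_coeff * cont_mi_est
print(shared, split)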

3. What causes the NaN error? Are the D and G learning rates out of equilibrium?

Epoch 14 | discriminator_loss: 0.128064; generator_loss: 2.78964; MI_disc: 20.3559; CrossEnt_disc: 2.66993; MI: 20.3559; CrossEnt: 2.66993; max_real_d: 0.999938; min_real_d: 0.560705; max_fake_d: 0.240968; min_fake_d: 0.0144349
STR: 'avg_log_vals' is [ 1.28064305e-01 2.78963685e+00 2.03559246e+01 2.66993141e+00 2.03559246e+01 2.66993141e+00 9.99938190e-01 5.60704947e-01 2.40968212e-01 1.44348787e-02]
STR: 'ganlp' is 2 |ETA: --:--:--
Epoch 15 | discriminator_loss: nan; generator_loss: nan; MI_disc: nan; CrossEnt_disc: nan; MI: nan; CrossEnt: nan; max_real_d: -inf; min_real_d: inf; max_fake_d: -inf; min_fake_d: inf
STR: 'avg_log_vals' is [ nan nan nan nan nan nan -inf inf -inf inf]
Traceback (most recent call last):
  File "launchers/run_mnist_exp.py", line 97, in <module>
    algo.train()
  File "/home/ubuntu/wordk/InfoGAN/infogan/algos/infogan_trainer.py", line 335, in train
    raise ValueError("NaN detected!")
ValueError: NaN detected!
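Editor's note on question 3: NaN blowups are most often attacked by lowering the learning rates first. InfoGANTrainer takes separate generator and discriminator learning rates (run_mnist_exp.py passes 1e-3 and 2e-4); the halved values below are a hypothetical starting point, not a recommendation from the authors:

trainer_kwargs = dict(
    generator_learning_rate=5e-4,      # run_mnist_exp.py uses 1e-3
    discriminator_learning_rate=1e-4,  # run_mnist_exp.py uses 2e-4
)
print(trainer_kwargs)  # pass these to InfoGANTrainer(...)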

4. How long does CelebA training take? Can you share an epoch log?

In my log the D loss is very small and the G loss is much bigger.

[screenshot of training log, 2016-10-07]

zdx3578 commented 7 years ago

> A.2 INFOGAN TRAINING To train the InfoGAN network described in Tbl. 1 on the 2D shapes dataset (Fig. 6), we followed the training paradigm described in Chen et al. (2016) with the following modifications. For the mutual information regularised latent code, we used 5 continuous variables ci sampled uniformly from (−1, 1). We used 5 noise variables zi, as we found that using a reduced number of noise variables improved the quality of generated samples for this dataset. To help stabilise training, we used the instance noise trick described in Shi et al. (2016), adding Gaussian noise to the discriminator inputs (0.2 standard deviation on images scaled to [−1, 1]). We followed Radford et al. (2015) for the architecture of the convolutional layers, and used batch normalisation in all layers except the last in the generator and the first in the discriminator.

This is from beta-VAE: LEARNING BASIC VISUAL CONCEPTS WITH A CONSTRAINED VARIATIONAL FRAMEWORK.
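Editor's sketch of the instance-noise trick described in that passage (the function name is hypothetical): Gaussian noise with standard deviation 0.2 is added to the discriminator inputs, with images already scaled to [-1, 1]:

import numpy as np

def add_instance_noise(images, std=0.2):
    # Instance noise: perturb discriminator inputs with Gaussian noise.
    return images + np.random.normal(0.0, std, size=images.shape)

batch = np.random.uniform(-1.0, 1.0, size=(64, 32, 32, 3))  # toy image batch
noisy = add_instance_noise(batch)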

simonzhang0158 commented 7 years ago

I am wondering if there is any update on training with the CelebA dataset. I have the same problem as @zdx3578 when trying to train on CelebA. @zdx3578, did you find a way to solve this? The paper you quoted above is, I believe, the setup for the 2D shapes dataset.

Any help will be appreciated.