maxorange / voxel-dcgan

A deep generative model of 3D volumetric shapes
MIT License

Question about the last operation in the Discriminator #1

Closed thunguyenphuoc closed 7 years ago

thunguyenphuoc commented 7 years ago

Hello there,

As I was going through your code, I was a bit puzzled by the last operation of the discriminator. From the 'model.py' file:

    m = tf.matmul(h, self.W['md'])
    m = tf.reshape(m, [-1, self.n_kernels, self.dim_per_kernel])
    abs_dif = tf.reduce_sum(tf.abs(tf.expand_dims(m, 3) - tf.expand_dims(tf.transpose(m, [1, 2, 0]), 0)), 2)
    f = tf.reduce_sum(tf.exp(-abs_dif), 2) + self.b['md']
    h = tf.concat(1, [h, f])
    y = tf.matmul(h, self.W['h5']) + self.b['h5']
    return y

Could you explain this operation to me?

Thank you :)

maxorange commented 7 years ago

Hi, this operation is minibatch discrimination, proposed in Improved Techniques for Training GANs; it is described in section 3.2.

With the original GAN training, the generator can end up concentrating on just a few examples, so all of its outputs are very similar (low diversity).

Minibatch discrimination is a way to remedy this problem. In my experiments, it helps keep the DCGAN from collapsing onto just a few examples.
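
To make the tensor shapes concrete, here is a minimal NumPy sketch of the same computation as the snippet above (the function name and arguments are illustrative, not from the repo):

    import numpy as np

    def minibatch_discrimination(h, W_md, b_md, n_kernels, dim_per_kernel):
        # h: (N, D) features of the N samples in the minibatch.
        m = h @ W_md                                  # (N, n_kernels * dim_per_kernel)
        m = m.reshape(-1, n_kernels, dim_per_kernel)  # (N, n_kernels, dim_per_kernel)
        # L1 distance between every pair of samples, per kernel:
        # (N, n_kernels, dim_per_kernel, 1) - (1, n_kernels, dim_per_kernel, N)
        abs_dif = np.abs(m[:, :, :, None] - m.transpose(1, 2, 0)[None]).sum(axis=2)  # (N, n_kernels, N)
        # f measures how close each sample is to the rest of the minibatch;
        # a collapsed minibatch (near-identical samples) gives the discriminator an easy tell.
        f = np.exp(-abs_dif).sum(axis=2) + b_md       # (N, n_kernels)
        return np.concatenate([h, f], axis=1)         # minibatch features appended to h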

thunguyenphuoc commented 7 years ago

Hello,

Thank you for your fast reply. I guessed as much. I'm trying to implement a 3D GAN for ModelNet10 based on your code but still can't achieve results like yours :) Did you find that adding noise to the Discriminator also helps?

T

maxorange commented 7 years ago

Adding noise helps training somewhat. I increased the number of training examples by dilating the voxel models; I used scipy.ndimage.morphology.binary_dilation for the dilation.
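
For reference, a minimal sketch of this kind of dilation-based augmentation (the helper name and the number of dilation steps are illustrative):

    import numpy as np
    from scipy.ndimage.morphology import binary_dilation

    def dilate_augment(voxels, max_iterations=2):
        # voxels: (D, H, W) boolean occupancy grid of one model.
        # Return the original grid plus progressively dilated copies.
        samples = [voxels]
        for k in range(1, max_iterations + 1):
            samples.append(binary_dilation(voxels, iterations=k))
        return samples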

Recently, I found that ELU+dropout also works well instead of using (leaky) ReLU+batchnorm.
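
As a rough illustration of the two options, here is a tf.keras sketch (not the repo's own layer code; filter count, kernel size, and dropout rate are placeholders):

    import tensorflow as tf

    def block_leakyrelu_bn(x, filters):
        x = tf.keras.layers.Conv3D(filters, 4, strides=2, padding='same')(x)
        x = tf.keras.layers.BatchNormalization()(x)
        return tf.keras.layers.LeakyReLU(0.2)(x)

    def block_elu_dropout(x, filters, rate=0.2):
        x = tf.keras.layers.Conv3D(filters, 4, strides=2, padding='same')(x)
        x = tf.keras.layers.ELU()(x)
        return tf.keras.layers.Dropout(rate)(x)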

thunguyenphuoc commented 7 years ago

Hello, Thanks for the tip on data augmentation and the insight about ELU+dropout. Right now, I only use rotation to augment the data.

I ran a couple more experiments and found that: (a) virtual batch norm makes things worse (both training speed and quality), and (b) tuning the stddev of the noise added to the Discriminator input helps with both the quality of the results and the diversity of the generated 3D shapes.
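
For reference, the noise in question is just additive Gaussian noise on the discriminator's input, with the stddev as the knob being tuned (a minimal sketch; the helper name is illustrative):

    import numpy as np

    def add_input_noise(voxels, stddev=0.1, rng=None):
        # voxels: a real or generated batch about to be fed to the discriminator.
        rng = np.random.default_rng() if rng is None else rng
        return voxels + rng.normal(scale=stddev, size=voxels.shape)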

Anyhow, thank you for your tips, and I look forward to more results from you :)

T

thunguyenphuoc commented 7 years ago

One final question (I'm not sure if it is worth opening another issue or not): where did you get your training data from? Did you download the ShapeNet data and convert it to binvox files yourself, or is there another publicly available voxelized ShapeNet dataset somewhere?

Thank you for your help, T

maxorange commented 7 years ago

Thank you for the insights about virtual batch norm and the noise on the Discriminator input.

I got the ShapeNet data from their website (https://www.shapenet.org/). Registration is required to download the data. They provide several subsets of the full ShapeNet dataset; I downloaded the ShapeNetCore subset, which has 57,449 examples, and converted the models to binvox files myself.

When converting, I filled the inside of each voxel model with voxels; this is called solid voxelization. I trained the GANs on the solid-voxelized dataset, not the surface-voxelized one.
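
The exact conversion pipeline isn't shown in this thread, but one simple way to turn a surface voxelization into a solid one is to fill the enclosed interior, e.g. with SciPy (a sketch; the function name solidify is illustrative):

    import numpy as np
    from scipy.ndimage import binary_fill_holes

    def solidify(surface_voxels):
        # surface_voxels: (D, H, W) boolean grid with only the shell occupied.
        # Fill every cavity enclosed by the shell to get a solid occupancy grid.
        return binary_fill_holes(surface_voxels)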

If you need, feel free to open another issue.