interactiveaudiolab / course-deep-learning

Teaching materials for the deep learning course.

Notebook_5_gan tweaks, changes... #5

Closed: bryan-pardo closed this issue 2 years ago

bryan-pardo commented 2 years ago

(attached image)

bryan-pardo commented 2 years ago

You have these vertical pink and blue arrows in the image... I have no idea what they are about. Also, something about this graph makes it seem like pairwise comparisons are happening. I wonder if there is a way of modifying it so that it is clear there are no matched pairs between the true and the synthetic data. Maybe not, but just wondering.

OK...now, looking at the GAN code....

```python
import torch

class MLPGAN(torch.nn.Module):

    def __init__(self,
                 depth: int,
                 latent_size: int,
                 hidden_size: int,
                 output_size: int,
                 activation_generator: torch.nn.Module = torch.nn.ReLU(),
                 activation_discriminator: torch.nn.Module = torch.nn.LeakyReLU(0.2)
                ):
        """Construct a simple MLP generative adversarial network."""
        super().__init__()
        # ... (remainder of the constructor omitted)
```

I was curious about why you used ReLU as the activation function for the generator and LeakyReLU in the discriminator. Is there some detail there that might be interesting to highlight?
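(To make my question concrete, here's a tiny sketch, not from the notebook, of the behavioral difference between the two activations. My understanding is that the DCGAN paper recommends LeakyReLU in the discriminator precisely because it passes a small gradient for negative inputs:)

```python
import torch

x = torch.tensor([-2.0, -0.5, 1.0], requires_grad=True)

relu = torch.nn.ReLU()
leaky = torch.nn.LeakyReLU(0.2)

print(relu(x))   # tensor([0., 0., 1.], ...) -- negative inputs are zeroed
print(leaky(x))  # tensor([-0.4, -0.1, 1.], ...) -- negatives scaled by 0.2

# ReLU blocks the gradient entirely for negative inputs; LeakyReLU passes 0.2,
# so discriminator units never go completely "dead"
leaky(x).sum().backward()
print(x.grad)    # tensor([0.2, 0.2, 1.0])
```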

When I got to the example where you show how you can get a "wacky" output by picking an out-of-distribution input z for the generator, the output I get doesn't look wacky. I tried some other random out-of-distribution values, and they also resulted in not-wacky generations. So we need to think about that example.
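(For concreteness, this is the kind of thing I mean by out-of-distribution inputs -- a sketch, with `latent_size` and `generator` as placeholders for the notebook's values:)

```python
import torch

latent_size = 32  # placeholder; match whatever the notebook uses

# In-distribution: the standard normal prior the generator was trained on
z_in = torch.randn(1, latent_size)

# Out-of-distribution latents like the ones I tried
z_scaled = 10.0 * torch.randn(1, latent_size)   # scaled far beyond N(0, I)
z_shifted = torch.randn(1, latent_size) + 5.0   # mean shifted well away from 0

# x = generator(z_scaled)  # `generator` stands in for the notebook's trained model
```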

I got an error when trying to generate the interpolated MNIST digits. Now, maybe that's my fault? I changed the GAN definition a tiny bit, but everything else in the MNIST examples ran with my changed definition, so I don't THINK I broke things. But still, here's a screenshot of the error (see attached image):

(attached screenshot of the error)

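(For reference, the interpolation I was attempting is, as I understand it, just linear interpolation between two latent vectors -- a sketch, again with `latent_size` and `generator` as placeholders:)

```python
import torch

latent_size = 32  # placeholder; match the notebook's value

z0 = torch.randn(latent_size)
z1 = torch.randn(latent_size)

# Linearly interpolate between the two latents in 10 steps
steps = torch.linspace(0.0, 1.0, 10).unsqueeze(1)  # shape (10, 1)
z_path = (1 - steps) * z0 + steps * z1             # shape (10, latent_size)

# digits = generator(z_path)  # one generated digit per interpolation step
```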

bryan-pardo commented 2 years ago

Accidentally closed the issue. This next bit is about the DCGAN example:

I didn't understand this docstring:

> Unlike our MLP generator, we'll interpret our latent inputs as having three (non-batch) dimensions: channels, height, and width. Our latents will start with only a nontrivial channel dimension, and we will gradually reshape them to "spread out" over the height and width dimensions using transposed convolutions.

What is meant by "will start with only a nontrivial channel dimension"? I don't get what is intended.
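(My best guess at what's intended, sketched below: the latent enters with shape (batch, channels, 1, 1), so the channel dimension is the only one larger than 1 -- the only "nontrivial" one -- and transposed convolutions then grow the height and width. Layer sizes here are illustrative, not necessarily the notebook's:)

```python
import torch

z = torch.randn(8, 100, 1, 1)  # channels=100 is nontrivial; height=width=1 are trivial

# Each transposed convolution "spreads" the signal over larger spatial dims
up1 = torch.nn.ConvTranspose2d(100, 64, kernel_size=4, stride=1, bias=False)
h = up1(z)
print(h.shape)  # torch.Size([8, 64, 4, 4])

up2 = torch.nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1, bias=False)
print(up2(h).shape)  # torch.Size([8, 32, 8, 8])
```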

In the initialization of the DCGAN, I don't know why the weight means and standard deviations were chosen. Is the reason for these in the paper? Could some explanation be added for those choices? I'm talking about this code:

```python
if name.find('Conv') != -1:
    nn.init.normal_(m.weight.data, 0.0, 0.02)
elif name.find('BatchNorm') != -1:
    nn.init.normal_(m.weight.data, 1.0, 0.02)
    nn.init.constant_(m.bias.data, 0)
```
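(For what it's worth, I believe the answer is yes: the DCGAN paper states that all weights were initialized from a zero-centered Normal distribution with standard deviation 0.02, and centering the BatchNorm scale at 1.0 keeps those layers near identity at the start. Below is a sketch of how such an initializer is typically wrapped and applied -- names here are illustrative, not necessarily the notebook's:)

```python
import torch.nn as nn

def weights_init(m):
    # DCGAN-style init: conv weights ~ N(0, 0.02); BatchNorm scale ~ N(1, 0.02), bias 0
    name = m.__class__.__name__
    if name.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif name.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

# .apply() visits every submodule recursively
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 64, kernel_size=4, stride=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
generator.apply(weights_init)
```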

bryan-pardo commented 2 years ago

I wonder if the DCGAN material could become the basis of a homework assignment? Then maybe it wouldn't need to be in the tutorial notebook...

oreillyp commented 2 years ago

OK, I've pushed an update addressing everything through the FFMPEG issue (which I resolved by switching from .gif to .mp4). Regarding the DCGAN portion of the notebook, I agree that this could be homework rather than tutorial material -- the notebook is already fairly long. This would allow things like weight initialization (as you mentioned above) and mode collapse to be the basis for homework questions. To this end, I've temporarily removed the DCGAN sections of the notebook, under the assumption that I'll be shaping them into homework questions at some point soon.

interactiveaudiolab commented 2 years ago

Awesome! Thanks for your work on this.