wellecks / vaes

Variational Autoencoders & Normalizing Flows Project
19 stars 10 forks source link

Refactored #1

Closed wellecks closed 7 years ago

wellecks commented 7 years ago

Creating this so I can add comments below

isaachenrion commented 7 years ago

How do we resolve the conflict? Should I try to integrate your Importance Sampling stuff into my refactoring? Not sure how all this works...

wellecks commented 7 years ago

I can take care of the merge conflict - you'd have to merge it manually w/ the command line and resolve the conflicts

isaachenrion commented 7 years ago

ok i'll leave it to you

wellecks commented 7 years ago

Also, this comment got collapsed above:

(_get_made_mask in utils.py) Did you check whether this is equivalent to the old code? I don't see the output mask (MV in the old code), which I think is necessary (equation 9 from MADE paper)

isaachenrion commented 7 years ago

it's not equivalent but i don't think the output mask is necessary (or correct).

The way I see it, equation 9 of the MADE paper is for the situation where we are modeling the whole autoencoder x --> z --> x_rec, with a single layer of parameters in each step. So there is a hidden mask and an output mask.

This is different to our situation. Here we are using MADE to map z0 --> z1 --> ... --> zk, with a layer of parameters in each step. It doesn't relate to the mapping x --> z0, nor to zk --> x_rec. So we need a mask for each layer of IAF. I created something (in neural_networks.py) called made_layer to do just that: given input, it applies a masked weight matrix so that output k only depends on the first k-1 inputs. However I am really unsure about all this, and like I said before, my one isn't working.

The original IAF is maintained in vae_iaf.py, and I haven't touched it.

wellecks commented 7 years ago

I see. I was interpreting MADE as a function f(x) -> x', and in our case we are applying it to z, i.e. f(z0) -> zk. The function happens to be an autoencoder, and the masks that it uses internally ensure that the function satisfies the autoregressive property.

But yeah I'm also not 100%, and I can see your point about it being odd to use the full reconstruction process rather than just the mapping from input to hidden layer

wellecks commented 7 years ago

Anyways, awesome work on the refactor! I'll merge in the changes