keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Possible bug in Autoencoder class #386

Closed: turambar closed this issue 7 years ago

turambar commented 9 years ago

Hi Francois, et al., I'm a PhD student in CS/machine learning at USC. We started our DL odyssey using Theano directly and just discovered keras, which has thus far been great, especially for rapidly trying a lot of different models. Thanks for the great work!

I did find what I believe to be a bug in the Autoencoder class, in its handling of tied weights. After training an autoencoder with tied weights set to true, I noticed that the actual encoder and decoder weights were NOT identical (which means they WEREN'T actually tied during training). This is because the only place the tie_weights option is used is inside get_output -- i.e., it just substitutes the encoder weights for the decoder weights at output time.

Unless I am mistaken, what you want is for the decoder's Theano symbolic variable parameter (e.g., W for a Dense layer) to be the transpose of the encoder's parameter. For example, see the dA class in the Theano tutorial.
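
To make that concrete, here is roughly what I mean, in the spirit of the Theano tutorial (a sketch only, with made-up names like W, b_enc, b_dec; this is not keras code):

import numpy as np
import theano
import theano.tensor as T

n_visible, n_hidden = 784, 128
rng = np.random.RandomState(0)

# One shared weight matrix, owned by the "encoder"
W = theano.shared(np.asarray(rng.uniform(-0.1, 0.1, (n_visible, n_hidden)),
                             dtype=theano.config.floatX), name='W')
b_enc = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX), name='b_enc')
b_dec = theano.shared(np.zeros(n_visible, dtype=theano.config.floatX), name='b_dec')

x = T.matrix('x')
hidden = T.nnet.sigmoid(T.dot(x, W) + b_enc)
# The decoder reuses W.T symbolically: there is only one weight matrix, so
# gradients from the reconstruction error flow into that single shared variable.
reconstruction = T.nnet.sigmoid(T.dot(hidden, W.T) + b_dec)

cost = T.nnet.binary_crossentropy(reconstruction, x).mean()
params = [W, b_enc, b_dec]   # W appears once; the biases are not tied
grads = T.grad(cost, params)

The point being that the decoder should not own a separate weight matrix at all.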

I can fix this, but I have two questions:

Do you have a preferred procedure for accepting fixes? I.e., just fork keras, make the changes, send a pull request?

Another option would be to make a class method that takes a (generic) layer and creates a decoder version of that layer with tied weights. With that, one could perhaps build the decoder automatically within the Autoencoder constructor.

Thoughts?

P.S. I've also implemented some Noise layers for use with denoising autoencoders, if those are of interest.

fchollet commented 9 years ago

Do you have a preferred procedure for accepting fixes? I.e., just fork keras, make the changes, send a pull request?

Yes. Your PR will be reviewed, you will get feedback, and when everything looks good it will be merged.

Unless I am mistaken, what you want is for the decoder's Theano symbolic variable parameter (e.g., W for a Dense layer) to be the transpose of the encoder's parameter. For example, see the dA class in the Theano tutorial.

It's not so simple, because encoders/decoders are arbitrary stacks of layers.

Another option would be to make a class method that takes a (generic) layer and creates a decoder version of that layer with tied weights. With that, one could perhaps build the decoder automatically within the Autoencoder constructor.

How would that work?

turambar commented 9 years ago

Yes. Your PR will be reviewed, you will get feedback, and when everything looks good it will be merged.

OK, I can do that.

Unless I am mistaken, what you want is for the decoder's Theano symbolic variable parameter (e.g., W for a Dense layer) to be the transpose of the encoder's parameter. For example, see the dA class in the Theano tutorial.

It's not so simple, because encoders/decoders are arbitrary stacks of layers.

True, but I think the special case of a single-layer denoising autoencoder [1] is important enough to warrant a proper implementation in the keras ecosystem (even if it's just an example and not an actual layer). This would permit comparison with the Theano tutorial [2], and it is a common tool for classic unsupervised pretraining [3]. Regarding HOW to do it...

a class method that takes a (generic) layer and creates a decoder version of that layer with tied weights.

How would that work?

I have some ideas. Let me flesh them out and then re-post them in this thread, for your feedback.
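
(To give a flavor of the direction now, something very roughly like the hypothetical sketch below, where DenseTied, from_encoder, and the .W/.params attributes are made-up names rather than existing keras code.)

import numpy as np
import theano
import theano.tensor as T

class DenseTied(object):
    """Hypothetical decoder layer whose weight is the symbolic transpose of a
    given encoder layer's weight, so both directions train a single matrix."""

    def __init__(self, encoder_layer):
        self.encoder_layer = encoder_layer
        n_out = encoder_layer.W.get_value().shape[0]   # encoder's input dim
        self.b = theano.shared(np.zeros(n_out, dtype=theano.config.floatX))
        self.params = [self.b]   # only the bias is a new trainable parameter

    @classmethod
    def from_encoder(cls, encoder_layer):
        # The class method the Autoencoder constructor could call to build
        # the decoder automatically when tie_weights=True.
        return cls(encoder_layer)

    def get_output(self, hidden):
        return T.nnet.sigmoid(T.dot(hidden, self.encoder_layer.W.T) + self.b)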

  1. P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and Composing Robust Features with Denoising Autoencoders. ICML 2008.
  2. http://deeplearning.net/tutorial/dA.html#da
  3. P. Vincent, et al. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. JMLR 2010.
fchollet commented 9 years ago

After review, I decided to remove weight tying for now. You can reintroduce it in a cleaner form in a future PR.

jramapuram commented 9 years ago

Even if enc_param is only set to dec_param after the first pass, everything is in essence the same. But yes, a simple swap of enc_param <--> dec_param in the swapping logic will fix your one-timestep lag. You generally do not tie biases. The way I implemented weight tying was with a check like:

if len(dec_param.shape) > 1:  # only tie weight matrices, skip bias vectors

This ensures that the bias vectors are NOT tied. I did it this way because layer.get_weights() does not return only the weight matrices (it returns the biases as well). That is misleading and should be refactored into separate .get_weights() and .get_bias() methods.

There is also one more issue: since an encoder can now be a sequential stack of layers, we will need some way of verifying which layer is the actual 'encoder' and which is the actual 'decoder'. I.e., an encoder can be Dense --> (some other layer) while the decoder is just Dense. We want to ensure that only the Dense encoder layer's weights are swapped with the Dense decoder's weights.
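
Roughly, the swap I am describing looks like this (a sketch only; tie_decoder_to_encoder is a made-up helper, and it assumes Keras-style layers that expose get_weights()/set_weights()):

import numpy as np

def tie_decoder_to_encoder(encoder_layers, decoder_layers):
    # Copy each encoder weight matrix (transposed) into the matching decoder
    # layer, pairing layers in reverse order (first encoder <-> last decoder)
    # and skipping bias vectors and parameter-free layers.
    for enc_layer, dec_layer in zip(encoder_layers, reversed(decoder_layers)):
        enc_params = enc_layer.get_weights()
        dec_params = dec_layer.get_weights()
        if not enc_params or not dec_params:
            continue   # e.g. activation or noise layers with nothing to tie
        new_dec_params = []
        for enc_p, dec_p in zip(enc_params, dec_params):
            if dec_p.ndim > 1 and enc_p.T.shape == dec_p.shape:
                new_dec_params.append(enc_p.T.copy())   # tie the weight matrix
            else:
                new_dec_params.append(dec_p)            # leave biases alone
        dec_layer.set_weights(new_dec_params)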

jamona commented 8 years ago

What's the workaround here? Without tied weights I get the following error when I try your "unofficial" SAE example: "Layer is not connected and is not an input layer."

isaacgerg commented 8 years ago

I see this bug is still open. I would like to contribute to Keras and feel this would be a good place for me to jump in. Can someone give me a hand on getting started?

niraabe commented 8 years ago

Hi, what is the current state of the tie_weights parameter? Has the bug fix been reviewed and released?

isaacgerg commented 8 years ago

It doesn't support the tie_weights parameter. As far as I have researched, this is not being implemented. I believe you don't have to tie the weights to get reasonable results. In fact, I think one of the reasons tie_weights was removed was that @fchollet didn't want to propagate the method because he felt it was unnecessary. @fchollet, can you comment?


strangeice commented 8 years ago

I tried this tutorial https://blog.keras.io/building-autoencoders-in-keras.html without tied weights, but the result is not great (at least for the same amount of training compared with tied weights in the Theano dA). Does this mean I have to translate the Theano dA class into Keras conventions in order to stay within Keras?
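
For reference, the untied setup I tried is essentially the one from that blog post (reproduced from memory, so treat the layer sizes and optimizer as illustrative):

from keras.layers import Input, Dense
from keras.models import Model

# Plain untied autoencoder on flattened MNIST digits: 784 -> 32 -> 784
input_img = Input(shape=(784,))
encoded = Dense(32, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# then fit on normalized MNIST, e.g.:
# autoencoder.fit(x_train, x_train, batch_size=256, shuffle=True)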

Here are the results without tied weights (attached image: figure_1).

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.