tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers
MIT License

Couple of bugs - referring to different keras versions? #23

Closed: engharat closed this issue 7 years ago

engharat commented 7 years ago

Hi, I had to fix a couple of things in the code. I presume the latest Keras version changed some argument names, but I haven't tried with older Keras versions, so I'm not sure about that. Anyway, in the Wasserstein GAN code: in utils/data_utils.py, the beta1 argument passed to the Adam optimizer needs to be changed to beta_1. In /model/models_WGAN.py, an error occurs in the conv2D_init function because of the dim_ordering argument; adding dim_ordering=None to the signature seems to fix it, so I suggest: def conv2D_init(shape, name=None, dim_ordering=None): return initializations.normal(shape, scale=0.02, name=name)
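Putting both fixes together, a minimal sketch under Keras 1.x-era APIs (the learning rate and beta value below are illustrative assumptions, not values taken from the repository):

```python
from keras.optimizers import Adam
from keras import initializations

# Fix 1: the Adam keyword is `beta_1`, not `beta1` (hyperparameter values are illustrative).
opt = Adam(lr=1e-4, beta_1=0.5)

# Fix 2: accept (and ignore) the `dim_ordering` argument that newer Keras versions
# pass to custom weight initializers.
def conv2D_init(shape, name=None, dim_ordering=None):
    return initializations.normal(shape, scale=0.02, name=name)
```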

tdeboissiere commented 7 years ago

You're right about that! My plan is to wait for Keras 2, which should be released shortly. The API will hardly change after that, so I'll refactor all the code then.

engharat commented 7 years ago

I refactored my current implementation, which is heavily based on yours but lives in a Jupyter notebook. Anyway, I ran into a very big problem: I never managed to get good output with batchnorm modes 0 and 1. I only get output images that vary with the input noise when using batchnorm mode 2 (which applies batch statistics both at training and at test time, so no moving averages). The big problem is that Keras 2.0 removed batchnorm modes 1 and 2! I don't see any viable way of using Keras 2.0 for our GAN. Do you run into the same problem with batchnorm modes 0 and 1?
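For context, a minimal sketch of the Keras 1.x layer API being discussed (the axis value is an illustrative choice for 'th' dim ordering, not taken from the repository):

```python
from keras.layers import BatchNormalization

# Keras 1.x exposed a `mode` argument on BatchNormalization:
#   mode=0 keeps moving averages of the batch statistics for use at test time,
#   mode=2 normalizes with the current batch statistics in both phases.
bn_mode0 = BatchNormalization(mode=0, axis=1)
bn_mode2 = BatchNormalization(mode=2, axis=1)
# Keras 2.0 removed the `mode` argument; only the mode-0 behaviour remains.
```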

tdeboissiere commented 7 years ago

I'm mostly using pure TensorFlow at the moment so that I have better control over the whole pipeline. Anyway, there still seem to be a couple of bumps on the road to Keras 2 (running my benchmarks, the new NCHW data format for TensorFlow does not seem to behave as expected), so I'll wait a bit more before updating the Keras script.

I've seen your issue on the Keras repo, so we may also get fchollet's feedback.

engharat commented 7 years ago

I made the small changes needed to run your code on Keras 2.0, but the big issue remains: both on Keras 1.2 and on Keras 2.0, the Wasserstein GAN (and I suppose the traditional GAN too) doesn't work when using batchnorm mode 0 in the generator. I updated the issue on the Keras repo, but given the number of bugs accompanying the new Keras version, I don't expect a fast solution.

I understand your point about TensorFlow giving better control over the whole pipeline, but it would still be a pity if you stopped updating your "recent papers" implementations in Keras, because I found them a very valuable code resource for the easiest deep learning framework; most researchers inexperienced with TensorFlow, like me, would lose a LOT of time on "how do I code it" instead of "what experiment should I run now" ;) Anyway, would you mind reporting your findings about GANs and batchnorm modes if you play with them in TensorFlow? I'm curious to see whether the batchnorm mode problem is present only in Keras or in TensorFlow too.

tdeboissiere commented 7 years ago

As far as I can tell, the only difference between mode 0 and mode 2 should be at inference time, not training time, so there should not be much difference during training.
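Schematically, the distinction being discussed at inference time (a rough sketch, not the actual Keras implementation):

```python
import numpy as np

def bn_inference(x, moving_mean, moving_var, gamma, beta, mode, eps=1e-3):
    """Rough sketch of batchnorm behaviour at inference time."""
    if mode == 0:
        # mode 0: use the moving averages accumulated during training
        mean, var = moving_mean, moving_var
    else:
        # mode 2: use the statistics of the current batch, exactly as in training
        mean, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```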

I'll try to re-run a WGAN experiment with Keras 2 over the weekend.

I also plan on uploading some pure TF code at some point.

engharat commented 7 years ago

The problem is not with training itself; maybe I explained the issue badly. The problem happens when, during training, we plot some images from the current generator, as your code does in the file "current_batch.png": at that moment we are at inference time, and that is where the behaviour of mode 0 and mode 2 differs. So training goes fine with both modes, but with mode 0 we cannot get proper output from the generator at inference time (both on Keras 1.2 and 2.0; it is still unknown what happens in TF / Torch / PyTorch / etc.).
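Concretely, the sampling step being described looks roughly like this (a sketch; noise_dim is an assumed value and generator_model is the Keras generator defined elsewhere):

```python
import numpy as np

noise_dim = 100                                       # assumed latent dimension
noise = np.random.normal(0, 1, size=(16, noise_dim))
# predict() runs in the test phase, so with mode-0 BN the moving-average
# statistics are used here, while training itself used batch statistics.
fake_images = generator_model.predict(noise)
# fake_images would then be tiled and written out as current_batch.png.
```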

tdeboissiere commented 7 years ago

Gotcha.

As far as TF is concerned, you pass keyword arguments to tf.contrib.layers.batch_norm to specify whether you're training (i.e. accumulating mean and variance statistics) or doing inference (using the accumulated statistics instead of the current batch statistics).
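A minimal TF 1.x sketch of what that looks like (the decay value and placeholder name are illustrative assumptions):

```python
import tensorflow as tf

# Placeholder that toggles between training and inference behaviour.
is_training = tf.placeholder(tf.bool, name="is_training")

def bn(x):
    # is_training=True  -> normalize with batch stats and update the moving averages
    # is_training=False -> normalize with the accumulated moving averages
    return tf.contrib.layers.batch_norm(
        x,
        is_training=is_training,
        updates_collections=None,  # apply the moving-average updates in place
        decay=0.9)                 # illustrative decay value
```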

I had a look at a full port to Keras 2 and ran into what looks like a nasty BN bug: the discriminator graph should be independent of the generator graph, but when training, TensorFlow asks for a value for noise_input, which belongs to the G graph. This error only occurs when BN layers are included in the discriminator.
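For reference, a minimal sketch of the kind of stacked setup where that coupling can show up (layer shapes and model names are illustrative assumptions, not the repository's code):

```python
from keras.layers import Input
from keras.models import Model

# generator_model: noise -> image, discriminator_model: image -> score,
# both assumed to be built elsewhere, with BN layers in the discriminator.
noise_input = Input(shape=(100,), name="noise_input")
stacked_output = discriminator_model(generator_model(noise_input))
gan_model = Model(inputs=noise_input, outputs=stacked_output)  # G + D stacked for the G update

# The discriminator is also trained on its own with real/fake image batches;
# the bug reported above is that this standalone D update unexpectedly asks
# for a value of noise_input, which should only belong to the stacked graph.
```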

Did you have a similar issue and if yes how did you solve it ?

mznyc commented 6 years ago

Again, any update on this issue? My WGAN generator basically spits out uniform noise and training simply collapsed.

engharat commented 6 years ago

See https://github.com/fchollet/keras/issues/5892