osh / KerasGAN

A couple of simple GANs in Keras

not working with current Theano? #2

Closed - varoudis closed this 7 years ago

varoudis commented 7 years ago

Any ideas?

 0%|          | 0/10 [00:00<?, ?it/s]<<!! BUG IN FGRAPH.REPLACE OR A LISTENER !!>> <class 'TypeError'> ('The type of the replacement must be compatible with the type of the original Variable.', GpuAllocEmpty.0, GpuAllocEmpty.0, CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, 4D), 'MergeOptimizer') MergeOptimizer
ERROR (theano.gof.opt): SeqOptimizer apply MergeOptimizer
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/opt.py", line 235, in apply
    sub_prof = optimizer.optimize(graph)

  File "/opt/conda/lib/python3.5/site-packages/theano/gof/opt.py", line 90, in optimise
    ret = self.apply(fgraph, *args, **kwargs)
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/opt.py", line 855, in apply
    fgraph.replace_all_validate(pairs, 'MergeOptimizer')
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/toolbox.py", line 339, in replace_all_validate
    fgraph.replace(r, new_r, reason=reason, verbose=False)
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/fg.py", line 504, in replace
    str(reason))
TypeError: ('The type of the replacement must be compatible with the type of the original Variable.', GpuAllocEmpty.0, GpuAllocEmpty.0, CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, 4D), 'MergeOptimizer')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    865             outputs =\
--> 866                 self.fn() if output_subset is None else\
    867                 self.fn(output_subset=output_subset)

/opt/conda/lib/python3.5/site-packages/theano/gof/op.py in rval(p, i, o, n)
    907             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 908                 r = p(n, [x[0] for x in i], o)
    909                 for o in node.outputs:

/opt/conda/lib/python3.5/site-packages/theano/compile/ops.py in perform(self, node, inp, out_)
    707                                  ' supposed to be 1 (got %s instead)' %
--> 708                                  (axis, x.shape[axis]))
    709         out[0] = x

ValueError: Dimension 3 in Rebroadcast's input was supposed to be 1 (got 28 instead)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-14-39dde9206136> in <module>()
----> 1 train_for_n(nb_epoch=10, plt_frq=25,BATCH_SIZE=128)

<ipython-input-13-3dd7ba09658b> in train_for_n(nb_epoch, plt_frq, BATCH_SIZE)
     24 
     25         #make_trainable(discriminator,False)
---> 26         g_loss = GAN.train_on_batch(noise_tr, y2 )
     27         losses["g"].append(g_loss)
     28 

/opt/conda/lib/python3.5/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1217             ins = x + y + sample_weights
   1218         self._make_train_function()
-> 1219         outputs = self.train_function(ins)
   1220         if len(outputs) == 1:
   1221             return outputs[0]

/opt/conda/lib/python3.5/site-packages/keras/backend/theano_backend.py in __call__(self, inputs)
    715     def __call__(self, inputs):
    716         assert type(inputs) in {list, tuple}
--> 717         return self.function(*inputs)
    718 
    719 

/opt/conda/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    877                     node=self.fn.nodes[self.fn.position_of_error],
    878                     thunk=thunk,
--> 879                     storage_map=getattr(self.fn, 'storage_map', None))
    880             else:
    881                 # old-style linkers raise their own exceptions

/opt/conda/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

/opt/conda/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687 

/opt/conda/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    864         try:
    865             outputs =\
--> 866                 self.fn() if output_subset is None else\
    867                 self.fn(output_subset=output_subset)
    868         except Exception:

/opt/conda/lib/python3.5/site-packages/theano/gof/op.py in rval(p, i, o, n)
    906             # default arguments are stored in the closure of `real`
    907             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 908                 r = p(n, [x[0] for x in i], o)
    909                 for o in node.outputs:
    910                     compute_map[o][0] = True

/opt/conda/lib/python3.5/site-packages/theano/compile/ops.py in perform(self, node, inp, out_)
    706                 raise ValueError('Dimension %s in Rebroadcast\'s input was'
    707                                  ' supposed to be 1 (got %s instead)' %
--> 708                                  (axis, x.shape[axis]))
    709         out[0] = x
    710 

ValueError: Dimension 3 in Rebroadcast's input was supposed to be 1 (got 28 instead)
Apply node that caused the error: Rebroadcast{?,?,?,1}(GpuContiguous.0)
Toposort index: 127
Inputs types: [CudaNdarrayType(float32, (True, True, True, False))]
Inputs shapes: [(1, 1, 1, 28)]
Inputs strides: [(0, 0, 0, 1)]
Inputs values: ['not shown']
Outputs clients: [[GpuElemwise{true_div,no_inplace}(Rebroadcast{?,?,?,1}.0, GpuElemwise{sqrt,no_inplace}.0), GpuElemwise{Composite{((i0 * i1 * i2) / i3)}}[(0, 1)](CudaNdarrayConstant{[[[[-0.5]]]]}, GpuDimShuffle{x,0,x,x}.0, Rebroadcast{?,?,?,1}.0, GpuElemwise{mul,no_inplace}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1104, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1104, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
dolaameng commented 7 years ago

@varoudis: I came across the same error with the latest Theano. Removing all the BatchNormalization layers made it compile, but I was not able to achieve the same results - the GAN loss didn't converge. Have you been able to reproduce the results?

osh commented 7 years ago

Are you running on a CPU or GPU device under Theano?

dolaameng commented 7 years ago

It's Theano on GPU. The error seems to be raised when a BatchNormalization follows a Conv2D, so I disabled the BatchNormalization - does that affect performance? To be frank, I have only run 2000 iterations, but the GAN loss never drops below 2.
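
For reference, the pattern that seems to trigger it is roughly this (a sketch, not the notebook's exact layers - the filter counts are made up):

    from keras.layers import Input, Convolution2D, BatchNormalization, Activation

    # With Theano's channels-first ('th') ordering, tensors are
    # (batch, channels, height, width), so MNIST maps to (1, 28, 28).
    H = Input(shape=(1, 28, 28))
    H = Convolution2D(64, 3, 3, border_mode='same')(H)
    # Default axis=-1: BatchNormalization normalizes over the *width*
    # axis (28) instead of the channel axis - presumably what the
    # Rebroadcast error above is about ("Dimension 3 ... supposed to
    # be 1 (got 28 instead)").
    H = BatchNormalization(mode=2)(H)
    H = Activation('relu')(H)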

osh commented 7 years ago

Yes, batchnorm is pretty critical to making GANs work AFAIK - not sure why it's breaking on the latest Theano. I'll look into it when I get a chance.

dolaameng commented 7 years ago

@osh: I dug in a little bit. The error might be caused by Keras PR #3529. Passing the parameter axis=1 explicitly, as in H = BatchNormalization(mode=2, axis=1)(H), seems to fix it. The default value of axis is -1.
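
In the notebook's terms (illustrative; only the axis argument changes):

    # Before: default axis=-1 - breaks on current Theano/GPU with
    # channels-first ('th') image ordering
    H = BatchNormalization(mode=2)(H)

    # After: normalize over the channel axis explicitly
    H = BatchNormalization(mode=2, axis=1)(H)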

Another quick question: it seems that you have commented out lines 185 and 194. What's the rationale behind that? Does it affect performance? Thanks!

osh commented 7 years ago

Ah great, I'll have to make that change - unless you want to submit a PR? I believe make_trainable() only matters at model.compile() time, so toggling it while looping through training was useless and was commented out (it should probably just be deleted). It shouldn't change anything.
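
For context, the helper is along these lines, and the flag is read at compile time (a sketch; the loss and optimizer here are assumptions, not the notebook's exact settings):

    def make_trainable(net, val):
        """Toggle trainability on a model and every one of its layers."""
        net.trainable = val
        for l in net.layers:
            l.trainable = val

    # Freeze the discriminator *before* compiling the stacked model, so
    # generator updates don't also move the discriminator's weights:
    make_trainable(discriminator, False)
    GAN.compile(loss='binary_crossentropy', optimizer='adam')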

dolaameng commented 7 years ago

Yes, please go ahead and make the change.

On the trainable setting, it seems there have been some changes - trainable now takes effect at run time (on the first call to fit()) - or have I misunderstood it?
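
A quick way to check which behavior a given Keras version has (a sketch, using make_trainable from above and the names in the traceback):

    import numpy as np

    # Flip trainable *after* compile, run one generator update, and see
    # whether the discriminator's weights actually stayed put.
    w_before = [np.copy(w) for w in discriminator.get_weights()]
    make_trainable(discriminator, False)
    GAN.train_on_batch(noise_tr, y2)
    w_after = discriminator.get_weights()
    frozen = all(np.allclose(a, b) for a, b in zip(w_before, w_after))
    print('trainable=False respected at run time:', frozen)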

I also found the trainable part confusing and have raised an issue. Hopefully someone can answer it.

Thanks for the great example.

osh commented 7 years ago

If the behavior has changed, we should put them back in for sure.

osh commented 7 years ago

Just checked in an update to MNIST_CNN_GAN_v2.ipynb, which works with the latest Keras/TF.