Not working with current Theano? #2

varoudis commented 7 years ago

Any ideas?

 0%|          | 0/10 [00:00<?, ?it/s]
<<!! BUG IN FGRAPH.REPLACE OR A LISTENER !!>> <class 'TypeError'> ('The type of the replacement must be compatible with the type of the original Variable.', GpuAllocEmpty.0, GpuAllocEmpty.0, CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, 4D), 'MergeOptimizer') MergeOptimizer
ERROR (theano.gof.opt): SeqOptimizer apply MergeOptimizer
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/opt.py", line 235, in apply
    sub_prof = optimizer.optimize(graph)
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/opt.py", line 90, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/opt.py", line 855, in apply
    fgraph.replace_all_validate(pairs, 'MergeOptimizer')
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/toolbox.py", line 339, in replace_all_validate
    fgraph.replace(r, new_r, reason=reason, verbose=False)
  File "/opt/conda/lib/python3.5/site-packages/theano/gof/fg.py", line 504, in replace
TypeError: ('The type of the replacement must be compatible with the type of the original Variable.', GpuAllocEmpty.0, GpuAllocEmpty.0, CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, 4D), 'MergeOptimizer')

ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    865             outputs =\
--> 866                 self.fn() if output_subset is None else\
    867                 self.fn(output_subset=output_subset)

/opt/conda/lib/python3.5/site-packages/theano/gof/op.py in rval(p, i, o, n)
    907             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 908                 r = p(n, [x[0] for x in i], o)
    909                 for o in node.outputs:

/opt/conda/lib/python3.5/site-packages/theano/compile/ops.py in perform(self, node, inp, out_)
    707                                  ' supposed to be 1 (got %s instead)' %
--> 708                                  (axis, x.shape[axis]))
    709         out[0] = x

ValueError: Dimension 3 in Rebroadcast's input was supposed to be 1 (got 28 instead)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-14-39dde9206136> in <module>()
----> 1 train_for_n(nb_epoch=10, plt_frq=25,BATCH_SIZE=128)

<ipython-input-13-3dd7ba09658b> in train_for_n(nb_epoch, plt_frq, BATCH_SIZE)
     25         #make_trainable(discriminator,False)
---> 26         g_loss = GAN.train_on_batch(noise_tr, y2 )
     27         losses["g"].append(g_loss)

/opt/conda/lib/python3.5/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1217             ins = x + y + sample_weights
   1218         self._make_train_function()
-> 1219         outputs = self.train_function(ins)
   1220         if len(outputs) == 1:
   1221             return outputs[0]

/opt/conda/lib/python3.5/site-packages/keras/backend/theano_backend.py in __call__(self, inputs)
    715     def __call__(self, inputs):
    716         assert type(inputs) in {list, tuple}
--> 717         return self.function(*inputs)

/opt/conda/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    877                     node=self.fn.nodes[self.fn.position_of_error],
    878                     thunk=thunk,
--> 879                     storage_map=getattr(self.fn, 'storage_map', None))
    880             else:
    881                 # old-style linkers raise their own exceptions

/opt/conda/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)

/opt/conda/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value

/opt/conda/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    864         try:
    865             outputs =\
--> 866                 self.fn() if output_subset is None else\
    867                 self.fn(output_subset=output_subset)
    868         except Exception:

/opt/conda/lib/python3.5/site-packages/theano/gof/op.py in rval(p, i, o, n)
    906             # default arguments are stored in the closure of `real`
    907             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 908                 r = p(n, [x[0] for x in i], o)
    909                 for o in node.outputs:
    910                     compute_map[o][0] = True

/opt/conda/lib/python3.5/site-packages/theano/compile/ops.py in perform(self, node, inp, out_)
    706                 raise ValueError('Dimension %s in Rebroadcast\'s input was'
    707                                  ' supposed to be 1 (got %s instead)' %
--> 708                                  (axis, x.shape[axis]))
    709         out[0] = x

ValueError: Dimension 3 in Rebroadcast's input was supposed to be 1 (got 28 instead)
Apply node that caused the error: Rebroadcast{?,?,?,1}(GpuContiguous.0)
Toposort index: 127
Inputs types: [CudaNdarrayType(float32, (True, True, True, False))]
Inputs shapes: [(1, 1, 1, 28)]
Inputs strides: [(0, 0, 0, 1)]
Inputs values: ['not shown']
Outputs clients: [[GpuElemwise{true_div,no_inplace}(Rebroadcast{?,?,?,1}.0, GpuElemwise{sqrt,no_inplace}.0), GpuElemwise{Composite{((i0 * i1 * i2) / i3)}}[(0, 1)](CudaNdarrayConstant{[[[[-0.5]]]]}, GpuDimShuffle{x,0,x,x}.0, Rebroadcast{?,?,?,1}.0, GpuElemwise{mul,no_inplace}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1104, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 964, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1270, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/opt/conda/lib/python3.5/site-packages/theano/gradient.py", line 1104, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

dolaameng commented 7 years ago

@varoudis: I ran into the same error with the latest Theano. Removing all the BatchNormalization layers made it compile. However, I was not able to reproduce the results: the GAN loss didn't converge. Have you been able to reproduce them?

osh commented 7 years ago

Are you running on a CPU or GPU device under Theano?

dolaameng commented 7 years ago

It's Theano on GPU. The error seems to be raised when a BatchNormalization layer follows a Conv2D layer, so I disabled BatchNormalization. Does that affect performance? To be frank, I have only run 2000 iterations, but the GAN loss never goes below 2.

osh commented 7 years ago

Yes, batchnorm is pretty critical to making GANs work, AFAIK. Not sure why it's breaking on the latest version; I'll look into it when I get a chance.

dolaameng commented 7 years ago

@osh: I dug into it a bit. The error might be caused by Keras PR #3529. Passing an explicit axis=1, as in H = BatchNormalization(mode=2, axis=1)(H), seems to fix it. The default value of axis is -1.
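The input shape in the Rebroadcast error, (1, 1, 1, 28), is consistent with this diagnosis, assuming the model uses Theano's channels-first ("th") dim ordering, where a conv feature map is (batch, channels, rows, cols) and 28 is the MNIST image width. The NumPy sketch below is not Keras's implementation; it only mimics feature-wise normalization with per-batch statistics (the behavior mode=2 is meant to provide) to show how `axis` picks the dimension that keeps its own mean and variance. The function name `batchnorm_per_batch` and the (16, 64, 28, 28) shape are hypothetical.

```python
import numpy as np


def batchnorm_per_batch(x, axis=-1, eps=1e-5):
    """Normalize x with statistics computed from the current batch only.

    `axis` is the feature axis that keeps its own mean and variance;
    the statistics are reduced over every other axis.
    Returns the normalized array and the shape of the per-feature mean.
    """
    axis = axis % x.ndim  # turn -1 into the index of the last axis
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps), mean.shape


# Hypothetical channels-first conv output: (batch, channels, rows, cols)
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64, 28, 28)).astype(np.float32)

y_channels, shape_channels = batchnorm_per_batch(x, axis=1)  # per channel
y_last, shape_last = batchnorm_per_batch(x, axis=-1)         # per column

print(shape_channels)  # (1, 64, 1, 1)
print(shape_last)      # (1, 1, 1, 28)
```

With axis=1 each of the 64 channels gets its own statistics, which is what a channels-first Conv2D output calls for; with the default axis=-1 the statistics are computed per image column instead, giving the same (1, 1, 1, 28) shape reported in the Rebroadcast error.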

Another quick question: it seems that you have commented out lines 185 and 194. What's the rationale behind that? Does it affect performance? Thanks!

osh commented 7 years ago

Ah, great. I'll have to make that change, unless you want to submit a PR? I believe make_trainable() only matters at model.compile() time, so calling it inside the training loop was useless and was commented out (it should probably be deleted). So it shouldn't change anything.
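The claim that make_trainable() only matters at compile time can be illustrated with a toy model. `Layer`, `ToyModel`, and this `make_trainable` are hypothetical stand-ins, not the Keras API or the notebook's code; the sketch simply assumes the compile-time snapshot behavior described in this comment, which, as the follow-up comments point out, may differ between Keras versions.

```python
class Layer:
    """Hypothetical stand-in for a layer holding a single scalar weight."""

    def __init__(self, name):
        self.name = name
        self.trainable = True
        self.weight = 0.0


class ToyModel:
    """Hypothetical model that reads its layers' trainable flags once,
    when compile() is called (the behavior assumed here)."""

    def __init__(self, layers):
        self.layers = layers
        self._to_update = None

    def compile(self):
        # Snapshot which layers will be updated during training.
        self._to_update = [layer for layer in self.layers if layer.trainable]

    def train_step(self, delta=1.0):
        if self._to_update is None:
            raise RuntimeError("call compile() before training")
        # Only layers captured at compile() time are updated.
        for layer in self._to_update:
            layer.weight += delta


def make_trainable(model, val):
    """Hypothetical helper: set the trainable flag on every layer."""
    for layer in model.layers:
        layer.trainable = val


gen = ToyModel([Layer("g1")])
disc = ToyModel([Layer("d1")])
gan = ToyModel(gen.layers + disc.layers)  # generator stacked on discriminator

gan.compile()                # snapshot: g1 and d1 are both trainable
make_trainable(disc, False)  # flag flipped after compile()
gan.train_step()
print(disc.layers[0].weight)  # 1.0, the flag change was ignored

gan.compile()                # recompiling picks up the new flag
gan.train_step()
print(disc.layers[0].weight)  # 1.0, discriminator now frozen
print(gen.layers[0].weight)   # 2.0
```

Under this assumption, flipping the flag inside the training loop without recompiling has no effect, which is why those calls were commented out.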

dolaameng commented 7 years ago

Yes, please go ahead and make the change.

On the trainable setting, it seems there have been some changes: trainable now takes effect at run time (the first call to fit). Or have I misunderstood it?

I also found the trainable part confusing and have raised an issue. Hopefully someone can answer it.

Thanks for the great example.

osh commented 7 years ago

If the behavior has changed, we should definitely put them back in.

osh commented 7 years ago

Just checked in an update to MNIST_CNN_GAN_v2.ipynb, which works with the latest Keras/TF.