Closed pvnick closed 7 years ago
The error message is clear, but it would theoretically be impossible for the internal scan loop of recurrent layers to return anything other than Theano floatX. So the cause is mysterious. Also as far as I can tell everything is looking fine on EC2 GPUs, so the problem might be specific to your GPU architecture (who knows).
You can try the following: cast all return values of the _step functions in keras.layers.recurrent to floatX, with:
val = T.cast(val, theano.config.floatX)
It's a trivial change. Let us know if it fixes your problem. It's worth trying.
That is weird by the way. Let us know if you solve it somehow.
Hmm, that seems to fix many of the issues, but the following tests are still failing:
auto/test_shape_inference.py FFFFFFFFFFFFFFF
auto/test_tasks.py ......
auto/keras/test_activations.py FFF.
auto/keras/test_constraints.py .....
auto/keras/test_normalization.py .......
auto/keras/layers/test_convolutional.py FF.....
auto/keras/layers/test_core.py ................FFF
With error messages that are very similar:
E TypeError: ('Bad input argument to theano function with name "/home/paul/keras/tests/auto/test_shape_inference.py:16" at index 0(0-based)', 'TensorType(float32, 3D) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".',
E NotImplementedError: The image and the kernel must have the same type.inputs(float64), kerns(float32) ../../ipynb/local/lib/python2.7/site-packages/theano/tensor/nnet/conv.py:646: NotImplementedError
you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".'
Very helpful, but all functions manipulated in Keras specify allow_input_downcast=True (see keras.models), for this very reason. So that's why this issue really isn't supposed to be happening.
Try specifying floatX=float32 via command line. It's possible your .theanorc isn't being picked up.
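For anyone wanting to rule out the .theanorc question from inside a script, here is a minimal sketch. It assumes the standard THEANO_FLAGS environment variable, which takes precedence over .theanorc, and it should run before Theano is imported so the override applies from the start. The helper name merge_theano_flags is made up for illustration and is not part of Theano or Keras.

```python
import os


def merge_theano_flags(**overrides):
    """Merge key=value overrides into THEANO_FLAGS and return the new string.

    Hypothetical helper; call it before `import theano` so the
    override is in place when Theano reads its configuration.
    """
    current = os.environ.get("THEANO_FLAGS", "")
    flags = {}
    # THEANO_FLAGS is a comma-separated list of key=value pairs.
    for item in current.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            flags[key.strip()] = value.strip()
    flags.update(overrides)
    merged = ",".join("%s=%s" % (k, v) for k, v in flags.items())
    os.environ["THEANO_FLAGS"] = merged
    return merged


merge_theano_flags(floatX="float32")

# Import Theano only after the flags are set, then check what it picked up:
# import theano
# print(theano.config.floatX)
```

If the printed value is still float64 after this, the problem is probably not the config file being ignored.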
Very helpful, but all functions manipulated in Keras
That's true in the Keras codebase itself, but we may be instantiating custom functions in the tests. Mind checking if the failures can be linked to custom Theano functions in the tests?
At first glance that seems to be true for at least some of the failing tests, maybe all. In that case a fix would be to add allow_input_downcast=True every time a Theano function gets instantiated in a test.
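One way to apply that consistently might be a small wrapper that the tests call instead of theano.function directly. This is only a sketch: compile_for_test and its compile_fn parameter are hypothetical names, not something in the Keras test suite. The only real API it relies on is theano.function and its allow_input_downcast keyword.

```python
def compile_for_test(inputs, outputs, compile_fn=None, **kwargs):
    """Compile a function that accepts float64 inputs when floatX is float32.

    Hypothetical helper. `compile_fn` defaults to theano.function; it is
    injectable only so the wrapper can be checked without Theano installed.
    """
    if compile_fn is None:
        import theano  # imported lazily, only when actually compiling
        compile_fn = theano.function
    # Respect an explicit caller choice, otherwise allow downcasting.
    kwargs.setdefault("allow_input_downcast", True)
    return compile_fn(inputs, outputs, **kwargs)


# Intended usage in a test (requires Theano and NumPy):
# import numpy as np
# import theano.tensor as T
# x = T.tensor3("x")
# f = compile_for_test([x], x.sum())
# f(np.ones((2, 3, 4)))  # float64 input, downcast instead of raising TypeError
```

Note that this only addresses the "Bad input argument" TypeError on function inputs. The conv.py NotImplementedError about mismatched image and kernel types may need the data itself cast to float32.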
I tried setting the floatX parameter on the command line, downgrading both Keras and Theano to their week-old code bases, and restarting the server on which things are running. None of these fixed the issue :-/
For the unit tests failures, that is. The recurrent network seems to be working for normal usage.
Btw, I installed everything in a virtual environment, not globally.
I am pretty sure the tests can be fixed by adding allow_input_downcast=True every time a Theano function gets instantiated in a test, as per my comment above. Try that.
This does indeed appear to fix the issue. Would you like me to submit a pull request for the changes?
All of the tests pass in CPU mode, but many tests fail in GPU mode, and with my own network (or any of the scripts in the examples folder) the loss immediately goes to NaN. My GPU is a GeForce GTX 980 Ti. I am using the most recent code in the Theano and Keras repositories as of this morning.
Relevant test results:
auto/test_shape_inference.py FFFFFFFFFFFFFFF
auto/test_tasks.py ......
auto/keras/test_activations.py FFF.
auto/keras/test_constraints.py .....
auto/keras/test_normalization.py .......
auto/keras/layers/test_convolutional.py FF.....
auto/keras/layers/test_core.py ................FFF
auto/keras/layers/test_recurrent.py FFFFFFF
Most failures seem to be associated with the following error message:
Here is my .theanorc file:
Running deviceQuery in the cuda samples folder shows that the test passed.
Not sure where to go from this point :(