Closed by fmfn.
I'm getting similar errors running a smaller, simpler model with version 1.0.7 on a CPU (Linux). LSTMs with fewer than 15 output nodes seem to train fine. With 16 or more nodes I get an alloc error similar to yours, and running with theano.config.mode='NanGuardMode' reports that some Infs are appearing.
EDIT: Keras 1.0.7
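NanGuardMode makes Theano check intermediate values for NaNs, Infs and very large magnitudes. When you want the same kind of check on plain arrays (for example, on weights or activations pulled out of a model), a small NumPy helper does the job. This is a minimal sketch, not Theano's implementation: guard is a hypothetical helper name, and its 1e10 "too big" threshold is an assumption chosen for illustration.

```python
import numpy as np


def guard(name, arr, big_threshold=1e10):
    """Raise FloatingPointError if arr holds NaN, +/-Inf, or very large values.

    Hypothetical NumPy stand-in for the kind of checks NanGuardMode enables;
    big_threshold is an assumed value, not taken from Theano.
    """
    arr = np.asarray(arr, dtype=np.float64)
    if np.isnan(arr).any():
        raise FloatingPointError(f"{name}: NaN detected")
    if np.isinf(arr).any():
        raise FloatingPointError(f"{name}: Inf detected")
    if (np.abs(arr) > big_threshold).any():
        raise FloatingPointError(f"{name}: value larger than {big_threshold:g}")
    return arr


if __name__ == "__main__":
    with np.errstate(over="ignore"):
        # exp(100) exceeds the float32 range, so the second entry becomes inf.
        logits = np.exp(np.float32([10.0, 100.0]))
    try:
        guard("exp(logits)", logits)
    except FloatingPointError as err:
        print(err)  # exp(logits): Inf detected
```

Running a guard like this after each suspicious operation narrows down where the first non-finite value appears, which is the same question NanGuardMode answers inside a compiled Theano graph.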
I noticed something similar: training the same model as above with 1.0.8 on a GPU failed with maxlen = 140, but it worked with maxlen = 120.
Do you have a reproducible code snippet? I haven't noticed anything weird with large RNNs on CPU.
On 31 August 2016 at 07:46, Richard Tanburn notifications@github.com wrote:
I'm getting similar errors, running a smaller, simpler model using version 1.0.8 on a CPU (Linux). LSTMs with fewer than 15 output nodes seem to train fine. 16+ nodes gives an alloc error similar to yours, but running theano.config.mode='NanGuardMode' says that some Infs are popping up.
The following runs on 1.0.7 and fails (with the traceback above) on 1.0.8:
import numpy as np
from keras.layers import LSTM, Dense, RepeatVector
from keras.layers import TimeDistributed, Masking#, Bidirectional
from keras.models import Sequential
from keras.optimizers import Nadam
from keras.objectives import categorical_crossentropy
from keras import backend as K


def sparse_char_softmax(y_true, y_pred):
    # One categorical cross-entropy per time step (10 steps).
    steps_loss = [
        categorical_crossentropy(y_true[:, i, :], y_pred[:, i, :])
        for i in range(10)
    ]
    # Note: K.sum receives a Python list here, not a tensor.
    return K.sum(steps_loss) / \
        (10 + 256)


def model_loader(maxlen, max_features, lstm_size):
    encoder = Sequential(name="encoder")
    encoder.add(
        Masking(
            input_shape=(maxlen, max_features),
            mask_value=0,
        )
    )
    encoder.add(
        LSTM(
            output_dim=lstm_size,
            return_sequences=False,
            go_backwards=False,
            name='encoder_rnn_0'
        )
    )

    model = Sequential(name='char-auto-encoder')
    model.add(encoder)

    # Context
    model.add(
        RepeatVector(n=maxlen,
                     name='context_vector_repeat')
    )
    model.add(
        LSTM(
            output_dim=lstm_size,
            return_sequences=True,
            go_backwards=False,
            name='decoder_rnn_0'
        )
    )
    model.add(
        TimeDistributed(
            Dense(
                output_dim=max_features,
                activation='softmax',
                name='distribution_over_tokens'
            ),
        )
    )
    return model, encoder


if __name__ == "__main__":
    # Random one-hot data: one active feature per (sample, time step).
    X = np.zeros((256, 10, 80), dtype=bool)
    for row in X:
        for col in row:
            col[np.random.randint(0, 80)] += 1

    model, encoder = model_loader(10, 80, 25)
    model.summary()
    encoder.compile('sgd', 'mse')
    model.compile(
        loss=sparse_char_softmax,
        optimizer=Nadam(lr=0.001, clipnorm=2.0),
    )
    h = model.fit(
        X, X,
        nb_epoch=1,
        verbose=1,
        batch_size=256,
        validation_data=[X, X],
    )
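The nested loops that build X set exactly one random feature per time step. A vectorized equivalent, using NumPy integer-array indexing, is sketched below. make_one_hot_batch is a hypothetical helper name, and it draws indices with np.random.default_rng instead of np.random.randint, so the actual values differ from the original loop. Because every time step has one nonzero feature, the Masking(mask_value=0) layer (which masks a step only when all of its features equal the mask value) should not mask anything in this data.

```python
import numpy as np


def make_one_hot_batch(n_samples=256, maxlen=10, max_features=80, seed=None):
    """Random one-hot sequences of shape (n_samples, maxlen, max_features)."""
    rng = np.random.default_rng(seed)
    # One random feature index per (sample, time step) pair.
    idx = rng.integers(0, max_features, size=(n_samples, maxlen))
    X = np.zeros((n_samples, maxlen, max_features), dtype=bool)
    # The three index arrays broadcast to (n_samples, maxlen),
    # so exactly one element is set to True for every pair.
    X[np.arange(n_samples)[:, None], np.arange(maxlen)[None, :], idx] = True
    return X


X = make_one_hot_batch(seed=0)
print(X.shape, bool((X.sum(axis=-1) == 1).all()))  # (256, 10, 80) True
```

The default arguments mirror the snippet's model_loader(10, 80, 25) call: maxlen = 10 and max_features = 80 (lstm_size does not affect the input shape).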
Your loss function should not work (K.sum must be called on a tensor, not a list). If I replace it with MSE, your script runs fine with both Theano and TF.
Thanks! I figured it had to be the loss function, despite it working ok in prior releases.
Use sum instead (i.e. Python's built-in sum function).
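Python's built-in sum adds the list items one at a time, starting from 0, so summing a list of per-step losses gives an elementwise total with the same shape as each step's loss. The NumPy sketch below illustrates that behaviour and shows it matches reducing over the time axis directly. categorical_crossentropy_np is a hypothetical stand-in for keras.objectives.categorical_crossentropy (not Keras's implementation), the clipping epsilon is an assumption, and the (10 + 256) divisor is copied from the snippet.

```python
import numpy as np

EPS = 1e-7  # assumed clipping epsilon to keep log() finite


def categorical_crossentropy_np(y_true, y_pred):
    """Cross-entropy over the last axis (hypothetical NumPy stand-in)."""
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


def sparse_char_softmax_np(y_true, y_pred, steps=10):
    # A list holding one array of shape (batch,) per time step.
    steps_loss = [
        categorical_crossentropy_np(y_true[:, i, :], y_pred[:, i, :])
        for i in range(steps)
    ]
    # Built-in sum: 0 + step_0 + step_1 + ..., still shape (batch,).
    return sum(steps_loss) / (10 + 256)


def sparse_char_softmax_vectorized(y_true, y_pred):
    # Same quantity without a Python list: reduce over the time axis (axis=1).
    return categorical_crossentropy_np(y_true, y_pred).sum(axis=1) / (10 + 256)


rng = np.random.default_rng(0)
y_true = np.eye(80)[rng.integers(0, 80, size=(4, 10))]  # one-hot, (4, 10, 80)
y_pred = rng.random((4, 10, 80)) + 1e-3                  # positive scores, (4, 10, 80)

looped = sparse_char_softmax_np(y_true, y_pred)
vectorized = sparse_char_softmax_vectorized(y_true, y_pred)
print(looped.shape, np.allclose(looped, vectorized))  # (4,) True
```

The vectorized form avoids building a Python list at all, which sidesteps the question of what a backend reduction does when handed a list.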
Below is what I get when compiling with the mse loss. By the way, the snippet above works (with the sparse_char_softmax loss and Keras 1.0.8) on an Ubuntu 14.04, CUDA 8 RC, Python 2.7, GTX 1080 setup.
Epoch 1/1
Traceback (most recent call last):
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: alloc failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "keras_bug.py", line 87, in <module>
    validation_data=[X, X],
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/models.py", line 620, in fit
    sample_weight=sample_weight)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/engine/training.py", line 1104, in fit
    callback_metrics=callback_metrics)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/engine/training.py", line 822, in _fit_loop
    outs = f(ins_batch)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 672, in __call__
    return self.function(*inputs)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: alloc failed
Apply node that caused the error: AllocEmpty{dtype='float32'}(TensorConstant{11}, Elemwise{Composite{Switch(EQ(i0, i1), ((i2 * i0) // (i3 * i0)), i0)}}.0, TensorConstant{25})
Toposort index: 190
Inputs types: [TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(), (), ()]
Inputs strides: [(), (), ()]
Inputs values: [array(11), array(-10667), array(25)]
Outputs clients: [[IncSubtensor{InplaceSet;:int64:}(AllocEmpty{dtype='float32'}.0, Rebroadcast{0}.0, Constant{1})]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "keras_bug.py", line 74, in <module>
    model, encoder = model_loader(10, 80, 25)
  File "keras_bug.py", line 50, in model_loader
    name='decoder_rnn_0'
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/models.py", line 308, in add
    output_tensor = layer(self.outputs[0])
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/engine/topology.py", line 515, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/engine/topology.py", line 573, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/engine/topology.py", line 150, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/layers/recurrent.py", line 213, in call
    input_length=input_shape[1])
  File "/Users/<me>/venvs3/general/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 842, in rnn
    go_backwards=go_backwards)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
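The "Inputs values" line shows that AllocEmpty was asked for a float32 buffer of shape (11, -10667, 25). A negative dimension can never be a valid buffer size, which suggests (this is an interpretation, not something the traceback states) that the MemoryError comes from a bad shape computation in the decoder RNN rather than from actually running out of memory. NumPy rejects the same request up front; try_alloc is a hypothetical helper used only to illustrate this.

```python
import numpy as np


def try_alloc(shape, dtype=np.float32):
    """Return None if np.empty(shape) succeeds, else the error message."""
    try:
        np.empty(shape, dtype=dtype)
    except ValueError as err:
        return str(err)
    return None


# Shape taken from the "Inputs values" line of the traceback above.
print(try_alloc((11, -10667, 25)))  # negative dimensions are not allowed
print(try_alloc((11, 256, 25)))     # None (a valid shape allocates fine)
```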
Did you try updating Keras to the master version? In theory, Theano RNNs should be strictly identical between 1.0.7 and master.
Updated to the master version and it worked. Thanks!
I get the following when trying to train a model (on a CPU) after upgrading to 1.0.8. Interestingly, it works if I downgrade to 1.0.7. Perhaps even more surprising, it works (with 1.0.8) on an Ubuntu GPU setup.
The model is: