I encountered the same issue as nanoix9. Although it is still possible for me to train a model using techniques like memory mapping, I prefer to load my entire dataset in memory. Is it possible to use Theano's sparse matrix representation?
Store your data in HDF5 and try this class: https://github.com/fchollet/keras/blob/master/keras/utils/io_utils.py#L7-L52
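A minimal sketch of that suggestion, with made-up file and dataset names (the `HDF5Matrix` import path is the one in the linked io_utils.py, but its argument defaults have shifted across Keras versions): write the arrays to HDF5 once with h5py, then let Keras slice mini-batches straight from disk.

```python
import h5py
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils.io_utils import HDF5Matrix

# Write the arrays to disk once (placeholder data; shapes are illustrative).
X = np.random.rand(10000, 784).astype('float32')
y = np.eye(10)[np.random.randint(0, 10, 10000)].astype('float32')
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('X', data=X)
    f.create_dataset('y', data=y)

# Wrap the datasets; Keras reads each mini-batch from the file
# instead of holding the whole array in memory.
X_train = HDF5Matrix('train.h5', 'X', 0, 10000)
y_train = HDF5Matrix('train.h5', 'y', 0, 10000)

model = Sequential()
model.add(Dense(10, input_dim=784, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')

# HDF5 requires in-order reads, hence shuffle='batch' rather than True.
model.fit(X_train, y_train, batch_size=32, shuffle='batch')
```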
@EderSantana Will using that class (could some documentation be added?) use sparse matrices on the GPU? For some of the problems I'm working on I have more than enough RAM in my desktop; the GPU is the limitation.
@nanoix9 That's true. You need to write your own class (a RecurrentEmbedding) if you want to use indices as the input of a RecurrentLayer directly.
I was able to reduce the memory usage significantly by using an Embedding layer as the first step, which allows a more efficient input form, though it isn't quite identical to the model I was hoping to replicate. A sketch of that approach is below.
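A minimal sketch of the Embedding setup, with made-up layer sizes (import paths follow later Keras naming and may differ slightly in older versions): the model consumes a matrix of integer word indices directly, so no one-hot tensor is ever built on the input side.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size, embed_dim, seq_len = 10000, 128, 50  # illustrative sizes

model = Sequential()
# Input: (samples, seq_len) integer indices -> (samples, seq_len, embed_dim)
model.add(Embedding(vocab_size, embed_dim, input_length=seq_len))
model.add(LSTM(256))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Indices, not one-hot vectors: each row is a sequence of word ids.
X = np.random.randint(0, vocab_size, size=(32, seq_len))
```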
To further this question a bit, the `categorical_crossentropy` loss implementation in Keras makes use of `T.nnet.categorical_crossentropy`. The documentation for that method indicates that the target may be either the same dimension as the prediction, or a 1-dimensional list of integers treated as the indices into a one-hot encoding of the target vector. However, if I give Keras a 1-hot target as the output, I get an error like `Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 256)` (in this case I had a 256-dimension softmax as the final layer/target). Is there a way to make this input form work in Keras? The memory-use difference is huge for even a modest number of target values.
Extending this further, is there a way to define a layer for mismatched output sizes and Y vectors? That way we could implement more efficient representations for k-hot vectors or other specialized types. This is similar to the question in issue #1043, but on the opposite end of the network. It's a bit trickier, since I want the output to be, say, a 256-dimension softmax, but I want to represent the target in a more space-efficient format.
Is there any way to fix it?
I actually just got this partially working. When Keras munges the input it converts everything to floatX, so I can define my own loss that converts the list back to ints:
```python
import theano.tensor as T

def my_one_hot_categorical_crossentropy(y_true, y_pred):
    '''Expects a vector of integer class indices instead of a binary class matrix.'''
    epsilon = 1.0e-7
    y_pred = T.clip(y_pred, epsilon, 1.0 - epsilon)
    # Scale preds so that the class probabilities of each sample sum to 1.
    y_pred /= y_pred.sum(axis=-1, keepdims=True)
    # Original: cce = T.nnet.categorical_crossentropy(y_pred, y_true)
    # Keras has cast the integer targets to floatX, so cast them back to
    # int32 so that Theano takes its integer-target (1-hot) code path.
    cce = T.nnet.categorical_crossentropy(y_pred, T.cast(y_true.flatten(), 'int32'))
    return cce
```
This now accepts (only) an integer-index target vector. I'm new to writing Theano code; if there is a way to check for a dimension mismatch, and the mismatch is because the target has a size of 1 along the target dimension, the loss could call that path instead. It would also be good to know how to insert Theano print statements given the whole Keras pipeline.
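On the print-statement question: Theano's `Print` op can be spliced into the loss graph and fires whenever the wrapped value is computed, even though Keras owns the compilation. A hedged sketch (the `debug_loss` name is ours):

```python
import theano
import theano.tensor as T

def debug_loss(y_true, y_pred):
    # The Print op passes y_true through unchanged but logs the requested
    # attributes (here the shape) every time this node is evaluated.
    y_true = theano.printing.Print('y_true', attrs=['shape'])(y_true)
    y_pred = T.clip(y_pred, 1.0e-7, 1.0 - 1.0e-7)
    return T.nnet.categorical_crossentropy(y_pred, T.cast(y_true.flatten(), 'int32'))
```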
Theano's `categorical_crossentropy` accepts one-hot vectors, and it expects a one-hot vector to be of type int instead of float. However, Keras converts everything to float, and this is the reason why we have to convert it back to int, as @RaffEdwardBAH said. I think we should add this cost function to Keras as a standard one.
After updating to the new head version of Keras, my custom function no longer seems to work. When I compile, I get an error on the `cce = T.nnet.categorical_crossentropy(y_pred, T.cast(y_true.flatten(), 'int32'))` line that reads:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-9-9af9fd60d08b> in <module>()
     11 model.add(Activation('softmax'))
     12
---> 13 model.compile(loss=one_hot_categorical_crossentropy, optimizer=Adam(clipnorm=20))

/usr/local/lib/python2.7/dist-packages/Keras-0.3.0-py2.7.egg/keras/models.pyc in compile(self, optimizer, loss, class_mode, theano_mode)
    382         else:
    383             mask = None
--> 384         train_loss = weighted_loss(self.y, self.y_train, self.weights, mask)
    385         test_loss = weighted_loss(self.y, self.y_test, self.weights, mask)
    386

/usr/local/lib/python2.7/dist-packages/Keras-0.3.0-py2.7.egg/keras/models.pyc in weighted(y_true, y_pred, weights, mask)
     76         mask: binary
     77         '''
---> 78         score_array = fn(y_true, y_pred)
     79         if mask is not None:
     80             score_array *= mask

<ipython-input-8-08ee58c4e220> in one_hot_categorical_crossentropy(y_true, y_pred)
     12     #orig
     13     #cce = T.nnet.categorical_crossentropy(y_pred, y_true)
---> 14     cce = T.nnet.categorical_crossentropy(y_pred, T.cast(y_true.flatten(), 'int32'))
     15     return cce

/usr/local/lib/python2.7/dist-packages/theano/tensor/nnet/nnet.pyc in categorical_crossentropy(coding_dist, true_dist)
   1875         return crossentropy_categorical_1hot(coding_dist, true_dist)
   1876     else:
---> 1877         raise TypeError('rank mismatch between coding and true distributions')
   1878
   1879

TypeError: rank mismatch between coding and true distributions
```
Anyone have an idea of what needs to change to fix this, or is this a regression somewhere else?
Just wanted to add I'm having TypeError issues as well when using `categorical_crossentropy`. Mine are a bit weirder:
```
TypeError: ('An update must have the same type as the original shared variable
(shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix),
update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).',
'If the difference is related to the broadcast pattern, you can call the
tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove
broadcastable dimensions.')
```
I don't know why some things were converted to `float32` and others are `float64`. The inputs in my case are sequences of integers passing through an `Embedding` layer, and the outputs are sequences of one-hot vectors (`int32`).
@jfsantos do you have more details? I only get that if I try to set something by hand with `set_value`, since numpy defaults to float64.
I solved that error by setting `floatX` to `float64` in `$HOME/.keras/keras.json` (I'm testing on a CPU, so it's not a big deal); a sample config is shown after this comment. My model is a stack of LSTMs (with `return_sequences=True`) and a `TimeDistributedDense` layer with a softplus activation on top.
The only "fancy" thing I'm doing is that my labels are sequences too, but I did that before the move to multiple backends with MSE as criterion and it worked well (I mean, the results were horrible, but the code worked).
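For reference, a keras.json along those lines might look like the snippet below; the key names and casing (`floatx` vs. `floatX`) varied across early Keras versions, so treat this as a sketch and check the version you are running.

```json
{
    "floatx": "float64",
    "epsilon": 1e-07,
    "backend": "theano"
}
```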
Interesting, was your Theano `floatX` already `float64`?
Btw, Keras does the following reshape to calculate cost functions: `(samples, time, dim) -> (samples*time, dim)`. If you have more dimensions than that, it will do `(samples, time, row, col) -> (samples*time*row, col)`, which messes up the cost average. I had sequence-to-sequence learning with input and output videos and my results were horrible too xD. But I don't think that is your problem, right?
Yes, I did not have a config for floatX so it defaults to float64 on the CPU.
And nope: my system has a softmax output which is a one-hot encoding of a single label per timestep, so I don't have that kind of issue and evaluating the error over all timesteps at once should not be a big deal.
Will setting `floatX=float32` work okay?
@RaffEdwardBAH: Did you find a workaround for your "TypeError: rank mismatch between coding and true distributions"? I am getting the same error message.
My number of class labels is about 60,000 (= vocabulary size), which is typical when using one-hot vectors in language model training. It is not feasible for me to use `np_utils.to_categorical` ...
Hello! I am trying to train an RNN (LSTM) with an embedding layer on top. My problem is similar to @Palang2014's: in a word-level language model I have about 65k class labels, so I can't one-hot encode them because of the memory usage. Can anyone give a tip on how to overcome this?
Thanks.
Edit: it seems like `sparse_categorical_crossentropy` helped; a sketch of that setup follows.
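A minimal sketch of the `sparse_categorical_crossentropy` setup, assuming a later Keras version that ships this loss (layer sizes here are made up): the targets stay as integer indices with a trailing singleton axis, so no 65k-wide one-hot matrix is ever materialized.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

vocab_size, seq_len = 65000, 20  # illustrative sizes

model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=seq_len))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# Targets are integer indices, so no (samples, seq_len, 65000) tensor
# is ever built in memory.
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32, seq_len, 1))  # indices + singleton axis
model.fit(X, y, batch_size=8)
```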
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.
I am a new user of RNNs and of Keras for language modeling. I found that Keras accepts a 3D tensor as the input to an RNN, which means word sequences have to be encoded into sequences of word vectors. The simplest encoding is one-hot, but that is a heavy waste of memory because most elements in the 3D tensor are zero.
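To put illustrative numbers on that waste: 10,000 sequences of length 50 over a 60,000-word vocabulary, stored one-hot as float32, occupy 10,000 × 50 × 60,000 × 4 bytes ≈ 120 GB, while the same data kept as int32 indices needs only 10,000 × 50 × 4 bytes ≈ 2 MB.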
I only found an Embedding layer, which accepts index-represented word sequences (no need for one-hot encoding and thus memory efficient), but that layer generates a DENSE word vector and feeds it to the recurrent layer, which forces me to use a dense representation instead of one-hot encoding.
Is there any efficient way to do one-hot encoding? Or did I miss something?
Besides, I got a "g++ not detected" error when the data set gets large, but the same code works for a small data set. I asked a question on SO: http://stackoverflow.com/questions/33671453/g-not-detected-while-data-set-goes-larger-is-there-any-limit-to-matrix-size I thought larger data sets might be supported if there were a memory-saving way to do one-hot representation.