tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

RuntimeError: Cannot use a constraint function on a sparse variable in google colab #47155

Closed: addy1997 closed this issue 3 years ago

addy1997 commented 3 years ago

I am trying to train my model with Keras and TensorFlow 2.x. While calling the model.fit() method, I ran into this error.

Error

Epoch 1/10
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)

<ipython-input-73-4871a80f91a3> in <module>()
      2 
      3 for i in range(N_epoch):
----> 4     model.fit(x=train_X,y=train_Y,batch_size=32,epochs=10,verbose=1, validation_data=(val_X,val_Y))
      5     output = model.predict_proba(val_X, batch_size=10, verbose=1)
      6     # find validation accuracy using the best threshold value t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    975           except Exception as e:  # pylint:disable=broad-except
    976             if hasattr(e, "ag_error_metadata"):
--> 977               raise e.ag_error_metadata.to_exception(e)
    978             else:
    979               raise

RuntimeError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:757 train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:498 minimize
        return self.apply_gradients(grads_and_vars, name=name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:635 apply_gradients
        "name": name,
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2941 merge_call
        return self._merge_call(merge_fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2948 _merge_call
        return merge_fn(self._strategy, *args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:683 _distributed_apply  **
        var, apply_grad_to_update_var, args=(grad,), group=False))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2494 update
        return self._update(var, fn, args, kwargs, group)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:3431 _update
        return self._update_non_slot(var, fn, (var,) + tuple(args), kwargs, group)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:3437 _update_non_slot
        result = fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:650 apply_grad_to_update_var  **
        "Cannot use a constraint function on a sparse variable.")

    RuntimeError: Cannot use a constraint function on a sparse variable.
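
For context: the gradient of an embedding lookup comes back as a sparse tf.IndexedSlices rather than a dense tensor, and that sparse-update path is exactly what the optimizer's constraint check rejects. A minimal standalone sketch (not from the report) illustrating this:

import tensorflow as tf

# Gradients w.r.t. an embedding table are returned as tf.IndexedSlices
# (sparse), which is what the optimizer's constraint check refuses.
emb = tf.Variable(tf.random.normal((100, 8)))
with tf.GradientTape() as tape:
    looked_up = tf.nn.embedding_lookup(emb, tf.constant([1, 2, 3]))
    loss = tf.reduce_sum(looked_up)
grad = tape.gradient(loss, emb)
print(type(grad))  # IndexedSlices, not a dense Tensor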

System information

The code is given below; see the linked Colab for the full code.


import numpy as np

# Train data preparation
N = datasets[0].shape[0]
conv_input_width = W.shape[1]
conv_input_height = int(datasets[0].shape[1]-1)

# For each word write a word index (not vector) to X tensor
train_X = np.zeros((N, conv_input_height), dtype=int)  # np.int is deprecated; use the builtin int
train_Y = np.zeros((N, 2), dtype=int)
for i in range(N):
    for j in range(conv_input_height):
        train_X[i, j] = datasets[0][i, j]

print('train_X.shape = {}'.format(train_X.shape))
print('train_Y.shape = {}'.format(train_Y.shape))

# Validation data preparation
Nv = datasets[1].shape[0]

# For each word write a word index (not vector) to X tensor
val_X = np.zeros((Nv, conv_input_height), dtype=int)
val_Y = np.zeros((Nv, 2), dtype=int)
for i in range(Nv):
    for j in range(conv_input_height):
        val_X[i, j] = datasets[1][i, j]
print('val_X.shape = {}'.format(val_X.shape))
print('val_Y.shape = {}'.format(val_Y.shape))
for i in range(Nv):
    val_Y[i,data_train.iloc[i,3]] = 1

import keras
from keras import backend
from keras.constraints import UnitNorm
from keras.layers import (Activation, Convolution2D, Dense, Dropout,
                          Embedding, Flatten, MaxPooling2D, Reshape)
from keras.models import Sequential
from keras.optimizers import RMSprop
from keras.regularizers import l2

backend.set_image_data_format('channels_first')

# Number of feature maps (outputs of convolutional layer)
N_fm = 200
# kernel size of convolutional layer
kernel_size = 5

model = Sequential()
# Embedding layer (lookup table of trainable word vectors)
model.add(Embedding(input_dim=W.shape[0],
                    output_dim=W.shape[1],
                    input_length=conv_input_height,
                    weights=[W],
                    # this constraint is what triggers the RuntimeError during fit
                    embeddings_constraint=UnitNorm(axis=1),
                    name='e_l'))
# Reshape word vectors from Embedding to tensor format suitable for Convolutional layer
model.add(Reshape((1, conv_input_height, conv_input_width)))

# first convolutional layer
model.add(Convolution2D(N_fm,
                        (kernel_size, conv_input_width),  # kernel spans the full word-vector width
                        kernel_initializer='random_uniform',
                        padding='valid',
                        kernel_regularizer=l2(0.001)))
# ReLU activation
model.add(Activation('relu'))

# aggregate data in every feature map to scalar using MAX operation
model.add(MaxPooling2D(pool_size=(conv_input_height-kernel_size+1, 1), padding='same'))

model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(128,kernel_initializer='random_uniform'))
model.add(Activation('relu'))
model.add(Dropout(0.4))
# Inner Product layer (as in regular neural network, but without non-linear activation function)
model.add(Dense(2))
# SoftMax activation; actually, Dense+SoftMax works as Multinomial Logistic Regression
model.add(Activation('softmax'))

# Custom optimizers could be used, though right now standard RMSprop is employed
opt = RMSprop(learning_rate=0.001, rho=0.9)
model.compile(loss='mean_squared_error', 
              optimizer=opt,
              metrics=['accuracy'])

The code that throws the error:

from sklearn.metrics import roc_auc_score

N_epoch = 3
epoch = 0
val_acc, val_auc = [], []

for i in range(N_epoch):
    model.fit(x=train_X, y=train_Y, batch_size=32, epochs=10, verbose=1, validation_data=(val_X, val_Y))
    output = model.predict_proba(val_X, batch_size=10, verbose=1)
    # find validation accuracy using the best threshold value t
    vacc = np.max([np.sum((output[:, 1] > t) == (val_Y[:, 1] > 0.5)) * 1.0 / len(output) for t in np.arange(0.0, 1.0, 0.01)])
    # find validation AUC
    vauc = roc_auc_score(val_Y, output)
    val_acc.append(vacc)
    val_auc.append(vauc)
    print('Epoch {}: validation accuracy = {:.3%}, validation AUC = {:.3%}'.format(epoch, vacc, vauc))
    epoch += 1

print('{} epochs passed'.format(epoch))
print('Accuracy on validation dataset:')
print(val_acc)
print('AUC on validation dataset:')
print(val_auc)

Tweaks I tried

  1. I tried changing the UnitNorm constraint in the embedding layer.
  2. I verified that the embedding layer doesn't use sparse data; it stores its weights in a dense matrix.
  3. I referred to this link but couldn't solve the error.

Can anyone please suggest a solution? Thanks!

Saduf2019 commented 3 years ago

@addy1997 I ran the shared code and faced a different error; please find the gist here.

addy1997 commented 3 years ago

@Saduf2019 Here's my colab link.

For uploading the files:

essays.csv file link - https://github.com/addy1997/Task9-personality-prediction/blob/main/essays.csv
link to imdb-train-val-testN.pickle file - https://github.com/addy1997/Task9-personality-prediction/blob/main/imdb-train-val-testN.pickle

Saduf2019 commented 3 years ago

@addy1997 I ran the code but faced a truncation error; please find the gist here.

addy1997 commented 3 years ago

> @addy1997 I ran the code but faced a truncation error; please find the gist here.

I uploaded the file and I don't get that truncation error - gist

addy1997 commented 3 years ago

@Saduf2019 were you able to get rid of the error?

Saduf2019 commented 3 years ago

I am able to replicate the issue reported on 2.x; please find the gists here for nightly and TF 2.4.

addy1997 commented 3 years ago

> I am able to replicate the issue reported on 2.x; please find the gist here.

That's great. Now, what do you suggest for the issue? What needs to be done to solve it? @tensorflower-gardener, can you suggest something on this?

jvishnuvardhan commented 3 years ago

@addy1997 Is it possible to share simple standalone code to reproduce the issue? Thanks!

addy1997 commented 3 years ago

> @addy1997 Is it possible to share simple standalone code to reproduce the issue? Thanks!

Sorry, it is not possible to provide standalone code without loading files, as I am performing text classification, which requires the ".pickle" file to be loaded.

Here's the link to my colab and the link to the .pickle file.
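
For reference, the data loading is roughly this (a sketch; the exact contents of the pickle are an assumption based on how datasets[0] and datasets[1] are indexed in the code above):

import pickle

# Hypothetical loading step: the pickle is assumed to hold the train/val
# matrices that the preparation code indexes as datasets[0] / datasets[1].
with open('imdb-train-val-testN.pickle', 'rb') as f:
    datasets = pickle.load(f)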

mattdangerw commented 3 years ago

This is low priority for us right now, but please consider contributing a fix!

addy1997 commented 3 years ago

> This is low priority for us right now, but please consider contributing a fix!

This is a new issue and a really important part of my project. Can you at least give me a hint about how to tackle this, @mattdangerw? I asked on Stack Overflow, but they told me to raise an issue on GitHub.

mattdangerw commented 3 years ago

Hi, sorry for the quick reply above; I was typing fast in our triage meeting.

Dug a little deeper: this looks like a duplicate of #33755.

I would suggest the workaround here; can you see if this works for you?

We have also bumped up the priority of #33755.

Thanks for bringing this up!

addy1997 commented 3 years ago

> Hi, sorry for the quick reply above; I was typing fast in our triage meeting.
>
> Dug a little deeper: this looks like a duplicate of #33755.
>
> I would suggest the workaround here; can you see if this works for you?
>
> We have also bumped up the priority of #33755.
>
> Thanks for bringing this up!

@mattdangerw the workaround you suggested isn't working for my code. Here's the colab.

mattdangerw commented 3 years ago

With the workaround, you should remove any use of embeddings_constraint completely. You may need to switch to a functional model.

This is broken due to #33755:

import numpy as np
import tensorflow as tf

X = np.random.randint(100, size=(32, 10))
Y = np.ones((32, 1))
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(100, 8, input_length=10, embeddings_constraint=tf.keras.constraints.UnitNorm(axis=1)))
model.add(tf.keras.layers.Dense(1))
model.compile('rmsprop', 'mse')
model.fit(X, Y)

This should work:

import numpy as np
import tensorflow as tf

X = np.random.randint(100, size=(32, 10))
Y = np.ones((32, 1))
inputs = tf.keras.Input(shape=(10,))
output = tf.keras.layers.Embedding(100, 8, input_length=10)(inputs)
output = tf.keras.constraints.UnitNorm(axis=1)(output)
output = tf.keras.layers.Dense(1)(output)
model = tf.keras.Model(inputs, output)
model.compile('rmsprop', 'mse')
model.fit(X, Y)
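
If you want the embedding weights themselves kept at unit norm (rather than normalizing the layer's outputs), one alternative is to re-normalize the embedding matrix between batches with a callback. A sketch; UnitNormEmbedding is just a helper name here, not a Keras API:

import numpy as np
import tensorflow as tf

# Hypothetical helper: re-normalize the embedding rows after every batch,
# approximating embeddings_constraint without going through the
# sparse-update path that raises the RuntimeError.
class UnitNormEmbedding(tf.keras.callbacks.Callback):
    def __init__(self, layer_name):
        super().__init__()
        self.layer_name = layer_name

    def on_train_batch_end(self, batch, logs=None):
        weights = self.model.get_layer(self.layer_name).embeddings
        weights.assign(tf.math.l2_normalize(weights, axis=1))

X = np.random.randint(100, size=(32, 10))
Y = np.ones((32, 1))
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(100, 8, input_length=10, name='emb'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])
model.compile('rmsprop', 'mse')
model.fit(X, Y, callbacks=[UnitNormEmbedding('emb')])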

For updates on the main bug, follow #33755. This bug can be closed as a duplicate.

addy1997 commented 3 years ago

@mattdangerw thanks a lot. I removed the embedding constraint and trained my model. It worked perfectly; here's the colab.

Though the model overfits (validation accuracy 53%, training accuracy 0.9987), I will try to optimize it. Thanks to @jvishnuvardhan and @Saduf2019 for their guidance. This bug was really important.
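
For example, one standard next step against the overfitting (a sketch using stock Keras callbacks, not something discussed in this thread) is early stopping on validation accuracy:

import tensorflow as tf

# Stop training when validation accuracy stops improving, and restore the
# best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy', patience=2, restore_best_weights=True)

# model.fit(train_X, train_Y, batch_size=32, epochs=10,
#           validation_data=(val_X, val_Y), callbacks=[early_stop])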
