Closed: hsekol-hub closed this issue 4 years ago.
@lokesharma-dev
Could you share the TF version you are using? Please share a Colab link or simple standalone code with proper indentation, along with any supporting files, so we can reproduce the issue in our environment. It helps us localize the issue faster. Thanks!
@ravikyram TensorFlow 2.2.0. Colab link: https://colab.research.google.com/drive/1sdv-sGE80HwdPKkl_p3gakhkxgKO-zRv?usp=sharing
The required files are attached. In case anything is not accessible, kindly notify me. Thanks for your time.
I have tried this in Colab with TF 2.2 and nightly (2.3.0-dev20200614) and was able to reproduce the issue. Please find the gist here. Thanks!
System information
Hello, I am trying to build a model that relies on the concept of virtual adversarial training (VAT) for text classification. The model uses the functional API of Keras, and I have been able to build the model graph. However, something goes wrong when I try to fit my training dataset on the model. Please suggest where I might be going wrong in the process.

Virtual adversarial training involves calculating the KL divergence between two probability distributions. The shape of the training data is displayed as well, to give a better picture of the problem. The code is inspired by the links below, but I have changed the domain to text and extended the approach.

More information:
Code: https://gist.github.com/divamgupta/c778c17459c1f162e789560d5e0b2f0b
Theory: https://divamgupta.com/unsupervised-learning/semi-supervised-learning/2019/05/31/introduction-to-virtual-adversarial-training.html
Google Colab link for the code below: https://colab.research.google.com/drive/1sdv-sGE80HwdPKkl_p3gakhkxgKO-zRv?usp=sharing
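For reference, the quantity that `compute_kld` below is meant to compute is the per-example KL divergence between the two softmax distributions $p$ and $q$:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \left( \log p_i - \log q_i \right)$$

which is zero when the two distributions coincide and positive otherwise.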
CODE
```python
import numpy as np
import random
import time

# ------------------- TensorFlow -------------------
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Embedding, Dense, LSTM, Bidirectional

MAX_VOCAB_SIZE = len(word_index) + 1  # maximum no. of unique words (word_index comes from the tokenizer)
MAX_DOC_LENGTH = 500                  # maximum no. of words in each sentence
EMBEDDING_DIM = 300                   # embedding dimension from the GloVe directory
def compute_kld(p_logit, q_logit):
    p = tf.nn.softmax(p_logit)
    q = tf.nn.softmax(q_logit)
    kl_score = tf.reduce_sum(
        p * (tf.math.log(p + 1e-16) - tf.math.log(q + 1e-16)), axis=1)
    return kl_score  # lower KL means the distributions are closer
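# Optional eager sanity check (not part of the original repro; expected values
# are assumptions): identical logits should give ~0, different logits > 0.
# print(compute_kld(tf.constant([[1., 2., 3.]]), tf.constant([[1., 2., 3.]])))  # ~[0.]
# print(compute_kld(tf.constant([[1., 2., 3.]]), tf.constant([[3., 2., 1.]])))  # positive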
inputs = Input(shape=(MAX_DOC_LENGTH,), name="Seq_Input")  # text sequence is the first input
# inputs = tf.Variable(tf.zeros(shape=(MAX_DOC_LENGTH,)))  # disabled: rebinding `inputs` to a Variable would break keras.Model(inputs, outputs) below
def createEmbd(inputs):
    # Creates embeddings for a sequence of words
    return layers.Embedding(input_dim=MAX_VOCAB_SIZE,
                            output_dim=EMBEDDING_DIM,
                            input_length=MAX_DOC_LENGTH,
                            trainable=True,
                            mask_zero=True,
                            name="Keras_Embedding")(inputs)
input_emb = createEmbd(inputs)
noise_emb = tf.random.uniform(shape=tf.shape(input_emb))  # idea is to add noise to these embeddings
noise_emb = input_emb + noise_emb  # equivalently: tf.math.add(input_emb, noise_emb)
input_h1 = layers.LSTM(units=128, name="Input_h1")(input_emb)
noise_h1 = layers.LSTM(units=128, name="Noise_h1")(noise_emb)

p_logit = layers.Dense(units=16, activation='relu', name="p_logit")(input_h1)
p_logit_r = layers.Dense(units=16, activation='relu', name="p_logit_r")(noise_h1)
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(noise_emb)
    kl_score = compute_kld(p_logit, p_logit_r)
    kl_score = tf.convert_to_tensor(kl_score, dtype=tf.float32)
grads = tape.gradient(kl_score, noise_emb)  # differentiate kl_score with respect to noise_emb

# p_logit = tf.stop_gradient(p_logit)
# p_logit_r = tf.stop_gradient(p_logit_r)

# For some reason the first execution returned "None" for the gradients, so a
# tensor of the matching shape is substituted to be able to build the model
if grads is None:
    grads = tf.random.uniform(shape=tf.shape(noise_emb))
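# Note on the None gradients (my reading of the cause, not confirmed):
# noise_emb, p_logit and p_logit_r are symbolic functional-API tensors, and
# the ops connecting them are graph-construction steps, not eager computations
# the tape can record. tf.GradientTape only differentiates operations it
# actually records on concrete tensors, so mixing it with functional-API model
# building returns None here; see the train_step sketch at the end.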
vadv_emb = tf.math.add(input_emb, grads)
vadv_h1 = layers.LSTM(units=128, name="vadv_h1")(vadv_emb)
q_logit = layers.Dense(units=16, activation='relu', name="q_logit")(vadv_h1)

vat_loss = compute_kld(p_logit, q_logit)  # this VAT loss (a scalar) needs to be added to the final cost function
# logits = layers.average([p_logit, p_logit_r, q_logit])
outputs = layers.Dense(units=1, activation='softmax', name="output")(p_logit)  # NB: softmax over a single unit always outputs 1; sigmoid is the usual pairing with binary crossentropy
model = keras.Model(inputs, outputs)
model.add_loss(vat_loss)

# Not sure if this graph has any problem
keras.utils.plot_model(model, show_shapes=True, show_layer_names=True)
model.summary()

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy', 'precision'])
# Shuffle data randomly before splitting
indices = np.arange(sequences.shape[0])
random.Random(1).shuffle(indices)
data = sequences[indices]
labels = y[indices]

num_test_samples = int(0.2 * data.shape[0])
x_train = data[:-num_test_samples]
y_train = labels[:-num_test_samples]
x_test = data[-num_test_samples:]
y_test = labels[-num_test_samples:]
print(x_train.shape, y_train.shape)        # (400, 500) (400,)
print(x_train[0].shape, y_train[0].shape)  # (500,) ()
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset.take(1)
# Output: <TakeDataset shapes: ((500,), ()), types: (tf.int32, tf.int64)>
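# Note (likely relevant to the warning below): from_tensor_slices yields an
# *unbatched* dataset whose elements have shape (500,), and model.fit() does
# not accept a batch_size argument for tf.data inputs. Batching explicitly
# would restore the expected (None, 500) input shape:
# train_dataset = train_dataset.batch(40)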
model.fit(train_dataset, epochs=2, batch_size=40)
model.fit(x_train, y_train, epochs=2, validation_split=0.2, shuffle=True, batch_size=32)
```
Any other info/logs

Error:
```
Epoch 1/2
WARNING:tensorflow:Model was constructed with shape (None, 500) for input Tensor("Seq_Input:0", shape=(None, 500), dtype=float32), but it was called on an input with incompatible shape (500, 1).
ValueError                                Traceback (most recent call last)
```
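For anyone landing here: below is a minimal sketch, not a confirmed fix, of how the VAT gradient could instead be computed per batch inside a custom `train_step` (available since TF 2.2), where `GradientTape` operates on concrete batch tensors rather than symbolic functional-API ones. The `VATModel` class, layer sizes, and the perturbation scale `0.1` are illustrative assumptions; `compute_kld` is the function defined above.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class VATModel(keras.Model):
    """Sketch: binary text classifier with a per-batch VAT penalty."""

    def __init__(self, vocab_size, emb_dim, **kwargs):
        super().__init__(**kwargs)
        self.emb = layers.Embedding(vocab_size, emb_dim)
        self.lstm = layers.LSTM(128)
        self.logit = layers.Dense(16)                     # pre-softmax logits for the KL term
        self.out = layers.Dense(1, activation="sigmoid")  # sigmoid for binary crossentropy

    def logits_from_emb(self, emb):
        return self.logit(self.lstm(emb))

    def call(self, x, training=False):
        return self.out(self.logits_from_emb(self.emb(x)))

    def train_step(self, data):
        x, y = data
        emb = self.emb(x)
        noise = tf.random.uniform(tf.shape(emb))
        # Tape 1: direction in embedding space that most increases the KL.
        with tf.GradientTape() as vat_tape:
            vat_tape.watch(noise)
            kl = tf.reduce_mean(compute_kld(
                tf.stop_gradient(self.logits_from_emb(emb)),
                self.logits_from_emb(emb + noise)))
        d = vat_tape.gradient(kl, noise)         # concrete tensors, so no longer None
        d = tf.math.l2_normalize(d, axis=[1, 2])
        # Tape 2: ordinary training step with the VAT penalty added on top.
        with tf.GradientTape() as tape:
            p_logit = self.logits_from_emb(self.emb(x))
            q_logit = self.logits_from_emb(self.emb(x) + 0.1 * d)  # 0.1 = assumed epsilon
            vat_loss = tf.reduce_mean(compute_kld(tf.stop_gradient(p_logit), q_logit))
            y_pred = self.out(p_logit)
            loss = self.compiled_loss(y, y_pred) + vat_loss
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```

Compiled and fitted the usual way, with the dataset batched explicitly:

```python
model = VATModel(MAX_VOCAB_SIZE, EMBEDDING_DIM)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_dataset.batch(40), epochs=2)
```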