tensorflow / hub

A library for transfer learning by reusing parts of TensorFlow models.
https://tensorflow.org/hub
Apache License 2.0

Regularization on Hub Keras Layer #454

Closed meethariprasad closed 4 years ago

meethariprasad commented 4 years ago

Dear TF Hub Team,

I was able to build a simple Siamese network as follows and train it without problems, but as I suspected, fine-tuning severely alters the generalization of the embeddings and causes overfitting: not only related sentences but even unrelated sentences move up in cosine similarity. As I understand it, this calls for fine-tuning with a lower learning rate or adding regularization.

The problem with the network below is that I can't add dropout directly on top of the hub.KerasLayer, since I suspect it would distort the embedding coming out of it; nor can I put it on top of the Dot layer, since dropout on a cosine similarity/distance makes no sense.

So my question to the TF Hub team is: in an architecture like the one below (or any alternative that works similarly; I am open to suggestions), how can we add dropout, or is there some other way to alter this Siamese network so that we do not lose generalization while still achieving the similarity objective on our local corpus?

Git: https://github.com/meethariprasad/research_works/blob/master/Siamese_TF_Cosine_Distance_Fine_Tune.ipynb

Example: as you can see in the notebook linked above (in the cell after fine-tuning), the unrelated sentences "Man is going to Moon" and "Literacy is important for civilization" move far closer together after fine-tuning, which needs to be reduced.
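
For reference, that drift can be measured directly. The snippet below is a minimal sketch (using the same universal-sentence-encoder/4 module as the network further down) that computes the cosine similarity of the two example sentences with the pretrained, not-yet-fine-tuned encoder, giving the baseline that fine-tuning should not destroy.

import tensorflow as tf
import tensorflow_hub as hub

# Pretrained encoder, before any fine-tuning.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vecs = embed(["Man is going to Moon",
              "Literacy is important for civilization"])

# Cosine similarity = dot product of L2-normalized embeddings.
vecs = tf.nn.l2_normalize(vecs, axis=-1)
cosine = tf.reduce_sum(vecs[0] * vecs[1])
print(float(cosine))  # low for unrelated sentences; fine-tuning pushed it up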

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
import logging

tf.get_logger().setLevel(logging.ERROR)

huburl = "https://tfhub.dev/google/universal-sentence-encoder/4" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
loaded_module_obj = hub.load(huburl)
shared_embedding_layer = hub.KerasLayer(loaded_module_obj, trainable=True)

left_input = keras.Input(shape=(), dtype=tf.string)
right_input = keras.Input(shape=(), dtype=tf.string)

embedding_left_output = shared_embedding_layer(left_input)
embedding_right_output = shared_embedding_layer(right_input)

# Cosine similarity as a normalized dot product; distance = 1 - similarity.
cosine_similarity = tf.keras.layers.Dot(axes=-1, normalize=True)(
    [embedding_left_output, embedding_right_output])
cos_distance = 1 - cosine_similarity

model = tf.keras.Model([left_input, right_input], cos_distance)

# Define the optimizer. On gradient clipping, see:
# https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
# optim = keras.optimizers.RMSprop(clipnorm=1.)
optim = tf.compat.v1.train.ProximalAdagradOptimizer(
    learning_rate=0.0001,
    l1_regularization_strength=0.0,
    l2_regularization_strength=0.01)
model.compile(optimizer=optim, loss='mse')
model.summary()
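
One pattern that might answer the dropout question is sketched below, under explicit assumptions: the frozen encoder, the 512-unit projection head, the dropout rate, and the L2 strength are all illustrative choices, not part of the original model or an official TF Hub recommendation. The idea is to freeze the pretrained encoder and fine-tune only a small shared projection head, where Dropout and kernel regularization apply naturally without touching the raw embeddings.

import tensorflow as tf
import tensorflow_hub as hub

# Frozen pretrained encoder: its embeddings stay intact.
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4",
    trainable=False)

# Small shared trainable head; dropout and L2 act here, not on the
# pretrained embedding itself. All hyperparameters are illustrative.
projection = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(
        512, kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])

left_input = tf.keras.Input(shape=(), dtype=tf.string)
right_input = tf.keras.Input(shape=(), dtype=tf.string)

left_vec = projection(encoder(left_input))
right_vec = projection(encoder(right_input))

cosine_similarity = tf.keras.layers.Dot(axes=-1, normalize=True)(
    [left_vec, right_vec])
cos_distance = 1 - cosine_similarity

model = tf.keras.Model([left_input, right_input], cos_distance)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')

Because the encoder weights never move in this variant, unrelated sentences keep their original similarity; only the projection head learns the local-corpus notion of distance, and its capacity can be limited via the regularizer.
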
arnoegw commented 4 years ago

Hi @meethariprasad

With all due respect, I think this is more of a question for StackOverflow (how to do fancy task X) than a bug report or feature request for the TF Hub developers. I don't have a full answer for you. But let me provide some partial answers about using Hub in general.
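
For context on the general Hub usage mentioned above, the usual fine-tuning setup looks roughly like the sketch below: trainable=True on the hub.KerasLayer combined with a much lower learning rate than training from scratch. The classification head and the exact learning rate are illustrative assumptions, not something prescribed by this thread.

import tensorflow as tf
import tensorflow_hub as hub

model = tf.keras.Sequential([
    # trainable=True enables fine-tuning of the pretrained weights.
    hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                   input_shape=[], dtype=tf.string, trainable=True),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # illustrative head
])
# A much lower learning rate than usual protects the pretrained weights.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy')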