tensorflow / hub

A library for transfer learning by reusing parts of TensorFlow models.
https://tensorflow.org/hub
Apache License 2.0

Regularization on Hub Keras Layer #454

Closed meethariprasad closed 4 years ago

meethariprasad commented 4 years ago

Dear TF Hub Team,

I was able to build a simple Siamese network as follows and train it without problems, but as I suspected, fine-tuning severely alters the generalization of the embeddings and causes overfitting: not only related sentences but even unrelated sentences move up in cosine similarity. As I understand it, this calls for fine-tuning with a lower learning rate or adding regularization.

The problem with the network below is that I can't add dropout directly on top of the hub.KerasLayer, since I suspect it would distort the embedding coming out of it; nor can I put it on top of the Dot layer, since dropout on a cosine similarity/distance makes no sense.

So my question to the TF Hub team is: in an architecture like the one below (or any alternative that works similarly; I am open to suggestions), how can we add dropout, or is there some other way to alter this Siamese network so that we do not lose generalization while still achieving the similarity objective on our local corpus?

Git: https://github.com/meethariprasad/research_works/blob/master/Siamese_TF_Cosine_Distance_Fine_Tune.ipynb

Example: as you can see in the notebook linked above (in the cell after fine-tuning), the unrelated sentences "Man is going to Moon" and "Literacy is important for civilization" move far closer together after fine-tuning, which needs to be reduced.
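
For reference, that drift can be measured directly. The snippet below is a minimal sketch (using the same universal-sentence-encoder/4 module as the network further down) that computes the cosine similarity of the two example sentences with the pretrained, not-yet-fine-tuned encoder, giving the baseline that fine-tuning should not destroy.

import tensorflow as tf
import tensorflow_hub as hub

# Pretrained encoder, before any fine-tuning.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vecs = embed(["Man is going to Moon",
              "Literacy is important for civilization"])

# Cosine similarity = dot product of L2-normalized embeddings.
vecs = tf.nn.l2_normalize(vecs, axis=-1)
cosine = tf.reduce_sum(vecs[0] * vecs[1])
print(float(cosine))  # low for unrelated sentences; fine-tuning pushed it up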

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
import logging

tf.get_logger().setLevel(logging.ERROR)

huburl = "https://tfhub.dev/google/universal-sentence-encoder/4" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
loaded_module_obj = hub.load(huburl)
shared_embedding_layer = hub.KerasLayer(loaded_module_obj, trainable=True)

left_input = keras.Input(shape=(), dtype=tf.string)
right_input = keras.Input(shape=(), dtype=tf.string)

embedding_left_output = shared_embedding_layer(left_input)
embedding_right_output = shared_embedding_layer(right_input)

# Cosine similarity as a normalized dot product; distance = 1 - similarity.
cosine_similarity = tf.keras.layers.Dot(axes=-1, normalize=True)(
    [embedding_left_output, embedding_right_output])
cos_distance = 1 - cosine_similarity

model = tf.keras.Model([left_input, right_input], cos_distance)

# Define the optimizer. On gradient clipping, see:
# https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
# optim = keras.optimizers.RMSprop(clipnorm=1.)
optim = tf.compat.v1.train.ProximalAdagradOptimizer(
    learning_rate=0.0001,
    l1_regularization_strength=0.0,
    l2_regularization_strength=0.01)
model.compile(optimizer=optim, loss='mse')
model.summary()
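
One pattern that might answer the dropout question is sketched below, under explicit assumptions: the frozen encoder, the 512-unit projection head, the dropout rate, and the L2 strength are all illustrative choices, not part of the original model or an official TF Hub recommendation. The idea is to freeze the pretrained encoder and fine-tune only a small shared projection head, where Dropout and kernel regularization apply naturally without touching the raw embeddings.

import tensorflow as tf
import tensorflow_hub as hub

# Frozen pretrained encoder: its embeddings stay intact.
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4",
    trainable=False)

# Small shared trainable head; dropout and L2 act here, not on the
# pretrained embedding itself. All hyperparameters are illustrative.
projection = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(
        512, kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])

left_input = tf.keras.Input(shape=(), dtype=tf.string)
right_input = tf.keras.Input(shape=(), dtype=tf.string)

left_vec = projection(encoder(left_input))
right_vec = projection(encoder(right_input))

cosine_similarity = tf.keras.layers.Dot(axes=-1, normalize=True)(
    [left_vec, right_vec])
cos_distance = 1 - cosine_similarity

model = tf.keras.Model([left_input, right_input], cos_distance)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')

Because the encoder weights never move in this variant, unrelated sentences keep their original similarity; only the projection head learns the local-corpus notion of distance, and its capacity can be limited via the regularizer.
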
arnoegw commented 4 years ago

Hi @meethariprasad

With all due respect, I think this is more of a question for StackOverflow (how to do fancy task X) than a bug report or feature request for the TF Hub developers. I don't have a full answer for you. But let me provide some partial answers about using Hub in general.
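
For context on the general Hub usage mentioned above, the usual fine-tuning setup looks roughly like the sketch below: trainable=True on the hub.KerasLayer combined with a much lower learning rate than training from scratch. The classification head and the exact learning rate are illustrative assumptions, not something prescribed by this thread.

import tensorflow as tf
import tensorflow_hub as hub

model = tf.keras.Sequential([
    # trainable=True enables fine-tuning of the pretrained weights.
    hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                   input_shape=[], dtype=tf.string, trainable=True),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # illustrative head
])
# A much lower learning rate than usual protects the pretrained weights.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy')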