keras-team / keras-hub

Pretrained model hub for Keras 3
Apache License 2.0

semantic similarity example (Siamese BERT-Network) #934

Closed abuelnasr0 closed 1 year ago

abuelnasr0 commented 1 year ago

Add an example to keras.io that trains a BERT model to generate sentence embeddings and then performs semantic similarity on them, using the technique from Sentence-BERT: https://arxiv.org/abs/1908.10084

NiharJani2002 commented 1 year ago

@mattdangerw, is this issue assigned to someone else? If not, I am going to work on it.

shivance commented 1 year ago

keras-io/#1299

abuelnasr0 commented 1 year ago

@shivance Take a look at the paper, or search for SBERT on Google. It's different from passing the two sentences to the same model and getting a score between 0 and 1; that setup is called a cross-encoder. The approach in the paper is to pass only one sentence to the model, and the model generates an embedding that holds the semantic meaning of the sentence, which you can then compare against other sentences' embeddings to find similar sentences.
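To make the distinction concrete, here is a minimal numpy sketch of the bi-encoder comparison step described above. The embedding values are made up for illustration; in the real example they would come from a fine-tuned BERT backbone.

```python
import numpy as np

# Hypothetical sentence embeddings, as produced independently by a
# bi-encoder (SBERT-style). Each sentence is encoded on its own; no
# sentence pair is ever fed jointly to the model (unlike a cross-encoder).
emb_a = np.array([0.2, 0.8, 0.1])
emb_b = np.array([0.25, 0.75, 0.05])

def cosine_similarity(u, v):
    # Standard cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Precomputed embeddings can be compared cheaply, which is the whole
# point of the bi-encoder approach for semantic search.
score = cosine_similarity(emb_a, emb_b)
```

Because each sentence is embedded once, comparing a query against a corpus of N sentences needs N vector comparisons rather than N full forward passes of a cross-encoder.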

NiharJani2002 commented 1 year ago

@abuelnasr0 can you assign me this issue ?

abuelnasr0 commented 1 year ago

@NiharJani2002 I am not a mentor, bro. That's a misunderstanding. I am asking the mentors whether they are interested in adding this example, and if so, I will add it.

NiharJani2002 commented 1 year ago

@abuelnasr0 okay.

mattdangerw commented 1 year ago

@abuelnasr0 sounds good to me! I think this would make for a great keras.io example, especially since, IIUC, this is a more complex fine-tuning procedure on top of BERT/RoBERTa pretrained models. I like this idea a lot!

I think the thing to do here would be to show creating a custom model and training it on top of either a BertBackbone or RobertaBackbone, much like this section of our getting started guide: https://keras.io/guides/keras_nlp/getting_started/#fine-tuning-with-a-custom-model

I would first try to get a proof-of-concept colab working for yourself; then, once you have a good overall flow, convert that to a keras.io "tutobook" and make a PR there. See https://github.com/keras-team/keras-io#creating-a-new-example-starting-from-a-ipynb-file

The overall goal should probably be twofold here...

Thanks!

jbischof commented 1 year ago

@abuelnasr0 please be respectful to all community members. Filing an issue does not mean that someone owns it or even that it is a good fit for our library. If we decide to proceed and you would like to work on it please let us know but we want to foster a community of respect and goodwill.

abuelnasr0 commented 1 year ago

@mattdangerw only one backbone will be used. I will create a colab with the example and send the link as soon as I finish.

mattdangerw commented 1 year ago

it's only one backbone that will be used.

Ah I see now from the paper, the siamese network structure shares weights between the two towers? So effectively, one backbone, invoked twice?

abuelnasr0 commented 1 year ago

So effectively, one backbone, invoked twice?

Yes, exactly. I haven't trained a siamese network in Keras before, but I think if I pass two inputs to the same backbone, get two outputs, and put a loss function on top of them, it will work fine, and the weights of the model will be updated without any problem.
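The weight-sharing idea above can be sketched without any Keras machinery. This is a toy stand-in, assuming a single linear layer plays the role of the BERT backbone: the same weight matrix encodes both sentences, so the gradient of a pairwise loss accumulates contributions from both towers into one set of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "backbone": one shared weight matrix. In the real example this
# would be a BertBackbone; the key point is that BOTH sentences pass
# through the SAME weights (siamese setup), so there is only one set of
# parameters to update.
W = rng.normal(size=(4, 3))  # hypothetical: 4-dim features -> 3-dim embedding

def encode(x):
    return x @ W  # the shared encoder, invoked once per tower

x1 = rng.normal(size=(1, 4))  # features for sentence 1
x2 = rng.normal(size=(1, 4))  # features for sentence 2

e1, e2 = encode(x1), encode(x2)

# A squared-error loss on the pair of embeddings. Its gradient w.r.t. W
# sums contributions from both towers, which is why a shared backbone
# invoked twice trains without any special handling.
diff = e1 - e2                       # loss = sum(diff ** 2)
grad_W = x1.T @ (2 * diff) + x2.T @ (-2 * diff)
```

In Keras the same effect falls out of the functional API: calling one backbone (or layer) object on two different inputs reuses its variables, and backprop through both call sites updates them jointly.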

abuelnasr0 commented 1 year ago

@mattdangerw here is a colab with an example. It is not complete yet, just the model and a small training example: https://colab.research.google.com/drive/1Pm5dlJVWq99D4yobse_oOjYe-BapCK4H#scrollTo=ywMubN_HiF60&uniqifier=1

The example uses the second approach in the mentioned paper (the regression objective function).
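For readers following along, the regression objective in the SBERT paper computes the cosine similarity of the two sentence embeddings and regresses it against a gold similarity label with mean squared error. A minimal numpy sketch, with made-up embeddings and label:

```python
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for one sentence pair, plus a gold similarity
# label (e.g. an STS score rescaled to [0, 1]).
u = np.array([0.1, 0.9, 0.3])
v = np.array([0.2, 0.8, 0.4])
gold = 0.9

# Regression objective: MSE between predicted cosine similarity and label.
pred = cosine_similarity(u, v)
loss = (pred - gold) ** 2
```

In the full training setup this loss would be averaged over a batch of sentence pairs and backpropagated through the shared backbone.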

abuelnasr0 commented 1 year ago

sorry for the delay. I have opened a pull request keras-team/keras-io#1405