tensorflow / hub

A library for transfer learning by reusing parts of TensorFlow models.
https://tensorflow.org/hub
Apache License 2.0

Universal sentence encoder retraining #110

Closed choran closed 6 years ago

choran commented 6 years ago

Hi, I am trying to fine-tune the latest USE ("https://tfhub.dev/google/universal-sentence-encoder-large/3") on custom data. I want to set the learning rate low so that it does not significantly alter the original embeddings, but so far I am a little confused about how to do this. I have looked at:

  1. https://www.tensorflow.org/hub/fine_tuning - to better understand fine-tuning; this was helpful.
  2. #46 and #36, which also contained some helpful info, such as confirming that some form of retraining should be possible (i.e. a "fuzzy yes").
  3. The example code that puts a classifier on top of the USE, which shows the benefits of this form of transfer learning.
  4. The general tf.hub docs, to better understand how to use feature columns and input functions when retraining a pre-trained module.

But I am still unsure exactly how to implement this. For example, I was looking through the USE docs to see if there were specs showing what is available for retraining. Does anyone have a simple code example of how this might look, or are there some docs I am missing that show how you would add your own layers on top of the saved model with your own data?

Cheers Cathal

choran commented 6 years ago

As an example, the ELMo module documents some of its trainable parameters. I was wondering whether there are any similar examples for the USE?

vbardiovskyg commented 6 years ago

One example that would generalize to USE is here: https://colab.research.google.com/github/tensorflow/hub/blob/master/docs/tutorials/text_classification_with_tf_hub.ipynb

In general calling hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3", trainable=True) will expose the variables as trainable.

The learning rate is usually a parameter of the optimizer (tf.train.AdagradOptimizer in the colab above).
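To make that concrete, here is a minimal sketch of the whole pattern (TF1-style; the dense layer on top, the placeholders and the learning-rate value are illustrative, not prescriptions):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Expose the module's variables as trainable.
embed = hub.Module(
    "https://tfhub.dev/google/universal-sentence-encoder-large/3",
    trainable=True)

sentences = tf.placeholder(tf.string, shape=[None])
labels = tf.placeholder(tf.int64, shape=[None])

embeddings = embed(sentences)            # [batch_size, 512]
logits = tf.layers.dense(embeddings, 2)  # your own layer on top
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# A low learning rate keeps the updates to the original embeddings small.
train_op = tf.train.AdagradOptimizer(learning_rate=0.003).minimize(loss)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # sess.run(train_op, feed_dict={sentences: [...], labels: [...]})
```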

choran commented 6 years ago

Ok, I didn't see that example; that is helpful, thanks. I will try to tweak it and see how it goes.

Right now I am using the default USE and getting some pretty good results. It works well if I look for the top 3 most similar sentences, for example.

But one thing I notice is that the absolute value of the similarity score I am using (the cosine similarity used in the USE colab example) varies a bit more than I thought, i.e. the absolute similarity can vary between 0.7 and 0.3, with 0.3 still representing what I would consider a strong match.

So I am now trying to see if there is some way to train this to get a better absolute match, or to use some other similarity measure and see if it impacts the absolute scores.
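For reference, this is roughly how I compute the similarity now, plus the arccos-based angular similarity from the USE paper that I am considering as an alternative (a sketch; the sentences are just examples):

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vecs = sess.run(embed(["How old are you?", "What is your age?"]))

a, b = vecs[0], vecs[1]
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# The USE paper scores pairs with an angular similarity, which spreads
# out scores near the top of the range compared to raw cosine.
angular = 1 - np.arccos(np.clip(cosine, -1.0, 1.0)) / np.pi
print(cosine, angular)
```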

choran commented 6 years ago

Just one other question I have about accessing the saved model for the USE: I don't see a MetaGraph associated with this saved model. I used the saved_model_cli but it does not find any tag sets for the saved model. Is this expected? I.e. the variables are available but the model itself is not?
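For now, I can at least inspect the module's signatures through the tensorflow_hub API itself, as in this sketch, but I would still like to understand the saved-model layout:

```python
import tensorflow_hub as hub

# Inspect the module through the hub API instead of saved_model_cli.
module = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")

print(module.get_signature_names())   # e.g. ['default']
print(module.get_input_info_dict())   # input names, dtypes and shapes
print(module.get_output_info_dict())  # output names, dtypes and shapes
```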

hamifthi commented 6 years ago

I use USE in the same way as the ELMo module and it works. You can add a classifier on top so that your module becomes more sensitive to the exact sentences you are looking for.

MFarhatUllah commented 6 years ago

Is there a method to download the tf-hub Universal Sentence Encoder, and from where? Can we reuse it from a directory on our PC?

andresusanopinto commented 6 years ago

@MFarhatUllah - you have 2 options:

1 (recommended) - Check the section about caching modules in the basic documentation (https://www.tensorflow.org/hub/basics) to learn how to set up a cache dir in a location you control (see the sketch below).

2 - Check the documentation about downloading modules in common issues (https://github.com/tensorflow/hub/blob/master/docs/common_issues.md).
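For option 1, a minimal sketch (the cache path is a placeholder):

```python
import os

# Point TF-Hub at a cache directory you control *before* resolving the
# module, so it is downloaded once and then reused from local disk.
os.environ["TFHUB_CACHE_DIR"] = "/path/of/your/choice"  # placeholder

import tensorflow_hub as hub

embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3")
```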

MFarhatUllah commented 6 years ago

Thanks buddy, it works.

choran commented 6 years ago

@hamifthi thanks for the clarification. I have trained a classifier model using the USE and set it to trainable, so it should be more tuned to our sentences. However, I am not 100% sure how to reuse that specific module. For example, how do you tell your code to use the re-trained module and not the default one? Do you need to specify the directory it was downloaded to? Thanks again for the update. Cathal
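For context, I assume the workflow would look something like the sketch below (paths are placeholders), but I am not sure this is the right way to do it:

```python
import tensorflow as tf
import tensorflow_hub as hub

module = hub.Module(
    "https://tfhub.dev/google/universal-sentence-encoder-large/3",
    trainable=True)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # ... run fine-tuning steps here ...
    # Export the module with its updated weights to a local directory.
    module.export("/tmp/use_finetuned", sess)

# Later (in a fresh graph/session): load the fine-tuned module from the
# local path instead of the tfhub.dev URL.
finetuned = hub.Module("/tmp/use_finetuned")
```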

nico1as commented 6 years ago

Hello all,

I have the same concern as @choran.

I would like to enhance the initial USE model with my own data. Let me explain: my custom data is domain specific (i.e. the aerospace domain), so there is a good chance that many of my words were not in the initial training corpus used to build the USE pre-trained model.

Also, I'm not looking to classify anything right now or to translate sentences into other languages. I only want to start from the USE model you already shared and add my own data source to learn a new USE model enhanced with my domain data (called USE_aero, for example).

After that, I would like to have the possibility to point to my new USE_aero model and use it for transfer learning on specific classification tasks (but this part is ok and well documented).

Well, if you have any ideas on how to perform such a pre-training task, it would be very helpful. Thanks a lot,

Nicolas

akhorlin commented 6 years ago

One approach for retraining USE in an unsupervised fashion is to replicate the training procedure described in https://arxiv.org/abs/1803.11175, but initialize the starting weights using the existing USE module.

shabie commented 6 years ago

So essentially there is no way to train this model right now (from scratch) for other languages like German?

vbardiovskyg commented 6 years ago

Nope, that's not possible at the moment, due to at least these two reasons:

  1. the training procedure is not fully contained inside the module,
  2. the vocabulary is frozen to English.

lucasmengual92 commented 5 years ago

Hi @nico1as, is there a chance I can contact you? I'm also looking to retrain USE on automobile-oriented corpora, so it would be nice to know if you found a way to solve the matter.

nreimers commented 5 years ago

Hi @lucasmengual92, I came here as I was also trying to find a way to fine-tune USE on custom data. Sadly, the paper does not really contain enough information to understand the network model or how it was trained (on which tasks and with which objectives exactly). It would be great if more details about USE could be published or open-sourced.

I currently work on a sentence embedding method based on transformer networks. It outperforms USE on common benchmarks like various Semantic Textual Similarity tasks and on transfer learning tasks like sentiment classification, subjectivity prediction, etc. Further, adapting it to new tasks and domains is rather straightforward and has so far been very successful (only limited training data is needed).

The paper is currently under review and the open-source code will be released on GitHub soon. If you are interested, feel free to get in touch with me. The official release on GitHub will be in mid-June.

lucasmengual92 commented 5 years ago

Alright, thanks a lot for the info. That matches what I have perceived over the last few days: the limitations of USE on more ad-hoc industry data. It'll be great to keep in touch; you could email me at l.mengual(at)gmail(dot)com (I couldn't find a way to contact you directly). I am especially interested in the open-source work on transformer methods for my project.

nreimers commented 5 years ago

Hi @lucasmengual92, I tried to email you, but Gmail responds with 'account does not exist'.

Feel free to email me at: Rnils@web.de

deepankar27 commented 5 years ago

@nreimers: I am also looking for an unsupervised sentence encoder model for non-English text, like skip-thoughts. I am still in research mode, reading different papers and evaluating which fits my requirements, so it would be really great if you could share your ideas with me as well. Please feel free to email me at deepankar27@gmail.com.

nreimers commented 5 years ago

I released today the code to fine-tune BERT and XLNet for the generation of sentence embeddings: https://github.com/UKPLab/sentence-transformers

Training (fine-tuning) your own models is easy, and various data formats are supported (softmax classification, regression, various triplet losses, ranking losses).

Documentation is a bit sparse, but in case of questions, feel free to contact me.
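For anyone landing here, a minimal fine-tuning sketch with the library might look like this (the model name, example pairs and scores are placeholders; check the repo's README in case the API has changed):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-nli-mean-tokens")

# Pairs of sentences with a target similarity score (regression objective);
# the softmax, triplet and ranking losses follow the same pattern.
train_examples = [
    InputExample(texts=["A plane is taxiing.",
                        "An aircraft moves on the runway."], label=0.9),
    InputExample(texts=["A plane is taxiing.",
                        "A man is cooking."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)

embeddings = model.encode(["A plane is taxiing."])
```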

nandhini2312 commented 5 years ago

I also need to do the same for an insurance-domain corpus.

nandhini2312 commented 5 years ago

@lucasmengual92 If you find a way, please share it with me as well.

dabasmoti commented 4 years ago

@nreimers can you share the repo link please? Dabastany@gmail.com

nreimers commented 4 years ago

@dabasmoti here is the link https://github.com/UKPLab/sentence-transformers