Update Universal Sentence Encoder for TF2.0 SavedModel

tensorflow / hub

A library for transfer learning by reusing parts of TensorFlow models.

https://tensorflow.org/hub

Apache License 2.0


Update Universal Sentence Encoder for TF2.0 SavedModel #735

Closed maziyarpanahi closed 3 years ago

maziyarpanahi commented 3 years ago

Hi,

The following models for multi-lingual Universal Sentence Encoder are not available as TF2.0 SavedModel:

Would it be possible to ask the authors/devs to provide new TF2.0 SavedModel versions of these models, as was done for the other multilingual USE models: https://tfhub.dev/google/universal-sentence-encoder-multilingual/3

Many thanks

Matthieu-Tinycoaching commented 3 years ago

Hi,

I would be very interested in having https://tfhub.dev/google/universal-sentence-encoder-xling/en-fr/1 as a TF2.0 SavedModel, since it gives more accurate results than the multilingual model.

Thanks!

akhorlin commented 3 years ago

From a conversation with the publisher, the TF2 models

https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-base/1
https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-base/1

should supersede the old TF1 versions (https://tfhub.dev/google/universal-sentence-encoder-xling-many/1, https://tfhub.dev/google/universal-sentence-encoder-xling/en-de/1, https://tfhub.dev/google/universal-sentence-encoder-xling/en-es/1, https://tfhub.dev/google/universal-sentence-encoder-xling/en-fr/1).

maziyarpanahi commented 3 years ago

Thanks @akhorlin

I have had time to test these new models. The accuracy is similar and sometimes better; however, they are much slower than the original USE models. The new CMLM models are based on BERT, so they still have to handle tokenization, encoding, and per-token prediction, plus whatever step reduces the token vectors to one vector per text.
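For readers unfamiliar with the last step mentioned above (reducing per-token vectors to one vector per text), here is a minimal sketch of one common approach, masked mean pooling. This is an assumption for illustration only: the thread does not say which pooling the CMLM models actually use, and the function name `mean_pool` and the toy inputs are hypothetical.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of real (non-padding) tokens into one vector.

    Illustrative only: mean pooling is assumed here, not confirmed
    as the pooling used by any particular model.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    # Keep only positions whose mask entry is non-zero (real tokens).
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token positions; the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The pooling itself is cheap; the per-token work that precedes it in a BERT-style encoder is the more likely source of the slowdown described above.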

The original USE models are much faster, and their accuracy is still close to SOTA. I would appreciate it if we could get TF2 versions of the missing models.
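A speed comparison like the one described here could be made reproducible with a small timing helper. The sketch below is an assumption-laden illustration, not the measurement used in this thread: `mean_latency` and `dummy_encoder` are hypothetical names, and the dummy encoder is only a stand-in for a real model callable that maps a batch of texts to embeddings.

```python
import time
from typing import Callable, Sequence


def mean_latency(
    encode: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    repeats: int = 5,
    warmup: int = 1,
) -> float:
    """Return the mean wall-clock seconds per call of encode(texts)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up calls are excluded from timing (graph tracing, caches, etc.).
    for _ in range(warmup):
        encode(texts)
    start = time.perf_counter()
    for _ in range(repeats):
        encode(texts)
    return (time.perf_counter() - start) / repeats


def dummy_encoder(texts: Sequence[str]):
    # Hypothetical stand-in for a real sentence encoder.
    return [[float(len(text))] for text in texts]


sentences = ["Hello world.", "Bonjour le monde."]
seconds = mean_latency(dummy_encoder, sentences)
print(f"{seconds * 1e3:.3f} ms per batch")
```

Passing each model's encoding callable to the same helper, with the same batch of sentences, gives numbers that can be compared directly.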