jina-ai / jina-hub

An open-registry for hosting Jina executors via container images
Apache License 2.0

Add LaBSE support #123

Closed bhavsarpratik closed 4 years ago

bhavsarpratik commented 4 years ago

Describe the feature

Until now, the best way to build multilingual semantic search was with multilingual-USE, but it supports only 16 languages.

Two days ago Google released LaBSE, which would let Jina support multilingual semantic search across 109 languages.

https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html

Your proposal

Add a new labse module to /encoders/nlp

Notes

LaBSE is 1.63 GB on TF-Hub. The model is not well suited to English-only search.

From the paper: "We observe that LaBSE performs worse on pairwise English semantic similarity than other sentence embedding models. This result contrasts with its excellent performance on cross-lingual bi-text retrieval."


JoanFM commented 4 years ago

Hey @bhavsarpratik,

It seems like a really interesting feature to have!

But since it is a BERT-based model, would it be available through our BaseTransformerEncoder, either by passing the model name once Hugging Face Transformers supports it, or by passing the model file?

bhavsarpratik commented 4 years ago

Hi Joan, I checked the TF-Hub model instructions and the usage is the same as for BERT! We can close the issue.
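
For reference, here is a minimal sketch of what "just passing the model name" could look like outside Jina, using the Hugging Face transformers library directly. It is not Jina's BaseTransformerEncoder. The function names `embed`, `cosine` and `rank_by_cosine` are hypothetical, the use of the pooled output as the sentence embedding is an assumption, and `model_name` has to be a LaBSE checkpoint identifier or local path that you supply (this thread does not confirm one).

```python
import math
from typing import List, Sequence


def embed(texts: List[str], model_name: str) -> List[List[float]]:
    """Encode texts with a BERT-style checkpoint given by name or local path."""
    # Imported lazily so the ranking helpers below run without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Assumption: the pooled [CLS] output is used as the sentence embedding,
    # L2-normalized so that dot products equal cosine similarities.
    embeddings = torch.nn.functional.normalize(outputs.pooler_output, p=2, dim=1)
    return embeddings.tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query: Sequence[float], docs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query, doc) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With a real checkpoint, a query in one language and documents in another would be passed through `embed` and the resulting vectors handed to `rank_by_cosine`. The ranking helpers use only the standard library, so they can be tried without downloading the model.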

nan-wang commented 4 years ago

@bhavsarpratik Thanks for your comment, Pratik! I'll close this issue for the time being.