Promote Huggingface Hub to first class citizen

NirantK commented 9 months ago

The latest plans are in the most recent comment at the end

Error Handling improvements will come from two main improvements:

Migrating away from GCP to Huggingface Hub completely
1. This will reduce the edge cases we need to maintain, including file renaming and similar code too
For models which we push to HF Hub, we can add a “name” and “sources” field —
1. where the name is what HF Hub base model and sources is a list of community or Qdrant models

This issue is about the first one.

How to push models?

This is a good reference contribution: https://huggingface.co/weakit-v/bge-base-en-v1.5-onnx/tree/main

This is what we should aim to replicate as much as we can. We'll have these models under the Qdrant Huggingface Hub account instead. So they'd be something like: qdrant/bge-base-en-v1.5-onnx

{
   name: "BAAI/bge-base-en-v1.5",
   sources: ["qdrant/bge-base-en-v1.5-onnx", "weakit-v/bge-base-en-v1.5-onnx"]
}

We'll have to do this for each model, one at a time:

[x] BAAI/bge-small-en-v1.5
[x] BAAI/bge-base-en-v1.5
[x] sentence-transformers/all-MiniLM-L6-v2 — do not quantize the model and push as is
[x] intfloat/multilingual-e5-large
[ ] jinaai/jina-embeddings-v2-small-en — we should be able to retain the existing embedding implementation
[ ] jinaai/jina-embeddings-v2-base-en — we should be able to retain the existing embedding implementation

In this process, we deprecate the following models by not porting them from GCP to HF Hub on our account:

BAAI/bge-small-en
BAAI/bge-small-zh-v1.5
BAAI/bge-base-en

generall commented 8 months ago

As a first step, I would implement support for both - HF and arbitrary links like it is right now. Then if we will see benefit of complete migration, we can continue

NirantK commented 8 months ago

Motivation

Why should we consider upgrading Huggingface Hub as a download option from JinaEmbedding specific to all Embedding models?

That makes it easier to support new models — including community additions
Improves error handling e.g. we can add multiple sources for the same model, consistent naming convention
Download stats for models built by Qdrant via Huggingface Hub

Where will we change things?

We'll change things in the Embedding class. We'll add a download_from_hf or similar function. We will continue to support existing models via GCS (arbitrary URLs via requests) in addition to Huggingface Hub.

Download Function will:

First, check if there is a Huggingface model or source
If yes, download from there and corresponding loaders will be used
If not, we check from Google Cloud Storage/URL

At the end of this issue, users will be able to:

[ ] Pass their own download URLs
[ ] Pass a name and ONNX port source in a single PR to add a new model

How to push models?

This is a good reference contribution: https://huggingface.co/weakit-v/bge-base-en-v1.5-onnx/tree/main

This is what we should aim to replicate as much as we can. We'll have these models under the Qdrant Huggingface Hub account instead. So they'd be something like: qdrant/bge-base-en-v1.5-onnx

{
   name: "BAAI/bge-base-en-v1.5",
   sources: ["qdrant/bge-base-en-v1.5-onnx", "weakit-v/bge-base-en-v1.5-onnx"]
}

We'll have to do this for each model, one at a time:

[ ] BAAI/bge-small-en-v1.5
[ ] BAAI/bge-base-en-v1.5
[ ] sentence-transformers/all-MiniLM-L6-v2 — do not quantize the model and push as is
[ ] intfloat/multilingual-e5-large
[ ] jinaai/jina-embeddings-v2-small-en — we should be able to retain the existing embedding implementation
[ ] jinaai/jina-embeddings-v2-base-en — we should be able to retain the existing embedding implementation

NirantK commented 8 months ago

cc @Anush008 Added the 4 models here: https://huggingface.co/Qdrant. This should unblock you completely

When a -Q and without Q suffix both are available, prefer the one without the Q suffix.

The other pattern is that of Jina: https://huggingface.co/jinaai/jina-embeddings-v2-small-en/tree/main

I'd recommend that we find a way to handle this without downloading PyTorch files at all. If we can't find that, open a new issue and I'll coordinate with the Jina folks.

x4080 commented 7 months ago

is "Qdrant/bge-m3-onnx" already supported ?

qdrant / fastembed