huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Support gte-Qwen1.5-7B-instruct #261

Status: Open. reverland opened this issue 6 months ago.

reverland commented 6 months ago

Model description

Here is the model description

gte-Qwen1.5-7B-instruct is the latest addition to the gte embedding family. This model has been engineered starting from the Qwen1.5-7B LLM, drawing on the robust natural language processing capabilities of the Qwen1.5-7B model. Enhanced through our sophisticated embedding training techniques, the model incorporates several key advancements:

- Integration of bidirectional attention mechanisms, enriching its contextual understanding.
- Instruction tuning, applied solely on the query side for streamlined efficiency.
- Comprehensive training across a vast, multilingual text corpus spanning diverse domains and scenarios. This training leverages both weakly supervised and supervised data, ensuring the model's applicability across numerous languages and a wide array of downstream tasks.
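"Instruction tuning, applied solely on the query side" means a task instruction is prepended to queries, while documents are embedded without one. Below is a minimal sketch of that input preparation. The "Instruct: ...\nQuery: ..." template and the example task string are assumptions modeled on E5-mistral-style prompts; check the model card for the exact format. The helper names format_query and build_inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def format_query(task_description: str, query: str) -> str:
    # ASSUMPTION: E5-mistral-style template; verify against the model card.
    return f"Instruct: {task_description}\nQuery: {query}"


def build_inputs(
    task_description: str,
    queries: Sequence[str],
    documents: Sequence[str],
) -> Tuple[List[str], List[str]]:
    # Only queries receive the instruction prefix; documents are left untouched.
    query_inputs = [format_query(task_description, q) for q in queries]
    document_inputs = list(documents)
    return query_inputs, document_inputs


task = "Given a web search query, retrieve relevant passages that answer the query"
q_inputs, d_inputs = build_inputs(
    task,
    ["how much protein should a female eat"],
    ["As a general guideline, adults need a certain amount of protein per day."],
)
print(q_inputs[0])
print(d_inputs[0])
```

Keeping documents instruction-free means a document corpus can be embedded once and reused across different query-side tasks.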

I noticed that this model uses last-token pooling (pooling_mode_lasttoken in its pooling config), which text-embeddings-inference does not currently support.
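For illustration, here is a minimal pure-Python sketch of what last-token pooling computes, next to the mean pooling that sentence-transformers falls back to in the log further down ("Creating a new one with MEAN pooling"). This is not text-embeddings-inference's or sentence-transformers' code; the function names are hypothetical. Instead of averaging over all real tokens, last-token pooling takes the hidden state of the final non-padding token. Searching the attention mask for the last 1 makes it work for both left- and right-padded batches.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # [seq_len][hidden_dim]


def last_token_pool(hidden_states: Sequence[Matrix],
                    attention_mask: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hidden state of the last non-padding token of each sequence."""
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Last position whose mask is 1 (handles left and right padding).
        last = max((i for i, m in enumerate(mask) if m == 1), default=None)
        if last is None:
            raise ValueError("sequence has no non-padding tokens")
        pooled.append(list(states[last]))
    return pooled


def mean_pool(hidden_states: Sequence[Matrix],
              attention_mask: Sequence[Sequence[int]]) -> List[List[float]]:
    """Average of hidden states over non-padding tokens (for comparison)."""
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        real = [s for s, m in zip(states, mask) if m == 1]
        if not real:
            raise ValueError("sequence has no non-padding tokens")
        dim = len(real[0])
        pooled.append([sum(s[d] for s in real) / len(real) for d in range(dim)])
    return pooled


# Toy batch: hidden_dim = 2, seq_len = 3.
hidden = [
    [[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]],  # right-padded, 2 real tokens
    [[0.0, 0.0], [3.0, 1.0], [4.0, 2.0]],  # left-padded, 2 real tokens
]
mask = [[1, 1, 0], [0, 1, 1]]

print(last_token_pool(hidden, mask))  # [[2.0, 0.0], [4.0, 2.0]]
print(mean_pool(hidden, mask))        # [[1.5, 0.0], [3.5, 1.5]]
```

The two poolings produce different vectors, so loading a last-token model with mean pooling silently yields embeddings the model was not trained to produce.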

Open source status

Provide useful links for the implementation

https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct
https://sbert.net/docs/package_reference/models.html
https://github.com/huggingface/text-embeddings-inference/blob/cc1c510e8d8af8447c01e6b14c417473cf2dfda9/router/src/lib.rs#L364
https://github.com/huggingface/text-embeddings-inference/blob/cc1c510e8d8af8447c01e6b14c417473cf2dfda9/backends/core/src/lib.rs#L59

OKHand-Zy commented 5 months ago

Same here: I get an error when I try to use this model. It says the model cannot be found:

(llama) root:~/RAG/ST# python st-gte-Qwen1.5-7B-instruct.py
No sentence-transformers model found with name Alibaba-NLP/gte-Qwen1.5-7B-instruct. Creating a new one with MEAN pooling.
/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/transformers/utils/hub.py", line 399, in cached_file
    resolved_file = hf_hub_download(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
    raise head_call_error
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-664afbef-6531182244d705fb13a09317;3b1035b6-e893-42b9-8ff3-89666cc9c3a8)

Cannot access gated repo for url https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/resolve/main/config.json.
Access to model Alibaba-NLP/gte-Qwen1.5-7B-instruct is restricted. You must be authenticated to access it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/RAG/ST/st-gte-Qwen1.5-7B-instruct.py", line 15, in <module>
    model = SentenceTransformer("Alibaba-NLP/gte-Qwen1.5-7B-instruct", trust_remote_code=True)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 205, in __init__
    modules = self._load_auto_model(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1197, in _load_auto_model
    transformer_model = Transformer(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 35, in __init__
    config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 934, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    raise EnvironmentError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct.
401 Client Error. (Request ID: Root=1-664afbef-6531182244d705fb13a09317;3b1035b6-e893-42b9-8ff3-89666cc9c3a8)

Cannot access gated repo for url https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct/resolve/main/config.json.
Access to model Alibaba-NLP/gte-Qwen1.5-7B-instruct is restricted. You must be authenticated to access it.
(llama) root:~/RAG/ST#