huggingface / huggingface_hub

The official Python client for the Hugging Face Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

client.sentence_similarity() does not use correct route by default #2494

Open MoritzLaurer opened 2 weeks ago

MoritzLaurer commented 2 weeks ago

Describe the bug

I have a TEI embedding model endpoint created like this:

import huggingface_hub
from huggingface_hub import create_inference_endpoint

repository = "thenlper/gte-large"  #"BAAI/bge-reranker-large-base"
endpoint_name = "gte-large-001"
namespace = "MoritzLaurer"  # your user or organization name

# check if an endpoint with this name already exists from previous tests
available_endpoint_names = [endpoint.name for endpoint in huggingface_hub.list_inference_endpoints()]
endpoint_exists = endpoint_name in available_endpoint_names
print("Does the endpoint already exist?", endpoint_exists)

# create new endpoint
if not endpoint_exists:
    endpoint = create_inference_endpoint(
        endpoint_name,
        repository=repository,
        namespace=namespace,
        framework="pytorch",
        task="sentence-similarity",
        # see the available hardware options here: https://huggingface.co/docs/inference-endpoints/pricing#pricing
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=2,
        max_replica=4,
        type="protected",
        custom_image={
            "health_route":"/health",
            "env": {
                "MAX_BATCH_TOKENS":"16384",
                "MAX_CONCURRENT_REQUESTS":"512",
                "MAX_BATCH_REQUESTS": "124",
                "MODEL_ID": "/repository"},
            "url":"ghcr.io/huggingface/text-embeddings-inference:latest"
        }
    )
    print("Waiting for endpoint to be created")
    endpoint.wait()
    print("Endpoint ready")

# if endpoint with this name already exists, get existing endpoint
else:
    endpoint = huggingface_hub.get_inference_endpoint(name=endpoint_name, namespace=namespace)
    if endpoint.status in ["paused", "scaledToZero"]:
        print("Resuming endpoint")
        endpoint.resume()
    print("Waiting for endpoint to start")
    endpoint.wait()
    print("Endpoint ready")

Based on the docs here, I should be able to call it like this:

from huggingface_hub import InferenceClient
client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url
)

This results in a hard-to-interpret error message:

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://c5hhcabur7dqwyj7.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: nEd4Xz) Make sure 'sentence-similarity' task is supported by the model.

It does work when the TEI /similarity route is made explicit:

from huggingface_hub import InferenceClient
client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url + "/similarity"
)
# output: [0.9319057, 0.81048536, 0.75192505]

It seems the client does not set the route correctly by default.
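
The mismatch can also be reproduced with raw HTTP calls. A minimal sketch, assuming requests and the payload format the client sends for sentence-similarity ({"inputs": {"source_sentence": ..., "sentences": [...]}}):

import requests
from huggingface_hub import get_token

headers = {"Authorization": f"Bearer {get_token()}"}
payload = {
    "inputs": {
        "source_sentence": "Machine learning is so easy.",
        "sentences": [
            "Deep learning is so straightforward.",
            "This is so difficult, like rocket science.",
            "I can't believe how much I struggled with this.",
        ],
    }
}

# posting to the bare endpoint URL reproduces the 422 error
r = requests.post(endpoint.url, headers=headers, json=payload)
print(r.status_code)  # 422

# posting to the explicit TEI /similarity route returns the similarity scores
r = requests.post(endpoint.url + "/similarity", headers=headers, json=payload)
print(r.status_code, r.json())  # 200 [0.9319057, 0.81048536, 0.75192505]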

Reproduction

No response

Logs

No response

System info

{'huggingface_hub version': '0.24.6',
 'Platform': 'Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.31',
 'Python version': '3.9.5',
 'Running in iPython ?': 'Yes',
 'iPython shell': 'ZMQInteractiveShell',
 'Running in notebook ?': 'Yes',
 'Running in Google Colab ?': 'No',
 'Token path ?': '/home/user/.cache/huggingface/token',
 'Has saved token ?': True,
 'Who am I ?': 'MoritzLaurer',
 'Configured git credential helpers': 'store',
 'FastAI': 'N/A',
 'Tensorflow': 'N/A',
 'Torch': 'N/A',
 'Jinja2': '3.1.4',
 'Graphviz': 'N/A',
 'keras': 'N/A',
 'Pydot': 'N/A',
 'Pillow': 'N/A',
 'hf_transfer': 'N/A',
 'gradio': 'N/A',
 'tensorboard': 'N/A',
 'numpy': 'N/A',
 'pydantic': 'N/A',
 'aiohttp': 'N/A',
 'ENDPOINT': 'https://huggingface.co',
 'HF_HUB_CACHE': '/home/user/.cache/huggingface/hub',
 'HF_ASSETS_CACHE': '/home/user/.cache/huggingface/assets',
 'HF_TOKEN_PATH': '/home/user/.cache/huggingface/token',
 'HF_HUB_OFFLINE': False,
 'HF_HUB_DISABLE_TELEMETRY': False,
 'HF_HUB_DISABLE_PROGRESS_BARS': None,
 'HF_HUB_DISABLE_SYMLINKS_WARNING': False,
 'HF_HUB_DISABLE_EXPERIMENTAL_WARNING': False,
 'HF_HUB_DISABLE_IMPLICIT_TOKEN': False,
 'HF_HUB_ENABLE_HF_TRANSFER': False,
 'HF_HUB_ETAG_TIMEOUT': 10,
 'HF_HUB_DOWNLOAD_TIMEOUT': 10}
MoritzLaurer commented 2 weeks ago

(It could be useful to double-check all the TEI routes (Swagger here) and the related client methods to make sure they work correctly.)
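
(Something like this could serve as a starting point; the route list is taken from the TEI Swagger spec, the rest is a sketch:)

import requests
from huggingface_hub import get_token

# routes listed in the TEI OpenAPI (Swagger) spec
tei_routes = ["/health", "/info", "/embed", "/rerank", "/similarity", "/predict"]

headers = {"Authorization": f"Bearer {get_token()}"}
for route in tei_routes:
    # a GET is enough to see whether the route exists:
    # 200 = OK, 405/422 = route exists but expects a POST with a body, 404 = missing
    status = requests.get(endpoint.url + route, headers=headers).status_code
    print(f"{route}: {status}")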

Wauplin commented 2 weeks ago

Thanks for reporting this with a reproducible example @MoritzLaurer. I'm figuring out a solution to avoid this kind of problem, where we don't call the correct endpoint because of differences between the Inference API and Inference Endpoints (similar to https://github.com/huggingface/huggingface_hub/issues/2484). Will keep you posted.