huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Adding support for 0-shot classification pipeline with TEI on HF inference endpoints #221

Open MoritzLaurer opened 5 months ago

MoritzLaurer commented 5 months ago

Feature request

It would be amazing if zero-shot text classifiers designed to work with the HF zero-shot classification pipeline were supported by TEI and HF Inference Endpoints.

I tried a deployment like this:

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    name="roberta-base-zeroshot-v2-0-test4",  #"roberta-emotions-test2",  #"roberta-base-zeroshot-v2-0-test",
    repository="MoritzLaurer/roberta-base-zeroshot-v2.0",  #"SamLowe/roberta-base-go_emotions",  #"MoritzLaurer/roberta-base-zeroshot-v2.0",
    namespace="MoritzLaurer",
    framework="pytorch",
    task="zero-shot-classification",  #"zero-shot-classification",  #"text-classification",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    min_replica=0,
    max_replica=1,
    instance_type="g5.2xlarge",  # options: https://github.com/huggingface/hf-endpoints/issues/1090#issuecomment-1909482979
    instance_size="medium",
    custom_image={
        "health_route": "/health",
        # params: https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker
        "env": {
            #"MAX_BATCH_TOKENS": "16384",
            #"MAX_CONCURRENT_REQUESTS": "512",
            #"DTYPE": "float16",
            "MODEL_ID": "/repository",
        },
        "url": "ghcr.io/huggingface/text-embeddings-inference:86-1.2",  # options: https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker-images
    },
)
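Before sending requests, the endpoint has to be up. A minimal sketch using the InferenceEndpoint object returned by create_inference_endpoint (assuming huggingface_hub's wait() helper):

# Block until the endpoint is deployed and running, then grab its URL.
endpoint.wait()
print(endpoint.status)  # expected: "running"
print(endpoint.url)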

But inference against the TEI endpoint seems to ignore the zero-shot pipeline parameters:

import os
import requests

API_URL = endpoint.url  # also tried endpoint.url + "/predict"
headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {os.getenv('HF_TOKEN')}",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "I like you. I love you",
    # https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.ZeroShotClassificationPipeline.__call__
    "parameters": {
        "hypothesis_template": "This example is {}",
        "candidate_labels": ["joy", "anger", "sadness", "surprise", "fear", "love", "hate"],
        "multi_label": False,
    }
})

print(output)
#[{'score': 0.57349926, 'label': 'entailment'},
#{'score': 0.4265007, 'label': 'not_entailment'}]

The output is just the raw model output, as if the model were used as a plain text classifier. The expectation would be that the task="zero-shot-classification" flag changes how the input is processed internally, in line with the zero-shot pipeline, and returns a probability for each class in "candidate_labels".
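For reference, this is roughly what the Transformers zero-shot pipeline returns for the same inputs when run locally (a minimal sketch; the scores in the comment are illustrative, not actual model output):

from transformers import pipeline

# Local reference for the expected behaviour of task="zero-shot-classification".
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/roberta-base-zeroshot-v2.0",
)

output = classifier(
    "I like you. I love you",
    candidate_labels=["joy", "anger", "sadness", "surprise", "fear", "love", "hate"],
    hypothesis_template="This example is {}",
    multi_label=False,
)

print(output)
# Expected shape (illustrative values):
# {'sequence': 'I like you. I love you',
#  'labels': ['love', 'joy', ...],
#  'scores': [0.9, 0.05, ...]}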

In the endpoint playground, the model deployed with the TEI container produces the following error:

[Screenshot: error shown in the endpoint playground, 2024-03-28]

The same model with the same deployment code works if I do not use a custom_image with a TEI container. I imagine this is because the zero-shot pipeline is not supported by TEI? (Not sure if changes to Inference Endpoints would be required for this as well.)

Note: one API call with one text and 8 candidate labels requires 8 forward passes through the model (one per label), given how the zero-shot pipeline and NLI-based zero-shot models work. Not sure to what extent this complicates things for TEI and features like continuous batching. A rough sketch of the expansion is shown below.
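Roughly, the expansion looks like this (a simplified, hypothetical sketch; the real pipeline batches these pairs and converts the model's entailment logits into per-label scores):

from typing import List, Tuple

def expand_zero_shot_request(
    text: str,
    candidate_labels: List[str],
    hypothesis_template: str = "This example is {}",
) -> List[Tuple[str, str]]:
    # One (premise, hypothesis) pair per candidate label -> one forward pass each.
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]

pairs = expand_zero_shot_request(
    "I like you. I love you",
    ["joy", "anger", "sadness", "surprise", "fear", "love", "hate"],
)
print(len(pairs))  # 7 pairs -> 7 forward passes for a single request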

Motivation

Zero-shot classifiers are downloaded millions of times via the HF Hub and are part of the default models in the HF Inference Endpoints catalogue. See also this internal thread on upcoming new zero-shot classifiers.

@OlivierDehaene

Your contribution

Happy to contribute to this feature

delibae commented 4 months ago

I too am interested in seeing this feature realized.

vrdn-23 commented 1 month ago

+1 for this if this is still on your radar @OlivierDehaene