huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Guidance not working with the serverless inference API #1717

Closed — camilleborrett closed this issue 6 months ago

camilleborrett commented 6 months ago

System Info

Model: mistralai/Mixtral-8x7B-Instruct-v0.1
Accessed through: https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1

Reproduction

import requests
from pydantic import BaseModel, conint
from typing import List
import huggingface_hub

# Pydantic model describing the JSON structure the model must emit
class Animals(BaseModel):
    location: str
    activity: str
    animals_seen: conint(ge=1, le=5)  # Constrained integer type
    animals: List[str]

prompt = "convert to JSON: I saw a puppy a cat and a raccoon during my bike ride in the park"

data = {
    "inputs": prompt,
    "parameters": {
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": Animals.schema()
        }
    }
}

headers = {
    "Authorization": f"Bearer {huggingface_hub.get_token()}",
    "Content-Type": "application/json",
}

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1"
response = requests.post(
    API_URL,
    headers=headers,
    json=data
)
print(response.json())

Expected behavior

I copied the "Constrain with Pydantic" example from https://huggingface.co/docs/text-generation-inference/conceptual/guidance#constrain-with-pydantic to test guidance with the mistralai/Mixtral-8x7B-Instruct-v0.1 model.

I would like to test my use case with the HF serverless Inference API before launching my own endpoint. However, I get the following error: {'error': 'Request failed during generation: Server error: ', 'error_type': 'generation'}. Is guidance not supported on the Inference API? (The same request works without guidance.)
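For readers hitting the same opaque error: the serverless Inference API returns either a list of generations or a TGI-style error payload like the one above. A small sketch for telling them apart (`interpret_response` is a hypothetical helper, not part of TGI or huggingface_hub):

```python
# Hypothetical helper: classify an Inference API response body as either
# generated text or a TGI-style error payload.
def interpret_response(body):
    if isinstance(body, dict) and "error" in body:
        # TGI errors carry an 'error' message and usually an 'error_type' field.
        return ("error", body.get("error_type", "unknown"), body["error"])
    if isinstance(body, list) and body and "generated_text" in body[0]:
        # Successful text-generation responses are a list of generations.
        return ("ok", None, body[0]["generated_text"])
    return ("unknown", None, body)

# The failing response reported in this issue:
print(interpret_response(
    {"error": "Request failed during generation: Server error: ",
     "error_type": "generation"}
))
```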

My understanding is that the HF API is running TGI version 1.4.3, which should support guidance.

The only modifications I made to the example code were adding "Authorization": f"Bearer {huggingface_hub.get_token()}" to the headers and pointing the request at https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1.

camilleborrett commented 6 months ago

It suddenly works now, after failing for the entire day. Closing this issue.

OlivierDehaene commented 6 months ago

I updated the backend to 1.4.5. Version 1.4.3 had a bug with batching grammar requests.

However, there is still something odd at play here: the model seems to only partially follow the grammar. I pinged someone internally to look into it.