huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Feature request: Add documentation and examples for adding additional API endpoints. #2321

Open michael-conrad opened 3 months ago

michael-conrad commented 3 months ago

Feature request

I would like to be able to use guidance or other libraries that support constrained output with HF endpoints.

Reference: A guidance language for controlling large language models.

Motivation

I want to use a library such as guidance for constrained generation with an HF inference endpoint, so that we can use larger models that exceed our local computational capabilities.

Your contribution

I don't know where to start with adding an API endpoint to the existing TGI config.

ErikKaum commented 3 months ago

Hi @michael-conrad 🙌

We have structured generation support in TGI through outlines.

Would this solve your problem?

docs: https://huggingface.co/docs/text-generation-inference/basic_tutorials/using_guidance
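Per those docs, TGI's structured generation is exposed on the existing `/generate` endpoint via a `grammar` parameter (rather than a new endpoint). A minimal sketch of building such a request body, assuming a local TGI deployment at a placeholder URL:

```python
import json

# Hypothetical endpoint URL; replace with your own TGI deployment.
TGI_URL = "http://localhost:3000/generate"

# JSON Schema the output must conform to, passed under
# parameters.grammar with type "json" per TGI's guidance docs.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    "inputs": "Extract the person mentioned: Alice is 30 years old.",
    "parameters": {
        "grammar": {"type": "json", "value": schema},
        "max_new_tokens": 64,
    },
}

body = json.dumps(payload)
# To send it, POST `body` to TGI_URL with a
# Content-Type: application/json header, e.g. via requests.post.
print(body)
```

The schema and prompt above are illustrative only; any valid JSON Schema can be supplied as the grammar value.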

michael-conrad commented 3 months ago

See also: https://github.com/guidance-ai/guidance/issues/952

ErikKaum commented 3 months ago

Okay thanks for pointing that out 👍

One thing that's probably good to clarify is that TGI and the Inference Endpoints are not dependent on each other. They are two separate things.

So in this case I think adding a custom Inference Handler (also linked in the guidance issue) is the way to go. TGI per se doesn't have a config to add new endpoints.

Does this make sense?