sam-h-bean opened 1 year ago
Are you using the client to connect to a running text-generation-inference server? You would probably create your own subclass of `guidance.llms.LLM`. If the text-generation-inference server is OpenAI-compatible (I don't think it is...) then you would be able to try the OpenAI client.
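To make the subclassing suggestion concrete, here is a minimal sketch of the HTTP side such a subclass would need, hedged heavily: it targets text-generation-inference's `/generate` REST endpoint (`inputs` plus a `parameters` object), uses only the standard library, and deliberately leaves out the `guidance.llms.LLM` plumbing, whose internals were still in flux at the time. `TGI_URL`, `build_tgi_payload`, and `tgi_generate` are hypothetical names for illustration.

```python
import json
import urllib.request

TGI_URL = "http://localhost:8080"  # assumed local text-generation-inference server


def build_tgi_payload(prompt: str, max_new_tokens: int = 64,
                      temperature: float = 1.0) -> dict:
    """Build the JSON body for TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def tgi_generate(prompt: str, **params) -> str:
    """POST the payload to /generate and return the generated text."""
    body = json.dumps(build_tgi_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        TGI_URL + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

A real `guidance.llms.LLM` subclass would wrap `tgi_generate` in whatever call/stream interface the base class expects, which is exactly the part that differs between guidance versions.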
I'll take a look at it, as I'm testing out the guidance library and huggingface/text-generation-inference looks compelling. No promises, not a Microsoft employee, et cetera.
@sheenobu let us know what you find out!
There are two ways to support this. The first is just to create an LLM backend like OpenAI's, which I think is what you are looking into.
Second, I was planning to work on the remote inference story for guidance here soon; it is still in flux a bit, but some key aspects will be:
I just share the above for context, it is not fully implemented yet :)
Does this answer my question (issue #48) as well? Any ideas on timeline?
Looks like much of OpenAI's `guidance.llms.LLM` implementation applies to text-generation-inference, since they both support standard REST calls. I'm surprised the OpenAI one isn't using aiohttp instead of requests, considering it's in an asyncio context anyway, but I'm open to being told I'm missing something.
This was a very messy version I got working: https://gist.github.com/sheenobu/9bdd03609e2b1125a3cfd7e5cbd046fc. If you are desperate you could probably extend `guidance.llms.OpenAI` and override the critical methods.
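One piece an async (aiohttp-based) client would need either way is parsing TGI's streaming output. As a hedged sketch: `/generate_stream` emits server-sent-events lines of the form `data:{...}` with a `token` object, and a small pure parser like the one below (a hypothetical helper, not part of either library) can be fed line-by-line from an aiohttp response's content iterator.

```python
import json


def parse_tgi_stream_line(line: str):
    """Parse one server-sent-events line from TGI's /generate_stream
    endpoint and return the token text, or None for non-data lines.

    Assumes the SSE shape 'data:{...json...}' containing a 'token'
    object with a 'text' field, which is what text-generation-inference
    emitted at the time of this thread.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separators, comments, keep-alives
    event = json.loads(line[len("data:"):])
    return event.get("token", {}).get("text")
```

With aiohttp this would sit inside an `async for line in resp.content:` loop; with requests it would consume `iter_lines()` instead, which is the sync/async trade-off raised above.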
Like I said elsewhere, I'll have to drop this work for now. Thanks.
I can take a swing at implementing the rest
That would be GREAT, I haven't had much luck. I do have an LLM-compatible server with access between encode and generate, and streaming access between generate and decode, if we need any server-side support to get full guidance capability...
+1 to this being a feature that would be useful! It's not critical for us yet, but could give it a try if @sam-h-bean doesn't finish up.
+1 to this feature request.
+1 to this
+1
+2
It seems the provided Gist link is no longer valid. Could you kindly re-upload the code or provide an updated link? Thank you!
+1
Actually, is it really possible to extend full-fledged guidance to Text Generation Inference? For example, what could we do about additional logits_processors such as `TokenHealingLogitsProcessor`?
@sam-h-bean have you managed to put a PR together for this?
This library from HF is pretty great and I get use out of it in production settings for LLMs. Would love to figure out how to integrate a system like this for LLM safety with it so I can use HF models, get dynamic batching, and be able to stream tokens with the guidance library!