sam-h-bean opened 1 year ago
Are you using the client to connect to a running text-generation-inference server? You would probably create your own subclass of `guidance.llms.LLM`. If the text-generation-inference server is OpenAI-compatible (I don't think it is...) then you would be able to try the OpenAI client.
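To make the subclassing suggestion concrete, here is a minimal sketch of the HTTP side such a subclass would need, hedged heavily: it targets text-generation-inference's `/generate` REST endpoint (`inputs` plus a `parameters` object), uses only the standard library, and deliberately leaves out the `guidance.llms.LLM` plumbing, whose internals were still in flux at the time. `TGI_URL`, `build_tgi_payload`, and `tgi_generate` are hypothetical names for illustration.

```python
import json
import urllib.request

TGI_URL = "http://localhost:8080"  # assumed local text-generation-inference server


def build_tgi_payload(prompt: str, max_new_tokens: int = 64,
                      temperature: float = 1.0) -> dict:
    """Build the JSON body for TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def tgi_generate(prompt: str, **params) -> str:
    """POST the payload to /generate and return the generated text."""
    body = json.dumps(build_tgi_payload(prompt, **params)).encode("utf-8")
    req = urllib.request.Request(
        TGI_URL + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

A real `guidance.llms.LLM` subclass would wrap `tgi_generate` in whatever call/stream interface the base class expects, which is exactly the part that differs between guidance versions.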
I'll take a look at it, as I'm testing out the guidance library and huggingface/text-generation-inference looks compelling. No promises, not a Microsoft employee, et cetera.
@sheenobu let us know what you find out!
There are two ways to support this. The first is just to create an LLM backend like OpenAI's, which I think is what you are looking into.
Second, I was planning to work on the remote inference story for guidance here soon; it is still in flux a bit, but some key aspects will be:
I just share the above for context, it is not fully implemented yet :)
Does this answer my question (issue #48) as well? Any ideas on timeline?
Looks like much of OpenAI's `guidance.llms.LLM` implementation applies to text-generation-inference, since they both support standard REST calls. I'm surprised the OpenAI one isn't using aiohttp instead of requests, considering it's in an asyncio context anyway, but I'm open to being told I'm missing something.
This was a very messy version I got working: https://gist.github.com/sheenobu/9bdd03609e2b1125a3cfd7e5cbd046fc. If you are desperate you could probably extend `guidance.llms.OpenAI` and override the critical methods.
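One piece an async (aiohttp-based) client would need either way is parsing TGI's streaming output. As a hedged sketch: `/generate_stream` emits server-sent-events lines of the form `data:{...}` with a `token` object, and a small pure parser like the one below (a hypothetical helper, not part of either library) can be fed line-by-line from an aiohttp response's content iterator.

```python
import json


def parse_tgi_stream_line(line: str):
    """Parse one server-sent-events line from TGI's /generate_stream
    endpoint and return the token text, or None for non-data lines.

    Assumes the SSE shape 'data:{...json...}' containing a 'token'
    object with a 'text' field, which is what text-generation-inference
    emitted at the time of this thread.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separators, comments, keep-alives
    event = json.loads(line[len("data:"):])
    return event.get("token", {}).get("text")
```

With aiohttp this would sit inside an `async for line in resp.content:` loop; with requests it would consume `iter_lines()` instead, which is the sync/async trade-off raised above.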
Like I said elsewhere, I'll have to drop this work for now. Thanks.
I can take a swing at implementing the rest
That would be GREAT, I haven't had much luck. I do have an LLM-compatible server with access between encode and generate, and streaming access between generate and decode, if we need any server-side support to get full guidance capability...
+1 to this being a feature that would be useful! It's not critical for us yet, but could give it a try if @sam-h-bean doesn't finish up.
+1 to this feature request.
+1 to this
+1
+2
It seems the provided Gist link is no longer valid. Could you kindly re-upload the code or provide an updated link? Thank you!
+1
Actually, is it really possible to extend full-fledged guidance to Text Generation Inference? For example, what could we do about additional logits_processors such as `TokenHealingLogitsProcessor`?
@sam-h-bean have you managed to put a PR together for this?
This library from HF is pretty great and I get use out of it in production settings for LLMs. Would love to figure out how to integrate a system like this for LLM safety with it so I can use HF models, get dynamic batching, and be able to stream tokens with the guidance library!