griptape-ai / griptape

Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory.
https://www.griptape.ai
Apache License 2.0

vLLM endpoint support #704

Closed · s-m-palmier closed this issue 6 months ago

s-m-palmier commented 6 months ago

Hello, I'm trying to connect a locally hosted LLM to a prompt engine. We are deprecating our use of HuggingFaceTextGenInference (TGI) and are switching to vLLM for future use. I tried using the OpenAIPromptDriver, but it looks like it attempts to connect to OpenAI's servers regardless of the base_url provided.

I'd like a prompt driver that takes an endpoint (vLLM) and allows parameters to be passed for tuning and authorization. Does one already exist? I'm new to this, so I might be missing something.

Thanks

collindutter commented 6 months ago

Hi @s-m-palmier, can you please share a minimal reproducible example of the OpenAiChatPromptDriver not using base_url correctly? We have used it with success on services like TogetherAi.

s-m-palmier commented 6 months ago

@collindutter I'm working on an endpoint that I can't send out, but basically it's not an OpenAI endpoint. The LLM is hosted internally and was set up to run using TGI, but now it's moved to vLLM and I'm wondering if any of your prompt drivers can handle something more generic: an inference endpoint, model_name, tuning parameters, and an api_token. The OpenAiPromptDriver seems close, but it fails to look outside of OpenAI's servers.

import os
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
from griptape.rules import Rule
from griptape.config import StructureConfig, StructureGlobalDriversConfig
from dotenv import load_dotenv

load_dotenv()

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(
                # Placeholders for the internal vLLM endpoint and its API token
                base_url="https://[my_model_endpoint]",
                api_key=os.getenv('[API_TOKEN]'),
                temperature=0.1,
                max_tokens=2048,
                model="Mixtral-8x7B-Instruct-v0.1",
                seed=42,
            )
        )
    ),
    input_template="You will be provided with a sentence, please provide a one word classification of the sentiment. Sentence: {{ args[0] }}",
    rules=[
        Rule(
            value='Write your output in all caps'
        )
    ],
)

agent.run("I really hate it here.")

Running this renders the following error:

WARNING:root:model not found. Using cl100k_base encoding. failed (SSLError HTTPSConnectionPool(host='openaipublic.blob.core.windows.net')

I don't want it to look at that host; I'm trying to reach my host at my URL. How can I do that?

collindutter commented 6 months ago

Ah, that is coming from the OpenAiTokenizer, which is trying to look up the Mixtral-8x7B-Instruct-v0.1 model. You should be able to resolve this by giving OpenAiChatPromptDriver a tokenizer that is built for Mixtral (from HuggingFace, for instance). You can also use SimpleTokenizer as a quick workaround.
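
A minimal sketch of the SimpleTokenizer workaround, assuming a griptape version where OpenAiChatPromptDriver accepts a tokenizer argument and where SimpleTokenizer takes characters_per_token plus token limits (exact constructor arguments vary between releases):

import os
from griptape.drivers import OpenAiChatPromptDriver
from griptape.tokenizers import SimpleTokenizer

prompt_driver = OpenAiChatPromptDriver(
    base_url="https://[my_model_endpoint]",
    api_key=os.getenv('[API_TOKEN]'),
    model="Mixtral-8x7B-Instruct-v0.1",
    temperature=0.1,
    max_tokens=2048,
    # Override the default OpenAiTokenizer so the driver does not try to
    # resolve the model name against OpenAI's hosted tiktoken encodings.
    tokenizer=SimpleTokenizer(
        characters_per_token=4,
        max_input_tokens=32768,
        max_output_tokens=2048,
    ),
)

The same prompt_driver can then be dropped into the StructureConfig from the reproduction above in place of the bare OpenAiChatPromptDriver.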

collindutter commented 6 months ago

Also going to point to this comment, which may assist. I am going to close this issue for now, but please feel free to re-open if you still face issues!