patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
https://rubydoc.info/gems/langchainrb
MIT License

Support for skipping OpenAI token limit validation during vLLM-Langchain integrations #400

Open alexzfan opened 9 months ago

alexzfan commented 9 months ago

vLLM offers an OpenAI-compatible inference server endpoint that lets it be dropped into applications that currently target the OpenAI API. This has been integrated into the Python langchain package (https://python.langchain.com/docs/integrations/llms/vllm), but not the Ruby gem. Hacking around it by using the base OpenAI constructor, as below, leads to token length/limit validation errors:

# Point the stock OpenAI client at a local vLLM server
Langchain::LLM::OpenAI.new(
  api_key: "EMPTY",
  llm_options: {
    uri_base: "https://localhost:8000/v1/",
    extra_headers: {
      "Content-Type": "application/json"
    }
  },
  default_options: {
    temperature: 0.2,
    chat_completion_model_name: "tiiuae/falcon-7b"
  }
)

Are there any plans to integrate this fully, or has anyone found a different solution?
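
One possible stopgap might be a monkey-patch along these lines (hypothetical and untested; the validator class name, method signature, and return contract are assumptions about the gem's internals, so verify them against your installed version):

# Hypothetical workaround: reopen the validator and short-circuit the check.
# Assumes the token check lives in
# Langchain::Utils::TokenLength::OpenAIValidator.validate_max_tokens!
# (an assumption -- some call sites may use the return value as max_tokens).
module Langchain
  module Utils
    module TokenLength
      class OpenAIValidator
        def self.validate_max_tokens!(content, model_name, options = {})
          2_048 # return a fixed cap instead of raising; let vLLM enforce real limits
        end
      end
    end
  end
end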

andreibondarev commented 9 months ago

@alexzfan I'm okay with making the max_tokens validation optional.
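
Something like a constructor-level switch could do it (a sketch only; validate_max_tokens is a hypothetical option name, not part of the current API):

# Sketch of an opt-out flag; `validate_max_tokens` is invented for illustration.
llm = Langchain::LLM::OpenAI.new(
  api_key: "EMPTY",
  llm_options: { uri_base: "https://localhost:8000/v1/" },
  default_options: {
    chat_completion_model_name: "tiiuae/falcon-7b",
    validate_max_tokens: false # skip the token-limit check for OpenAI-compatible servers
  }
)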

I haven't heard of vLLM before. What kind of integration were you thinking of?

alexzfan commented 9 months ago

@andreibondarev I was thinking of a wrapper around the OpenAI class that's currently in the library. It seems like the Python package does this as well: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/vllm.py
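
A minimal sketch of what that could look like (the class name and constructor are illustrative, not a real part of the gem), mirroring how the Python VLLMOpenAI class wraps its OpenAI LLM:

# Hypothetical Langchain::LLM::VLLM wrapper -- a thin subclass that presets
# the OpenAI client to talk to a vLLM server. Illustrative only.
module Langchain
  module LLM
    class VLLM < OpenAI
      def initialize(url:, api_key: "EMPTY", default_options: {}, llm_options: {})
        super(
          api_key: api_key,
          llm_options: { uri_base: url }.merge(llm_options),
          default_options: default_options
        )
      end
    end
  end
end

# Usage:
# llm = Langchain::LLM::VLLM.new(
#   url: "http://localhost:8000/v1/",
#   default_options: { chat_completion_model_name: "tiiuae/falcon-7b" }
# )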