If vLLM is OpenAI-compatible, you can use the OpenAI provider with custom settings: https://sdk.vercel.ai/providers/ai-sdk-providers/openai#provider-instance
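For example, a minimal setup might look like this; the base URL, API key, and model name below are placeholders for your own deployment:

```ts
import { createOpenAI } from '@ai-sdk/openai';
import { generateText } from 'ai';

// Placeholder URL and model; point these at your own vLLM deployment.
const vllm = createOpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'unused', // vLLM ignores the key unless the server sets --api-key
});

const { text } = await generateText({
  model: vllm('mistralai/Mistral-7B-Instruct-v0.2'),
  prompt: 'Say hello.',
});
console.log(text);
```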
The vLLM Chat Completions API is now more strictly OpenAI-compatible, including tool calls and tool streaming, per vllm-project/vllm#8272
I detailed how to use createOpenAI to get everything set up in that issue, so it should work out of the box for you if you're using Hermes or Mistral models @PeytonCleveland
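As a rough sketch of tool calling through that provider instance (assuming a vLLM server started with tool-call parsing enabled, e.g. `--enable-auto-tool-choice --tool-call-parser hermes`, and a Hermes model; the URL, model name, and tool below are placeholders):

```ts
import { createOpenAI } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import { z } from 'zod';

// Placeholder deployment details; adjust to your server.
const vllm = createOpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'unused',
});

const { text, toolResults } = await generateText({
  model: vllm('NousResearch/Hermes-2-Pro-Llama-3-8B'),
  tools: {
    // Hypothetical tool for illustration only.
    weather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => `It is sunny in ${city}.`,
    }),
  },
  prompt: 'What is the weather in Berlin?',
});
console.log(toolResults);
```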
Feature Description
Overview
The AI SDK supports numerous commercial providers, such as OpenAI and Anthropic. However, in many cases a self-hosted inference server is more appropriate or even required due to security or data privacy concerns. vLLM is a popular choice for such deployments, and SDK support would greatly simplify building generative AI applications that combine RSCs with self-hosted inference servers.
vLLM OpenAI-compatible API server
vLLM exposes an OpenAI-compatible API, so an implementation here should look very similar to the existing OpenAI provider: vLLM Docs
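For illustration, a plain fetch against a vLLM server already speaks the same Chat Completions shape the OpenAI provider expects (the endpoint and model name below are placeholders):

```ts
// Direct call to a vLLM server's OpenAI-compatible endpoint.
const res = await fetch('http://localhost:8000/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'mistralai/Mistral-7B-Instruct-v0.2',
    messages: [{ role: 'user', content: 'Say hello.' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```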
Use Case
Self-Hosted Models
SDK users who want to integrate a self-hosted inference server into their applications would benefit greatly from vLLM support. Many domains, such as government, military, healthcare, and finance, stand to benefit from generative AI but have strict data-protection requirements that necessitate self-hosted solutions.
RSCs + GenAI
RSCs and the AI SDK can greatly reduce the effort needed to build generative AI applications. However, without support for self-hosted models, the number of teams able to adopt this SDK is limited. Adding this support would make the AI SDK an ideal solution for teams using Next, RSCs, and self-hosted infrastructure.
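For illustration, a server action streaming UI from a self-hosted model might look roughly like this (a sketch assuming streamUI from ai/rsc; the endpoint and model name are hypothetical placeholders):

```tsx
'use server';

import { createOpenAI } from '@ai-sdk/openai';
import { streamUI } from 'ai/rsc';

// Hypothetical internal vLLM endpoint and model; swap in your own.
const vllm = createOpenAI({
  baseURL: 'http://vllm.internal:8000/v1',
  apiKey: 'unused',
});

export async function askModel(question: string) {
  const result = await streamUI({
    model: vllm('NousResearch/Hermes-2-Pro-Llama-3-8B'),
    prompt: question,
    // Render streamed text as it arrives.
    text: ({ content }) => <p>{content}</p>,
  });
  return result.value;
}
```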
Additional context
Background
I've implemented generative AI features in a number of public-sector applications. Currently, getting all the pieces working well together is a major headache, and the architecture can get quite convoluted. All my projects use Next, and while there are a number of great projects out there like LangChain and LlamaIndex, the JS versions always lag behind the Python versions and are missing support for things like vLLM. This project seems like an ideal solution for those using Next and exactly what I've been wishing for; it just needs vLLM support for me to be able to use it 😄