opea-project / GenAIComps

GenAI components at micro-service level; GenAI service composer to create mega-service
Apache License 2.0

[RFC] OPEA Inference Microservices Integration for LangChain #831

Open avinashkarani opened 4 weeks ago

avinashkarani commented 4 weeks ago

OPEA Inference Microservices Integration for LangChain

This RFC proposes the integration of OPEA inference microservices (from GenAIComps) into LangChain [extensible to other frameworks], enabling conversational AI, embeddings, and LLM-based applications powered by OPEA services.

Author(s)

Avinash Karani, Raghavendra Bhat

Status

Under Review

Objective

The primary goal of this RFC is to provide a new framework extension called langchain-opea-endpoints that simplifies the integration of OPEA inference microservices. This extension will empower developers to use OPEA services, including chat models, semantic embeddings, rerankers, and LLM inference, within LangChain (and it will be extensible to other frameworks).

Motivation

Developers increasingly rely on modular frameworks like LangChain for LLM applications, embeddings, and search optimization. By integrating OPEA microservices, developers will gain access to unique inference capabilities. This proposal offers feature parity with other industry tools (e.g., Nvidia NIM, AWS Bedrock, and OpenAI) while introducing the benefits of OPEA inference.

Design Proposal

Overview of the langchain-opea-endpoints Python Package

This package will act as a bridge between OPEA's microservices and LangChain, providing key functionality: chat models, embeddings, reranking, and LLM interfaces. It will offer clear APIs and callbacks for seamless integration.
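
For orientation, the examples below imply roughly the following package surface. Class names are taken from this RFC; the file and module layout is illustrative, not final:

    # Hypothetical layout of the langchain-opea-endpoints package.
    # Class names come from this RFC; module names are assumptions.
    #
    # langchain_opea/
    #     chat_models.py   # ChatOPEA
    #     embeddings.py    # OPEAEmbeddings
    #     reranking.py     # OPEAReranker
    #     llms.py          # OPEALLM (module name assumed)
    #     callbacks.py     # UsageCallbackHandler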

  1. Chat Models (ChatOPEA)

    The ChatOPEA class will serve as the main interface for interacting with OPEA's conversational models. Like ChatOpenAI and Nvidia's ChatNVIDIA, it will connect LangChain applications to an OPEA-hosted LLM inference service, providing context-aware conversations across domains like support chatbots and virtual assistants.

    Example Usage:

    from langchain_opea.chat_models import ChatOPEA

    # base_url targets the OPEA inference service; model is left as a
    # placeholder and must name the model the service is hosting.
    chat = ChatOPEA(base_url="http://localhost:8000/v1", model="", api_key="<none>")

    response = chat.invoke("How does LangChain work?")
    print(response)
  2. Embeddings (OPEAEmbeddings)

The OPEAEmbeddings class enables semantic embeddings through OPEA microservices, extending Hugging Face's embedding classes to provide highly relevant vector embeddings [most relevant for TEI-based services].

Example Usage:

    from langchain_opea.embeddings import OPEAEmbeddings

    embedding_model = OPEAEmbeddings(model="opea-embed-v1")
    embedding = embedding_model.embed_query("Exploring AI capabilities.")
    print(embedding)
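
Because OPEAEmbeddings would implement LangChain's standard Embeddings interface, batch embedding should work the same way. A minimal sketch (embed_documents is the standard LangChain method; its availability on OPEAEmbeddings is an assumption of this RFC's design):

    # Batch embedding via the standard LangChain Embeddings interface.
    docs = ["AI in healthcare", "AI in robotics"]
    vectors = embedding_model.embed_documents(docs)
    print(len(vectors), len(vectors[0]))  # one vector per document
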
  3. Reranker (OPEAReranker)

The OPEAReranker class will provide a document reranking API that integrates OPEA's ranking microservice. It will allow developers to optimize document retrieval pipelines with contextual re-ranking; a pipeline sketch follows the example below.

Example Usage:

    from langchain_opea.reranking import OPEAReranker

    reranker = OPEAReranker(model="opea-rank-v1")
    ranked_results = reranker.rerank(query="AI trends", documents=["AI in healthcare", "AI in robotics"])
    print(ranked_results)
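
In a retrieval pipeline, the reranker would typically sit between a retriever and the LLM. A minimal sketch, reusing the reranker above; `retriever` stands for any existing LangChain retriever, and the assumption that rerank() returns documents ordered most-relevant first is ours (the return format is not specified in this RFC):

    # `retriever` is assumed: any existing LangChain retriever set up elsewhere.
    candidates = retriever.invoke("AI trends")
    texts = [doc.page_content for doc in candidates]

    # Assumption: rerank() returns the input texts ordered by relevance.
    reranked = reranker.rerank(query="AI trends", documents=texts)
    context = "\n\n".join(reranked[:3])  # keep the top 3 passages for the prompt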


  4. LLM Interface (OPEALLM)

This interface will provide completion-based APIs for using OPEA's LLMs, extending LangChain's core LLM class. It will facilitate seamless interaction with OPEA's completion services, similar to Nvidia's use of the vLLMOpenAI framework.

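Example Usage (a minimal sketch: the OPEALLM class name comes from this RFC, while the module path and constructor arguments are assumptions mirroring the ChatOPEA example above):

    from langchain_opea.llms import OPEALLM  # module path assumed

    # Completion-style call against an OPEA-hosted LLM service.
    llm = OPEALLM(base_url="http://localhost:8000/v1", model="", api_key="<none>")

    completion = llm.invoke("Explain retrieval-augmented generation in one paragraph.")
    print(completion)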

  5. Callbacks and Monitoring

In LangChain, callbacks and monitoring play an essential role in managing, observing, and optimizing the execution of operations like API calls, model usage, and token consumption. The proposed OPEA integration provides several key benefits:

  - Usage tracking to monitor token consumption and API costs.
  - Debugging tools to capture errors and trace outputs.
  - Performance insights to optimize latency and manage workflows.
  - Workflow triggers to handle multi-step processes and events efficiently.
  - Custom logging for analytics and audit purposes.

    from langchain_opea.callbacks import UsageCallbackHandler

    callback = UsageCallbackHandler()
    with callback.monitor():
        response = chat.invoke("What is edge computing?")
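
For custom logging and usage tracking, a handler can also extend LangChain's BaseCallbackHandler (a real LangChain base class). The sketch below is illustrative only and is not the proposed UsageCallbackHandler implementation:

    from langchain_core.callbacks import BaseCallbackHandler
    from langchain_core.outputs import LLMResult

    class SimpleUsageHandler(BaseCallbackHandler):
        """Illustrative only: tallies token usage reported by the backend."""

        def __init__(self) -> None:
            self.total_tokens = 0

        def on_llm_end(self, response: LLMResult, **kwargs) -> None:
            # OpenAI-compatible backends often report usage in llm_output.
            usage = (response.llm_output or {}).get("token_usage", {})
            self.total_tokens += usage.get("total_tokens", 0)

    # Handlers attach through LangChain's standard callbacks mechanism.
    handler = SimpleUsageHandler()
    response = chat.invoke("What is edge computing?", config={"callbacks": [handler]})
    print(handler.total_tokens)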




## Alternatives Considered

Alternative considered: using an OpenAI-compatible model serving framework's existing LangChain interface (e.g., ChatOpenAI) instead of a dedicated OPEA package, as sketched below.

Pros of the dedicated package over this alternative:
1.  OPEA will have full control and scope to add new features.
2.  Even though OPEA microservice APIs are compatible with OpenAI, model serving frameworks expose additional APIs for monitoring and auditing inference; these would not be available through the OpenAI LangChain classes.
3.  Removes guesswork for developers in finding the right LangChain class that works with OPEA microservices.

Cons:
1.  Developers need to update their application code.
2.  Complexity of maintaining a Python package.
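
For reference, the alternative is already possible today because OPEA endpoints are OpenAI-compatible. A minimal sketch using LangChain's existing ChatOpenAI class (the endpoint and model name are illustrative):

    from langchain_openai import ChatOpenAI

    # Point the generic OpenAI-compatible client at an OPEA-hosted endpoint.
    llm = ChatOpenAI(
        base_url="http://localhost:8000/v1",  # illustrative OPEA endpoint
        api_key="not-needed-locally",         # illustrative; many local servers ignore it
        model="served-model-name",            # must match the model the service hosts
    )
    print(llm.invoke("Hello from OPEA").content)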

## Compatibility

- Backward Compatibility: This package will extend, not replace, existing LangChain components to ensure no disruption to existing workflows.
- Workflow Compatibility: OPEA classes will work alongside existing OpenAI interfaces for a smooth developer experience, as illustrated below.
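
To illustrate, switching an existing OpenAI-based chain to OPEA should be a one-line change. A hypothetical sketch (the ChatOPEA constructor arguments mirror the example earlier in this RFC):

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_opea.chat_models import ChatOPEA  # proposed drop-in replacement

    prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")

    # Before: llm = ChatOpenAI(model="gpt-4o-mini")
    llm = ChatOPEA(base_url="http://localhost:8000/v1", model="", api_key="<none>")

    chain = prompt | llm  # standard LCEL composition is unchanged
    print(chain.invoke({"question": "What is OPEA?"}))
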
## Conclusion
This RFC proposes the creation of a langchain-opea-endpoints package that will integrate OPEA microservices with LangChain. This solution will extend the capabilities of conversational AI, embeddings, and LLM-based inference, offering a modular and robust developer experience.

## Miscellaneous

Work is in progress to support native inference engines under OPEA without requiring a model serving framework.

mkbhanda commented 3 weeks ago

The LLM Interface (OPEALLM) example seems to have a cut-and-paste error :-)

Could this code be resident in the LangChain repo to increase OPEA visibility? OPEA could, of course, advertise that it has LangChain integration.

If it lives in the LangChain repo, how do we test it to ascertain that OPEA has not broken the integration? Would they run nightly tests? Would OPEA trigger a test each time its images change for the embedder/re-ranker/LLM?

How should OPEA monitor whether there are changes or new things to support in LangChain?

Do we want to state somewhere that, for now, the integration spans x, y, z and not audio or video, etc.?

avinashkarani commented 2 weeks ago

This package can be upstreamed to LangChain as a partner package for joint maintenance. Before LangChain accepts it as a partner package, we can keep it in the OPEA repo, maintained against one stable version of LangChain.

There are a few different places you can contribute integrations for LangChain:

- Community: for lighter-weight integrations that are primarily maintained by LangChain and the open source community.
- Partner Packages: for independent packages that are co-maintained by LangChain and a partner.