huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

[Feature] Langchain compatibility #253

Open darth-veitcher opened 1 year ago

darth-veitcher commented 1 year ago

Similar to the work done in langchain-llm-api, I would like to see the ability to use this natively within LangChain. Are there any plans to do so, such that the models could be exposed as generate and embed endpoints?

dcbark01 commented 1 year ago

This is actually pretty easy to implement as-is with the HF inference server, since LangChain supports wrapping custom models (the example below is taken nearly verbatim from the LangChain docs). You can do something like this; just adjust the host name and port to your liking:

import os
from typing import Any, List, Mapping, Optional

from text_generation import Client
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM

# Point the client at a running text-generation-inference server
LLM_HOST = os.environ.get('LLM_HOST', '0.0.0.0')
LLM_PORT = os.environ.get('LLM_PORT', '6018')
client = Client(f"http://{LLM_HOST}:{LLM_PORT}")

class CustomLLM(LLM):
    name: str
    temperature: float = 0.8
    max_new_tokens: int = 100
    stream: bool = False

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
            self,
            prompt: str,
            stop: Optional[List[str]] = None,
            run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        if not self.stream:
            # Blocking call to the TGI server; returns the full generated text
            reply = client.generate(prompt, max_new_tokens=self.max_new_tokens).generated_text
            return reply
        else:
            raise NotImplementedError("Streaming is not implemented in this wrapper yet.")

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"name": self.name}

if __name__ == "__main__":

    query = 'Question: How old is Barack Obama? Answer:'
    llm = CustomLLM(name='local_llm')
    resp = llm(query)
    print(resp)

I haven't been able to get the streaming part to work yet, but I think LangChain is working on some updates on their end that should make it work soon.
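In the meantime, here is a rough sketch of what a streaming _call could look like, dropped into the CustomLLM class above. It assumes the text_generation client's generate_stream() method and LangChain's on_llm_new_token callback; I haven't verified it end to end, so treat it as a starting point:

    def _call(
            self,
            prompt: str,
            stop: Optional[List[str]] = None,
            run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        text = ""
        # Stream tokens from the TGI server as they are generated
        for response in client.generate_stream(prompt, max_new_tokens=self.max_new_tokens):
            if not response.token.special:
                text += response.token.text
                if run_manager is not None:
                    # Forward each new token to LangChain's streaming callbacks
                    run_manager.on_llm_new_token(response.token.text)
        return text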

rahuldshetty commented 1 year ago

The LangChain team has already built this integration: https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_textgen_inference.html
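For reference, a minimal usage sketch of that integration (parameter names taken from the linked docs; adjust the server URL to your deployment):

from langchain.llms import HuggingFaceTextGenInference

# Assumes a text-generation-inference server running locally
llm = HuggingFaceTextGenInference(
    inference_server_url="http://0.0.0.0:6018/",
    max_new_tokens=100,
    temperature=0.8,
)
print(llm("Question: How old is Barack Obama? Answer:"))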

dcbark01 commented 1 year ago

> The LangChain team has already built this integration: https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_textgen_inference.html

Yes, and streaming also works in that implementation, unlike in my snippet above.

Unfortunately, it doesn't support embedding; it's inference only. This fork of the repo aims to support embedding, but to my knowledge it isn't working yet.

ArnaudHureaux commented 1 year ago

@dcbark01 Is there an API/repo for embeddings similar to "huggingface/text-generation-inference"? I couldn't find one.

dcbark01 commented 1 year ago

@ArnaudHureaux, unfortunately the answer right now is (to my knowledge) 'no': there isn't a similar option available for embeddings. This is a major hole in the LLM ecosystem IMO, so it is something I am actively working on fixing. In fact, I already have a solution implemented; I'm just working with my current employer to open-source it. We're an academic research outfit, so I expect we'll get approval, but it may take a couple of weeks. I'll be sure to comment back on this issue if/when we get it approved.

npuichigo commented 9 months ago

@ArnaudHureaux https://github.com/huggingface/text-embeddings-inference/tree/main
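For anyone else looking: text-embeddings-inference exposes an HTTP /embed route. A quick sketch with requests, assuming a TEI server on localhost:8080 (payload shape taken from the TEI README; adjust host and port as needed):

import requests

# One embedding vector is returned per input string
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "How old is Barack Obama?"},
)
resp.raise_for_status()
embedding = resp.json()[0]
print(len(embedding))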

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.