langchain-ai / langchain


langchain-huggingface: Using ChatHuggingFace requires hf token for local TGI using localhost HuggingFaceEndpoint #24571

Open avargasestay opened 2 months ago

avargasestay commented 2 months ago


Example Code

from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

# This part works as expected
llm = HuggingFaceEndpoint(endpoint_url="http://127.0.0.1:8080")

# This part raises huggingface_hub.errors.LocalTokenNotFoundError
chat_llm = ChatHuggingFace(llm=llm)
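For context, the reproduction assumes a TGI server is already serving on http://127.0.0.1:8080. A quick sanity check of that assumption (not part of the original report) is to hit TGI's generate route directly:

import requests

# Should return {"generated_text": ...} if the local TGI server is up.
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "Hello", "parameters": {"max_new_tokens": 8}},
    timeout=30,
)
print(response.json())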

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File ".venv/lib/python3.10/site-packages/langchain_huggingface/chat_models/huggingface.py", line 320, in __init__
    self._resolve_model_id()
  File ".venv/lib/python3.10/site-packages/langchain_huggingface/chat_models/huggingface.py", line 458, in _resolve_model_id
    available_endpoints = list_inference_endpoints("*")
  File ".venv/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 7081, in list_inference_endpoints
    user = self.whoami(token=token)
  File ".venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File ".venv/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1390, in whoami
    headers=self._build_hf_headers(
  File ".venv/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 8448, in _build_hf_headers
    return build_hf_headers(
  File ".venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File ".venv/lib/python3.10/site-packages/huggingface_hub/utils/_headers.py", line 124, in build_hf_headers
    token_to_send = get_token_to_send(token)
  File ".venv/lib/python3.10/site-packages/huggingface_hub/utils/_headers.py", line 158, in get_token_to_send
    raise LocalTokenNotFoundError(
huggingface_hub.errors.LocalTokenNotFoundError: Token is required (token=True), but no token found. You need to provide a token or be logged in to Hugging Face with huggingface-cli login or huggingface_hub.login. See https://huggingface.co/settings/tokens.
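Until this is fixed, one temporary workaround (a sketch only; it assumes you have some valid Hugging Face token even though the TGI server is local) is to make a token visible to huggingface_hub, for example via huggingface-cli login, huggingface_hub.login(), or the HF_TOKEN environment variable, and to pass model_id explicitly so the endpoint lookup does not need to match the localhost URL:

import os

from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# Any valid token gets past LocalTokenNotFoundError; it is only used by the
# list_inference_endpoints() call in ChatHuggingFace.__init__, not for
# requests to the local TGI server.
os.environ["HF_TOKEN"] = "hf_..."  # placeholder, use your own token

llm = HuggingFaceEndpoint(endpoint_url="http://127.0.0.1:8080")

# Passing model_id explicitly keeps the final "Failed to resolve model_id"
# check from firing when no hosted endpoint matches the localhost URL.
chat_llm = ChatHuggingFace(
    llm=llm,
    model_id="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder for the locally served model
)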

Description

class ChatHuggingFace(BaseChatModel):
    """Hugging Face LLM's as ChatModels.
    ...
    """  # noqa: E501
    ...
    def __init__(self, **kwargs: Any):
        super().__init__(**kwargs)

        from transformers import AutoTokenizer  # type: ignore[import]

        self._resolve_model_id()  # ---> Even when model_id is provided, execution still enters here

        self.tokenizer = (
            AutoTokenizer.from_pretrained(self.model_id)
            if self.tokenizer is None
            else self.tokenizer
        )
    ...
    def _resolve_model_id(self) -> None:
        """Resolve the model_id from the LLM's inference_server_url"""

        from huggingface_hub import list_inference_endpoints  # type: ignore[import]

        if _is_huggingface_hub(self.llm) or (
            hasattr(self.llm, "repo_id") and self.llm.repo_id
        ):
            self.model_id = self.llm.repo_id
            return
        elif _is_huggingface_textgen_inference(self.llm):
            endpoint_url: Optional[str] = self.llm.inference_server_url
        elif _is_huggingface_pipeline(self.llm):
            self.model_id = self.llm.model_id
            return
        else: # This is the case we are in when _is_huggingface_endpoint() is True
            endpoint_url = self.llm.endpoint_url
        available_endpoints = list_inference_endpoints("*")  # ---> This line raises the error if we don't provide the hf token
        for endpoint in available_endpoints:
            if endpoint.url == endpoint_url:
                self.model_id = endpoint.repository

        if not self.model_id:
            raise ValueError(
                "Failed to resolve model_id:"
                f"Could not find model id for inference server: {endpoint_url}"
                "Make sure that your Hugging Face token has access to the endpoint."
            )

I was able to work around the issue by modifying the constructor so that, when a model_id is provided, it skips the resolution step:

class ChatHuggingFace(BaseChatModel):
    """Hugging Face LLM's as ChatModels.
    ...
    """  # noqa: E501

    ...
    def __init__(self, **kwargs: Any):
        super().__init__(**kwargs)

        from transformers import AutoTokenizer  # type: ignore[import]

        self.model_id or self._resolve_model_id()  # ---> Not a great solution: if model_id is invalid, the tokenizer instantiation below will fail (only when no tokenizer is provided), and it also skips the checks for the other hf_hub inference cases

        self.tokenizer = (
            AutoTokenizer.from_pretrained(self.model_id)
            if self.tokenizer is None
            else self.tokenizer
        )
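With that patch, a call like the one below would be expected to skip the Hub lookup entirely, so no Hugging Face token or login is needed. This is hypothetical usage assuming the patched constructor above; the model_id is a placeholder for whatever model the local TGI server is actually serving, and its tokenizer must either be downloadable or passed in explicitly:

chat_llm = ChatHuggingFace(
    llm=HuggingFaceEndpoint(endpoint_url="http://127.0.0.1:8080"),
    model_id="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder for the locally served model
)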

I imagine there is a better way to solve this, for example by adding logic that checks whether the endpoint_url is a valid IP to request, whether it is served with TGI, or simply whether it is localhost:

class ChatHuggingFace(BaseChatModel):
    """Hugging Face LLM's as ChatModels.
    ...
    """  # noqa: E501

    ...
    def _resolve_model_id(self) -> None:
        """Resolve the model_id from the LLM's inference_server_url"""

        from huggingface_hub import list_inference_endpoints  # type: ignore[import]

        if _is_huggingface_hub(self.llm) or (
            hasattr(self.llm, "repo_id") and self.llm.repo_id
        ):
            self.model_id = self.llm.repo_id
            return
        elif _is_huggingface_textgen_inference(self.llm):
            endpoint_url: Optional[str] = self.llm.inference_server_url
        elif _is_huggingface_pipeline(self.llm):
            self.model_id = self.llm.model_id
            return
        elif _is_huggingface_endpoint(self.llm):  # ---> New case added to check url
            ...  # Take the following code with a grain of salt
            if is_tgi_hosted(self.llm.endpoint_url):
                if not self.model_id and not self.tokenizer:
                    raise ValueError("You must provide valid model id or a valid tokenizer")
                return
            ...
            endpoint_url = self.llm.endpoint_url
        else:  # ---> New last case in which no valid huggingface interface was provided
            raise TypeError("llm must be `HuggingFaceTextGenInference`, `HuggingFaceEndpoint`, `HuggingFaceHub`, or `HuggingFacePipeline`.") 
        available_endpoints = list_inference_endpoints("*")
        for endpoint in available_endpoints:
            if endpoint.url == endpoint_url:
                self.model_id = endpoint.repository

        if not self.model_id:
            raise ValueError(
                "Failed to resolve model_id:"
                f"Could not find model id for inference server: {endpoint_url}"
                "Make sure that your Hugging Face token has access to the endpoint."
            )
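The is_tgi_hosted helper above is a made-up name, not an existing function. One possible sketch of it, assuming the local server really is TGI (which exposes a GET /info route describing the served model), could look like this:

import requests


def is_tgi_hosted(endpoint_url: str, timeout: float = 2.0) -> bool:
    """Hypothetical check: does the URL answer like a TGI server?"""
    try:
        # TGI's /info route returns JSON that includes the served model_id.
        response = requests.get(f"{endpoint_url.rstrip('/')}/info", timeout=timeout)
        return response.ok and "model_id" in response.json()
    except (requests.RequestException, ValueError):
        return False

If the /info response is trusted, the same call could even be used to fill in self.model_id directly from the local server instead of requiring the caller to pass it.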

System Info

System Information

OS: Linux
OS Version: #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024
Python Version: 3.10.14 (main, Jul 18 2024, 23:22:54) [GCC 11.4.0]

Package Information

langchain_core: 0.2.22
langchain: 0.2.10
langchain_community: 0.2.9
langsmith: 0.1.93
langchain_google_community: 1.0.7
langchain_huggingface: 0.0.3
langchain_openai: 0.1.17
langchain_text_splitters: 0.2.2

efriis commented 2 months ago

Believe this will be fixed by #23821 - will take a look if @Jofthomas doesn't have time!

avargasestay commented 2 months ago

Believe this will be fixed by #23821 - will take a look if @Jofthomas doesn't have time!

Hey @efriis, thanks for your answer! Looking at #23821, I don't think it'll solve the issue: that PR improves the huggingface_token management inside HuggingFaceEndpoint, and as I mentioned in the description, HuggingFaceEndpoint already works as expected with a localhost endpoint_url.

I strongly believe the problem is inside ChatHuggingFace, since it reaches line 458 (the call to list_inference_endpoints("*") from huggingface_hub) when it shouldn't, because the inference endpoint is served locally with TGI.

Jofthomas commented 2 months ago

You are right @avargasestay, my PR draft does not solve this issue. I'll provide a fix in my next commit. Thanks for bringing it to my attention.

fhkingma commented 1 month ago

@Jofthomas I don't see any PRs yet that solve this issue. Have you already worked on it? Otherwise I was thinking of making the commit myself.

Jofthomas commented 1 month ago

Not yet, but with the recent reworks of the InferenceClient, I'll do a refresh of the langchain-huggingface code this weekend.

Simon-Stone commented 1 hour ago

Has there been any progress on this? I am currently stuck on the same issue.