langchain-ai / langchain


Handling huggingfacehub_api_token=None for HuggingFaceEndpoint #20342

Closed martj001 closed 3 months ago

martj001 commented 4 months ago


Example Code

from langchain_community.llms import HuggingFaceEndpoint

# Locally hosted TGI server: no Hugging Face Hub token is needed,
# so huggingfacehub_api_token is passed explicitly as None
llm = HuggingFaceEndpoint(
    endpoint_url="http://localhost:8010/",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    huggingfacehub_api_token=None
)

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[2], line 3
      1 from langchain_community.llms import HuggingFaceEndpoint
----> 3 llm = HuggingFaceEndpoint(
      4     endpoint_url="http://localhost:8010/",
      5     max_new_tokens=512,
      6     top_k=10,
      7     top_p=0.95,
      8     typical_p=0.95,
      9     temperature=0.01,
     10     repetition_penalty=1.03,
     11     huggingfacehub_api_token=None
     12 )

File ~/Jupyter/llm/venv/lib/python3.10/site-packages/langchain_core/load/serializable.py:120, in Serializable.__init__(self, **kwargs)
    119 def __init__(self, **kwargs: Any) -> None:
--> 120     super().__init__(**kwargs)
    121     self._lc_kwargs = kwargs

File ~/Jupyter/llm/venv/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for HuggingFaceEndpoint
__root__
  Could not authenticate with huggingface_hub. Please check your API token. (type=value_error)

Description

Background

While restructuring our codebase in response to the deprecation of HuggingFaceTextGenInference, I encountered an error when attempting to create a HuggingFaceEndpoint with a locally hosted TGI server.

Issue

The error occurs in the validate_environment function of the huggingface_endpoint.py file, specifically in the lines 170-179.

The @root_validator() raises an error when huggingfacehub_api_token is passed as None, because the validate_environment function unconditionally calls login(token=huggingfacehub_api_token). By commenting out the block that processes the API token and manually setting huggingfacehub_api_token to None, I am able to create an InferenceClient successfully.

Since HuggingFaceTextGenInference was merged into HuggingFaceEndpoint in PR #17254, we need to add logic to handle the cases where huggingfacehub_api_token is passed as None or where no HUGGINGFACEHUB_API_TOKEN environment variable is set. This is particularly necessary for setups using a locally hosted TGI server, where authentication with the Hugging Face Hub may not be required.
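As an illustration only (not a tested patch), the validator could fall back to the environment variable and only call login() when a token is actually available; the names below mirror the existing validate_environment:

import os
from typing import Dict

from langchain_core.pydantic_v1 import root_validator

@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
    """Validate the huggingface_hub import; log in only when a token is provided."""
    try:
        from huggingface_hub import login
    except ImportError:
        raise ImportError(
            "Could not import huggingface_hub python package. "
            "Please install it with `pip install huggingface_hub`."
        )
    # Tolerate a missing token: a locally hosted TGI server needs no Hub auth.
    huggingfacehub_api_token = values.get("huggingfacehub_api_token") or os.getenv(
        "HUGGINGFACEHUB_API_TOKEN"
    )
    if huggingfacehub_api_token is not None:
        try:
            login(token=huggingfacehub_api_token)
        except Exception as e:
            raise ValueError(
                "Could not authenticate with huggingface_hub. "
                "Please check your API token."
            ) from e
    values["huggingfacehub_api_token"] = huggingfacehub_api_token
    return values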

System Info

huggingface-hub==0.22.2
langchain-community==0.0.32

platform: linux
python version: 3.10

2016bgeyer commented 4 months ago

I am also having this issue. Please fix!

NunoSantos2021 commented 4 months ago

Hello, I'm also having this issue. I need to run an inference server locally, and I shouldn't need a Hugging Face API token, since it's my local TGI.

ggbetz commented 4 months ago

Hi. Let me add that this issue also persists when using a token issued by Hugging Face's OAuth service: https://www.gradio.app/guides/sharing-your-app#o-auth-login-via-hugging-face

That's because these tokens can be used to access the Inference API, but not to log in; yet token validity is checked by trying to log in to the Hub, as detailed above.

As a workaround I subclassed HuggingFaceEndpoint:

from typing import Dict

from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain_core.pydantic_v1 import root_validator
from langchain_core.utils import get_from_dict_or_env

class LazyHuggingFaceEndpoint(HuggingFaceEndpoint):
    """LazyHuggingFaceEndpoint"""
    # We're using a lazy endpoint to avoid logging in with hf_token,
    # which might in fact be a hf_oauth token that does only permit inference,
    # not logging in.

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that package is installed; SKIP API token validation."""
        try:
            from huggingface_hub import AsyncInferenceClient, InferenceClient

        except ImportError:
            msg = (
                "Could not import huggingface_hub python package. "
                "Please install it with `pip install huggingface_hub`."
            )
            raise ImportError(msg)  # noqa: B904

        huggingfacehub_api_token = get_from_dict_or_env(
            values, "huggingfacehub_api_token", "HUGGINGFACEHUB_API_TOKEN"
        )

        values["client"] = InferenceClient(
            model=values["model"],
            timeout=values["timeout"],
            token=huggingfacehub_api_token,
            **values["server_kwargs"],
        )
        values["async_client"] = AsyncInferenceClient(
            model=values["model"],
            timeout=values["timeout"],
            token=huggingfacehub_api_token,
            **values["server_kwargs"],
        )

        return values

Might also help here: https://github.com/langchain-ai/langchain/issues/19685

bitsofinfo commented 4 months ago

Experiencing the same; I'd like to be able to talk to a local TGI that has no auth, but I can't.

dbkinghorn commented 4 months ago

This is a showstopper for me. It would be nice to see an acknowledgment of the issue here. I'm sorry I don't have a pull request.

bitsofinfo commented 4 months ago

I've tried commenting out the offending block per @martj001's original comment.

Here is my setup

  1. On a remote server I have Hugging Face text-generation-inference running:

model=meta-llama/Meta-Llama-3-8B-Instruct

docker run --gpus all --shm-size 1g \
  -p 8080:80 -v $volume:/data \
  -e HUGGING_FACE_HUB_TOKEN=$token \
  ghcr.io/huggingface/text-generation-inference:2.0.2 \
  --model-id $model --quantize bitsandbytes-fp4 \
  --max-input-length 8000 --max-total-tokens 8192

  2. I can verify it's accessible on :8080.

  3. I have chat-ui running on the same VM with the following model config:


# 'name', 'userMessageToken', 'assistantMessageToken' are required
MODELS=`[
    {
      "name": "meta-llama/Meta-Llama-3-8B-Instruct",
      "displayName": "meta-llama/Meta-Llama-3-8B-Instruct",
      "description": "meta-llama/Meta-Llama-3-8B-Instruct",
      "multimodal" : false,
      "websiteUrl": "https://meta.ai",
      "userMessageToken": "",
      "userMessageEndToken": "<|eot_id|>",
      "chatPromptTemplate": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{@root.preprompt}}<|eot_id|>{{#each messages}}{{#ifUser}}<|start_header_id|>user<|end_header_id|>{{content}}<|eot_id|>{{/ifUser}}{{#ifAssistant}}<|start_header_id|>assistant<|end_header_id|>{{content}}<|eot_id|>{{/ifAssistant}}{{/each}}<|start_header_id|>assistant<|end_header_id|>",
      "parameters": {
          "temperature": 0.9,
          "top_p": 0.95,
          "top_k": 50,
          "truncate": 4096,
          "max_new_tokens": 4096,
          "stop": [
              "<|start_header_id|>",
              "<|end_header_id|>",
              "<|eot_id|>"
          ]
      },
      "endpoints": [
          {
              "type": "tgi",
              "url": "http://127.0.0.1:8080"
          }
      ]
    }
]`
  4. My remote chat-ui works fine at http://:5173/ and interacts with the TGI-hosted Llama 3 model without problems.

  5. On another remote machine I'm trying to run a LangChain chain under Chainlit. The only way I can get this to partially work is by commenting out the following in langchain_community/llms/huggingface_endpoint.py:

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that package is installed and that the API token is valid."""
        # try:
        #     from huggingface_hub import login
        # except ImportError:
        #     raise ImportError(
        #         "Could not import huggingface_hub python package. "
        #         "Please install it with `pip install huggingface_hub`."
        #     )
        # try:
        #     huggingfacehub_api_token = get_from_dict_or_env(
        #         values, "huggingfacehub_api_token", "HUGGINGFACEHUB_API_TOKEN"
        #     )
        #     login(token=huggingfacehub_api_token)
        # except Exception as e:
        #     raise ValueError(
        #         "Could not authenticate with huggingface_hub. "
        #         "Please check your API token."
        #     ) from e
        ...
  6. I construct it like this (I can't get HuggingFaceEndpoint itself to work):

    from langchain_community.llms import HuggingFaceTextGenInference

    llm = HuggingFaceTextGenInference(
        inference_server_url="http://<tgi.vm.ip>:8080",
        max_new_tokens=256,
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        temperature=0.8,
        repetition_penalty=1.03,
        streaming=True,
        timeout=30,
    )

This sort of works, but I constantly get timeouts and non-responses. It's hard to tell what's going on without better logging.
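
For reference, once the token check no longer blocks construction (e.g. with the LazyHuggingFaceEndpoint subclass above, or the commented-out validator), I'd expect the equivalent HuggingFaceEndpoint call to look roughly like this; I haven't been able to verify it against this setup:

from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="http://<tgi.vm.ip>:8080",
    max_new_tokens=256,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.8,
    repetition_penalty=1.03,
    streaming=True,
    timeout=30,
)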