langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

HuggingFaceEndpoint: skip login to hub with oauth token #22456

Closed: ggbetz closed this issue 1 month ago

ggbetz commented 4 months ago


Example Code

from langchain_huggingface.llms import HuggingFaceEndpoint

token = "<TOKEN_WITH_FINE_GRAINED_PERMISSIONS>"

llm = HuggingFaceEndpoint(
    endpoint_url="https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta",
    huggingfacehub_api_token=token,
    server_kwargs={
        "headers": {"Content-Type": "application/json"}
    },
)
resp = llm.invoke("Tell me a joke")
print(resp)

Error Message and Stack Trace (if applicable)

No response

Description

With PR https://github.com/langchain-ai/langchain/pull/22365, the login to the HF Hub is skipped while validating the environment during HuggingFaceEndpoint initialization if the token is None, which resolves the local TGI case (https://github.com/langchain-ai/langchain/issues/20342).

However, we might want to construct HuggingFaceEndpoint with

  1. a fine-grained token, which allows accessing Inference Endpoints but cannot be used for logging in, or
  2. a user-specific OAuth token, which also doesn't allow logging in but can be used to access the Inference API.

These cases are not handled.
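For illustration, here is a simplified sketch of the validation flow being described. It is an assumption about the shape of the code, not the actual langchain_huggingface implementation; only the error message is taken from the ValidationError shown further down.

# Simplified sketch (assumed, not the actual implementation): environment
# validation that skips login when no token is given, but still attempts a
# hub login otherwise, which fails for fine-grained and OAuth tokens.
from typing import Optional

from huggingface_hub import login


def validate_environment_sketch(huggingfacehub_api_token: Optional[str]) -> None:
    if huggingfacehub_api_token is None:
        # After PR #22365: no token means the hub login is skipped (local TGI case).
        return
    try:
        # Fine-grained and OAuth tokens can be rejected here even though they
        # are perfectly usable for Inference API calls.
        login(token=huggingfacehub_api_token)
    except Exception as err:
        raise ValueError(
            "Could not authenticate with huggingface_hub. "
            "Please check your API token."
        ) from err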

System Info

generic

mirkenstein commented 4 months ago

I am not able to reproduce this bug. I generated a fine-grained token with only these permissions: Inference.

import huggingface_hub
from langchain_huggingface.llms import HuggingFaceEndpoint

huggingfacehub_finegrained_api_token = "hf_xxxx"
huggingface_hub.logout()
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
    huggingfacehub_api_token=huggingfacehub_finegrained_api_token,
    repo_id=repo_id,
)
print(llm.invoke("Tell me a joke"))

Output:

Successfully logged out.
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to ~/.cache/huggingface/token
Login successful
...

Adding the token to the headers instead seems to skip the login.

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    server_kwargs={
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {huggingfacehub_finegrained_api_token}",
        }
    },
)
print(llm.invoke("Tell me a joke"))

Output:

Successfully logged out.
, please!

Why did the tomato turn red?

Because it saw the salad dressing!

If I remove the token from the headers, I get the error `Rate limit reached. Please log in or use a HF access token`:

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    server_kwargs={
        "headers": {"Content-Type": "application/json"}
    },
)
print(llm.invoke("Tell me a joke"))

ggbetz commented 4 months ago

My mistake, and I apologize for my sloppiness. The bug pops up when I use a user-specific OAuth token, as generated with the Gradio login button.

Here is a minimally modified OAuth demo to illustrate the bug: https://huggingface.co/spaces/ggbetz/gradio-oauth
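For context, the Space boils down to something like the following sketch. The names and layout are illustrative, not the actual app.py; gr.OAuthToken is Gradio's auto-injected token for the logged-in user.

# Illustrative sketch of the demo Space, not the actual app.py.
from typing import Optional

import gradio as gr
from langchain_huggingface.llms import HuggingFaceEndpoint


def create_endpoint(oauth_token: Optional[gr.OAuthToken]):
    # Gradio injects the logged-in user's OAuth token via the type hint.
    # Passing it as huggingfacehub_api_token triggers the hub login inside
    # HuggingFaceEndpoint's environment validation, which then fails.
    llm = HuggingFaceEndpoint(
        repo_id="HuggingFaceH4/zephyr-7b-beta",
        huggingfacehub_api_token=oauth_token.token if oauth_token else None,
    )
    return repr(llm)


with gr.Blocks() as demo:
    gr.LoginButton()
    output = gr.Textbox(label="Endpoint")
    gr.Button("Create endpoint").click(create_endpoint, inputs=None, outputs=output)

demo.launch()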

The error is (see logs):

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1340, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 759, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/app.py", line 22, in create_endpoint
    llm = HuggingFaceEndpoint(
  File "/usr/local/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for HuggingFaceEndpoint
__root__
  Could not authenticate with huggingface_hub. Please check your API token. (type=value_error)

ggbetz commented 4 months ago

One solution could be to check the validity of the API token with a brief LLM call, rather than by logging in.
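A minimal sketch of what such a check might look like, assuming huggingface_hub's InferenceClient; the helper name is hypothetical:

# Hypothetical sketch of the suggested check: probe the Inference API with
# the token instead of calling huggingface_hub.login().
from huggingface_hub import InferenceClient


def token_works_for_inference(token: str, repo_id: str) -> bool:
    client = InferenceClient(model=repo_id, token=token)
    try:
        # A one-token generation is enough to prove the token is usable.
        client.text_generation("ping", max_new_tokens=1)
        return True
    except Exception:
        return False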

mirkenstein commented 4 months ago

Re: https://github.com/langchain-ai/langchain/issues/22456#issuecomment-2149142173. I cloned your HF Space and modified the LLM part like this:

llm = HuggingFaceEndpoint(
    # huggingfacehub_api_token=oauth_token.token,
    repo_id=repo_id,
    server_kwargs={
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {oauth_token.token}",
        }
    },
)

The app shows this output at the bottom of the screen:

HuggingFaceEndpoint: HuggingFaceEndpoint Params: {'endpoint_url': None, 'task': None, 'model_kwargs': {}}

However, when I modified the function to actually call the LLM, I get this:

Bad request:
Authorization header is correct, but the token seems invalid.
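To isolate whether the token itself or the LangChain plumbing is at fault, a hypothetical direct probe against the Inference API, bypassing LangChain entirely, could look like this:

# Hypothetical probe: call the Inference API directly with the OAuth token.
# If this also fails, the token (not LangChain) is the problem.
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"


def probe(token: str) -> None:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        json={"inputs": "Tell me a joke"},
    )
    print(resp.status_code, resp.text[:200])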