langchain-ai / langchain

šŸ¦œšŸ”— Build context-aware reasoning applications
https://python.langchain.com

`HuggingFaceEndpoint` does not raise exceptions when API call fails due to token counts. #26525

Open michael-newsrx opened 1 month ago

michael-newsrx commented 1 month ago

Checked other resources

Example Code

```python
from dataclasses import dataclass

from huggingface_hub import InferenceEndpoint
from langchain_core.language_models.llms import LLM
from langchain_huggingface import HuggingFaceEndpoint

import local_utils  # reporter's own helper module, as is hf_bearer_token()

bearer_token = hf_bearer_token()

# Get the HuggingFace API URL
ep: InferenceEndpoint = local_utils.hf.inference.llama_31_8B_Instruct(wait=True)

# Create the langchain endpoint
llm = HuggingFaceEndpoint(
    # repo_id=ep.repository,
    endpoint_url=ep.url,  # + chat_completions
    task="text-generation",
    huggingfacehub_api_token=bearer_token,
)

# Bind parameters
llm = llm.bind(max_tokens=8192, temperature=None)  # .with_retry(stop_after_attempt=99)


# This is a utility class for conversing with the Llama 3 model.
# Constrained output via regex or json is broken, so this approach is used instead.
@dataclass
class LangChainRawChat:
    llm: LLM
    text: str

    def __init__(self, llm: LLM):
        self.llm = llm
        self.text = "<|begin_of_text|>"

    def system(self, content: str, *, role="system") -> None:
        self.text += f"<|start_header_id|>{role}<|end_header_id|>\n"
        self.text += content
        self.text += "<|eot_id|>"

    def user(self, content: str, *, role="user") -> None:
        self.text += f"<|start_header_id|>{role}<|end_header_id|>\n"
        self.text += content
        self.text += "<|eot_id|>"

    def assistant(self, content: str | None = None, *, role="assistant", temperature=0.0) -> str:
        self.text += f"<|start_header_id|>{role}<|end_header_id|>\n"
        if content is not None:
            self.text += content
        _temperature = temperature if temperature > 0.0 else None
        output = self.llm.invoke(self.text, max_tokens=8192, temperature=_temperature,
                                 stop_sequence="<|eot_id|>")
        self.text += output
        self.text += "<|eot_id|>"
        return (content or "") + output  # content may be None; avoid a TypeError


# Sample code to run one "chat" session. system_prompt, prompt, and format
# are defined elsewhere in the reporter's code.
src_article = """An article whose length along with prompt and output format instructions exceed the token limit"""

job_chat: LangChainRawChat = LangChainRawChat(llm)
job_chat.system(system_prompt)
job_chat.user(prompt.format(format=format, text=src_article))
response = job_chat.assistant("# Analysis\n\n## Entities and concepts\n\n")
```

Error Message and Stack Trace (if applicable)

The `HuggingFaceEndpoint` call fails silently; no exception is raised.

The API endpoint's logs show:

`inputs` tokens + `max_new_tokens` must be <= 16384. Given: 20016 `inputs` tokens and 512 `max_new_tokens`

The error message also implies that the token limit bound on the LLM is being ignored in the actual API requests: the call went out with the default of 512 `max_new_tokens` rather than the bound value of 8192.
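
For comparison, calling the same endpoint directly through `huggingface_hub` does surface the failure. A minimal sketch, reusing `ep`, `bearer_token`, and `job_chat` from the repro above; catching `HfHubHTTPError` is my assumption about how the 422 arrives:

```python
from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

# Same endpoint and token as the LangChain wrapper above.
client = InferenceClient(model=ep.url, token=bearer_token)

try:
    # Send the same oversized prompt the wrapper swallows silently.
    client.text_generation(job_chat.text, max_new_tokens=8192)
except HfHubHTTPError as err:
    # The TGI validation error ("`inputs` tokens + `max_new_tokens` must be
    # <= 16384 ...") surfaces here instead of being dropped.
    print(f"Endpoint rejected the request: {err}")
```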

Description

I'm trying to use the langchain library to interface with Hugging Face Dedicated Endpoints.
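
Until the wrapper raises on its own, a pre-flight token count can fail fast on oversized prompts. A sketch under two assumptions: the endpoint serves Llama 3.1 8B Instruct (so its tokenizer gives an accurate count), and the 16384 limit from the log above applies:

```python
from transformers import AutoTokenizer

# Assumed model id; swap in whatever the endpoint actually serves.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

MAX_TOTAL_TOKENS = 16384  # limit reported in the endpoint's log
MAX_NEW_TOKENS = 8192     # generation budget used in the repro


def check_token_budget(prompt: str) -> None:
    """Raise before the API call instead of failing silently after it."""
    n_input = len(tokenizer.encode(prompt))
    if n_input + MAX_NEW_TOKENS > MAX_TOTAL_TOKENS:
        raise ValueError(
            f"{n_input} input tokens + {MAX_NEW_TOKENS} max_new_tokens "
            f"exceeds the endpoint limit of {MAX_TOTAL_TOKENS}"
        )


check_token_budget(job_chat.text)
```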

System Info

System Information

OS: Linux
OS Version: #44-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 13 13:35:26 UTC 2024
Python Version: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0]

Package Information

langchain_core: 0.2.38
langchain: 0.2.16
langchain_community: 0.2.16
langsmith: 0.1.117
langchain_cli: 0.0.30
langchain_huggingface: 0.0.3
langchain_llm: 0.4.15
langchain_openai: 0.1.23
langchain_text_splitters: 0.2.4
langgraph: 0.2.19
langserve: 0.2.3

Other Dependencies

accelerate: 0.34.2
aiohttp: 3.10.5
async-timeout: Installed. No version info available.
cpm_kernels: 1.0.11
dataclasses-json: 0.6.7
einops: 0.8.0
fastapi: 0.114.0
gitpython: 3.1.43
httpx: 0.27.2
huggingface-hub: 0.24.6
jsonpatch: 1.33
langgraph-checkpoint: 1.0.9
langserve[all]: Installed. No version info available.
libcst: 1.4.0
loguru: 0.7.2
numpy: 1.26.4

efriis commented 1 month ago

Hey there! I think the param you want to populate is `max_new_tokens`, not `max_tokens`.

The warning also suggests you're passing different input text than what's declared in your repro code.

@Jofthomas thoughts on the error handling here? Is the integration catching an error it shouldn't be?
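
If that diagnosis is right, the caller-side fix would be a sketch like the one below. Hedged: whether `bind(max_tokens=...)` is silently dropped is exactly what this issue asks; the 512 in the endpoint log does match `HuggingFaceEndpoint`'s default `max_new_tokens`:

```python
# Pass the HF-native parameter name so the limit reaches the request payload.
llm = HuggingFaceEndpoint(
    endpoint_url=ep.url,
    task="text-generation",
    huggingfacehub_api_token=bearer_token,
    max_new_tokens=8192,  # HuggingFaceEndpoint field; defaults to 512
)

# Or bind it at call time, in place of the OpenAI-style max_tokens:
llm = llm.bind(max_new_tokens=8192, temperature=None)
```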