langchain-ai / langchain
https://python.langchain.com

HuggingFacePipeline does not use chat template #19933

Open bibhas2 opened 2 months ago

bibhas2 commented 2 months ago

Example Code

The Hugging Face pipeline now has support for chat templates. This calls the tokenizer's apply_chat_template(). This is a super useful feature that formats the input correctly for the model. To apply the template, one needs to pass a messages list to the pipeline as input (not a plain prompt string).
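For reference, passing a messages list to the upstream transformers pipeline looks roughly like this (a minimal sketch; the model name is just the one used in the example below):

from transformers import pipeline

# A chat-capable text-generation pipeline accepts role/content dicts;
# internally it calls the tokenizer's apply_chat_template() to build
# the model-specific prompt.
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": "When was Abraham Lincoln born?"},
]

result = pipe(messages, max_new_tokens=128)
# For chat input, generated_text holds the updated list of messages.
print(result[0]["generated_text"])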

LangChain's HuggingFacePipeline class is written in a way that only prompt text is passed to the pipeline. We can see this in the HuggingFacePipeline._generate method. As a result, the prompt is constructed using LangChain's default template, which is not the format the model works best with.

Let's build an example.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

def test_agnostic_prompt(llm):
    prompt = ChatPromptTemplate.from_messages(
        [
            ("human", "When was Abraham Lincoln born?"),
            ("ai", "Abraham Lincoln was born on February 12, 1809."),
            ("human", "How old was he when he died?"),
            ("ai", "Abraham Lincoln died on April 15, 1865, at the age of 56."),
            ("human", "{question}"),
        ]
    )

    output_parser = StrOutputParser()

    chain = prompt | llm | output_parser

    reply = chain.invoke({"question": "Where did he die?"})

    print(reply)

# TinyLlama ships a chat template, but HuggingFacePipeline flattens
# the prompt to a plain string, so the template is never applied.
hf_llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 128},
)

test_agnostic_prompt(hf_llm)

This sends the following prompt to the model:

Human: When was Abraham Lincoln born?
AI: Abraham Lincoln was born on February 12, 1809.
Human: How old was he when he died?
AI: Abraham Lincoln died on April 15, 1865, at the age of 56.
Human: Where did he die?

The correct prompt, if the chat template were applied, would be:

<|user|>
When was Abraham Lincoln born?</s> 
<|assistant|>
Abraham Lincoln was born on February 12, 1809.</s> 
<|user|>
How old was he when he died?</s> 
<|assistant|>
Abraham Lincoln died on April 15, 1865, at the age of 56.</s> 
<|user|>
Where did he die?</s> 
<|assistant|>
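
This is exactly what the tokenizer produces on its own (a minimal sketch using the same model):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": "When was Abraham Lincoln born?"},
    {"role": "assistant", "content": "Abraham Lincoln was born on February 12, 1809."},
    {"role": "user", "content": "How old was he when he died?"},
    {"role": "assistant", "content": "Abraham Lincoln died on April 15, 1865, at the age of 56."},
    {"role": "user", "content": "Where did he die?"},
]

# tokenize=False returns the formatted prompt as a string;
# add_generation_prompt=True appends the trailing <|assistant|> turn.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))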

Error Message and Stack Trace (if applicable)

No response

Description

The HuggingFacePipeline class should do whatever is necessary to convert the ChatPromptTemplate output to a messages list and pass that list to the pipeline. This will cause the pipeline to use the tokenizer's apply_chat_template() to correctly format the prompt.
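In the meantime, one possible workaround is to apply the template manually and hand HuggingFacePipeline an already-formatted string. This is only a sketch: to_chat_messages and apply_template below are hypothetical helpers, not LangChain APIs, and the chain reuses prompt, hf_llm, and output_parser from the example above.

from transformers import AutoTokenizer
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def to_chat_messages(prompt_value):
    # Map LangChain message objects to the role/content dicts
    # that apply_chat_template() expects.
    role_map = {HumanMessage: "user", AIMessage: "assistant", SystemMessage: "system"}
    return [
        {"role": role_map[type(m)], "content": m.content}
        for m in prompt_value.to_messages()
    ]

def apply_template(prompt_value):
    return tokenizer.apply_chat_template(
        to_chat_messages(prompt_value), tokenize=False, add_generation_prompt=True
    )

# Plain functions are coerced to RunnableLambda by the | operator,
# so the template is applied before the text reaches the LLM.
chain = prompt | apply_template | hf_llm | output_parser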

System Info

System Information

OS: Linux
OS Version: #1 SMP Sat Feb 24 09:50:35 UTC 2024
Python Version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0]

Package Information

langchain_core: 0.1.38
langchain: 0.1.14
langchain_community: 0.0.31
langsmith: 0.1.24
langchain_openai: 0.1.1
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

liugddx commented 2 months ago

Let me see

olegarch commented 1 month ago

Any workaround available?