langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

HuggingFacePipeline with ChatPromptTemplate never ends #19770

Open bibhas2 opened 3 months ago

bibhas2 commented 3 months ago

Example Code

I create a pipeline.

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import pipeline
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Note: this rebinds the name `pipeline` from the transformers factory
# function to the pipeline instance it returns.
pipeline = pipeline(
    "text-generation",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0")

I use the pipeline directly and get a response back in seconds.

messages = [
    {"role": "user", "content": "When was Abraham Lincoln born?"},
    {"role": "assistant", "content": "Abraham Lincoln was born on February 12, 1809."},
    {"role": "user", "content": "How old was he when he died?"},
    {"role": "assistant", "content": "Abraham Lincoln died on April 15, 1865, at the age of 56."},
    {"role": "user", "content": "Where did he die?"},
]

print(pipeline(messages, max_new_tokens=128)) 

Ignore the wrong answer :-)

[{'generated_text': [
    {'role': 'user', 'content': 'When was Abraham Lincoln born?'},
    {'role': 'assistant',
     'content': 'Abraham Lincoln was born on February 12, 1809.'},
    {'role': 'user', 'content': 'How old was he when he died?'},
    {'role': 'assistant',
     'content': 'Abraham Lincoln died on April 15, 1865, at the age of 56.'
     },
    {'role': 'user', 'content': 'Where did he die?'},
    {'role': 'assistant',
     'content': 'Abraham Lincoln died at his home in Springfield, Illinois.'
     },
    ]}]
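
For context, the direct call returns quickly because, when given a list of role/content dicts, the text-generation pipeline renders the model's chat template and stops at the end-of-turn token. A quick sketch to see the prompt string it builds, using the same pipeline object as above:

# Render the chat template that the pipeline applies under the hood.
print(pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True))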

Then I try to do the same using LangChain.

llm = HuggingFacePipeline(
    pipeline=pipeline, 
    pipeline_kwargs={"max_new_tokens": 128}
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "When was Abraham Lincoln born?"),
        ("ai", "Abraham Lincoln was born on February 12, 1809."),
        ("human", "How old was he when he died?"),
        ("ai", "Abraham Lincoln died on April 15, 1865, at the age of 56."),
        ("human", "{question}"),
        # ("ai", "")
    ]
)

chain = prompt | llm

print(chain.invoke({"question":"Where did he die?"}))

This code never ends. It seems to be stuck here.

File ~/Library/Python/3.9/lib/python/site-packages/langchain_community/llms/huggingface_pipeline.py:204, in HuggingFacePipeline._generate(self, prompts, stop, run_manager, **kwargs)
    201 batch_prompts = prompts[i : i + self.batch_size]
    203 # Process batch of prompts
--> 204 responses = self.pipeline(batch_prompts, **pipeline_kwargs)
    206 # Process each response in the batch
    207 for j, response in enumerate(responses):
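
A likely explanation for the hang: when the downstream model is a plain-text LLM such as HuggingFacePipeline, ChatPromptTemplate flattens the message history into a single "Human: ... / AI: ..." string rather than applying the model's chat template, and if max_new_tokens is not reaching the pipeline (see the comments below), generation on CPU can run until the model's maximum length. A quick way to inspect the exact string the pipeline receives, using the prompt from above:

# Print the flattened prompt string that the chain sends to the LLM.
print(prompt.invoke({"question": "Where did he die?"}).to_string())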

Error Message and Stack Trace (if applicable)

No response

Description

The LangChain chain never finishes executing. This seems to be a problem with HuggingFacePipeline, as the same prompt works fine with OpenAI.

System Info

System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:34 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8103
> Python Version:  3.9.6 (default, Nov 10 2023, 13:38:27) 
[Clang 15.0.0 (clang-1500.1.0.2.5)]

Package Information
-------------------
> langchain_core: 0.1.36
> langchain: 0.1.7
> langchain_community: 0.0.20
> langsmith: 0.1.37
> langchain_mistralai: 0.0.4
> langchain_openai: 0.1.1

Packages not installed (Not Necessarily a Problem)
--------------------------------------------------
The following packages were not found:

> langgraph
> langserve

pip show output:

Name: langchain
Version: 0.1.7
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: XXX
Requires: jsonpatch, SQLAlchemy, langsmith, dataclasses-json, langchain-core, async-timeout, numpy, aiohttp, PyYAML, pydantic, requests, langchain-community, tenacity
Required-by: 
---
Name: transformers
Version: 4.39.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: XXX
Requires: packaging, safetensors, pyyaml, huggingface-hub, tokenizers, requests, filelock, tqdm, numpy, regex
Required-by: sentence-transformers
Jaiczay commented 3 months ago

When initializing the HuggingFacePipeline class like this:

pipeline = pipeline(
    "text-generation",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0")

llm = HuggingFacePipeline(
    pipeline=pipeline, 
    pipeline_kwargs={"max_new_tokens": 128}
)

The pipeline_kwargs get overwritten, which is why it takes so long. This is probably a bug right now. But it works like this:

llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 128}
)
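
If the constructor's pipeline_kwargs are indeed being dropped at generation time, another workaround that may work (a sketch, not verified against this exact version) is to bind them per call, so they reach _generate through its **kwargs:

llm = HuggingFacePipeline(pipeline=pipeline)

# Bound kwargs are forwarded on every invoke() call.
chain = prompt | llm.bind(pipeline_kwargs={"max_new_tokens": 128})
print(chain.invoke({"question": "Where did he die?"}))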
bibhas2 commented 3 months ago

@Jaiczay I can verify that your suggested solution works.

djstrong commented 2 weeks ago

The stop argument is not used in HuggingFacePipeline. Therefore, agents don't work properly.
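
Until stop is wired through to the pipeline, one way to emulate it is to cut the generation at the first stop sequence in a post-processing step. A minimal sketch; the stop strings below are only illustrative:

from langchain_core.runnables import RunnableLambda

def truncate_at_stop(text: str, stop=("\nHuman:", "\nObservation:")) -> str:
    # Emulate `stop`: cut the output at the earliest stop sequence found.
    hits = [i for i in (text.find(s) for s in stop) if i != -1]
    return text[: min(hits)] if hits else text

chain = prompt | llm | RunnableLambda(truncate_at_stop)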