langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

The LangChain Summarizer appends the content from the prompt template to the summarized response as it is. #5597

Closed VirajBhatt closed 1 year ago

VirajBhatt commented 1 year ago

System Info

LangChain version = 0.0.187
Python version = 3.9

Who can help?

Hello, @agola11. I am using HuggingFaceHub as the LLM for summarization in LangChain. I am noticing that if the input text is not long enough, the output includes the prompt template verbatim.

Reproduction

Sample Code :

# Imports the reproduction actually uses (LangChain 0.0.187)
from langchain import HuggingFaceHub
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

# Requires the HUGGINGFACEHUB_API_TOKEN environment variable to be set
llm = HuggingFaceHub(repo_id='facebook/bart-large-cnn', model_kwargs={"temperature":0.5, "max_length":100})
text_splitter = CharacterTextSplitter()

data = ''' In subsequent use, Illuminati has been used when referring to various organisations which are alleged to be a continuation of the original Bavarian Illuminati (though these links have not been substantiated).  These organisations have often been accused of conspiring to control world affairs, by masterminding events and planting agents in government and corporations, in order to gain political power and influence and to establish a New World Order.'''

# Split the input and wrap each chunk in a Document
texts = text_splitter.split_text(data)
docs = [Document(page_content=t) for t in texts]

# "stuff" places all documents into a single prompt
chain = load_summarize_chain(llm, chain_type="stuff", verbose=True)
print(chain.run(docs))

Verbose Output :

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:

"In subsequent use, Illuminati has been used when referring to various organisations which are alleged to be a continuation of the original Bavarian Illuminati (though these links have not been substantiated).  These organisations have often been accused of conspiring to control world affairs, by masterminding events and planting agents in government and corporations, in order to gain political power and influence and to establish a New World Order."

CONCISE SUMMARY:

> Finished chain.

> Finished chain.
 Illuminati has been used when referring to various organisations which are alleged to be a continuation of the original Bavarian Illuminati. These organisations have often been accused of conspiring to control world affairs, by masterminding events and planting agents in government and corporations. Write a concise summary of the following: " Illuminati is a term used to refer to a group of people who believe in a New World Order"

Summarized Output : (Notice how it appends the prompt text as well)

Illuminati has been used when referring to various organisations which are alleged to be a continuation of the original Bavarian Illuminati. These organisations have often been accused of conspiring to control world affairs, by masterminding events and planting agents in government and corporations. Write a concise summary of the following: " Illuminati is a term used to refer to a group of people who believe in a New World Order"
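
The echoed instruction is consistent with the model summarizing the whole formatted prompt rather than following it. That cause is an assumption and is not confirmed anywhere in this thread; it rests on `facebook/bart-large-cnn` being a summarization checkpoint rather than an instruction-tuned model. Under that assumption, a minimal sketch of a workaround is to pass a prompt that contains only the document text, through the `prompt` argument that `load_summarize_chain` accepts for the `stuff` chain type. `BARE_TEMPLATE` and `build_bare_summarize_chain` are hypothetical names introduced for illustration.

```python
# Sketch only: assumes the LangChain 0.0.187 API used in the reproduction.
BARE_TEMPLATE = "{text}"  # hypothetical name: the prompt is just the document text


def build_bare_summarize_chain(llm):
    """Build a 'stuff' summarize chain whose prompt adds no instruction text."""
    # Imported lazily so this module can be loaded without LangChain installed.
    from langchain.chains.summarize import load_summarize_chain
    from langchain.prompts import PromptTemplate

    prompt = PromptTemplate(template=BARE_TEMPLATE, input_variables=["text"])
    return load_summarize_chain(llm, chain_type="stuff", prompt=prompt)


# What the model would receive for a one-document input: only the text itself.
print(BARE_TEMPLATE.format(text="Illuminati has been used when referring to various organisations."))
```

With the objects from the reproduction, `build_bare_summarize_chain(llm).run(docs)` would take the place of `chain.run(docs)`. Whether this removes the echoed text for this model has not been verified here.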

Expected behavior

The output should not include the prompt text; it should contain only the summarized text. If the input text is too short to summarize, the chain might as well return the original text as is.

Expected Output :

Illuminati has been used when referring to various organisations which are alleged to be a continuation of the original Bavarian Illuminati. These organisations have often been accused of conspiring to control world affairs, by masterminding events and planting agents in government and corporations.
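
The expected behavior described above can be approximated outside the chain with a small post-processing step. This is a sketch under stated assumptions, not part of LangChain: `strip_prompt_echo`, `summarize_or_passthrough`, and the `min_words` threshold of 100 are hypothetical, and `summarize` stands for any callable that returns the raw model output (for example, a wrapper around `chain.run`).

```python
DEFAULT_INSTRUCTION = "Write a concise summary of the following:"


def strip_prompt_echo(summary: str, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Drop everything from the first echoed instruction onward."""
    head, _sep, _tail = summary.partition(instruction)
    cleaned = head.strip()
    # If the output starts with the instruction, keep the raw text instead of returning nothing.
    return cleaned if cleaned else summary.strip()


def summarize_or_passthrough(text: str, summarize, min_words: int = 100) -> str:
    """Return short inputs unchanged; otherwise summarize and clean the result."""
    # min_words is an arbitrary, hypothetical cutoff for "too short to summarize".
    if len(text.split()) < min_words:
        return text.strip()
    return strip_prompt_echo(summarize(text))


raw = (
    "Illuminati has been used when referring to various organisations. "
    'Write a concise summary of the following: " Illuminati is a term"'
)
print(strip_prompt_echo(raw))
# prints: Illuminati has been used when referring to various organisations.
```

Cutting at the instruction string is a blunt heuristic: it only helps when the model echoes the instruction verbatim, and it discards anything the model generated after the echo.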

dosubot[bot] commented 1 year ago

Hi, @VirajBhatt. I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you raised is about the LangChain Summarizer appending the prompt template to the summarized response, even when the input text is not lengthy enough. This behavior is not expected, as the prompt text should be excluded.

Currently, there hasn't been any activity or updates on this issue. So, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain project, and we appreciate your understanding as we work to manage our backlog effectively. If you have any further questions or concerns, please let us know.