
Strange output when summarizing long text using local Llama-3 model with LlamaCpp #24490

Open · mwilliam0502 opened 1 month ago

mwilliam0502 commented 1 month ago


Example Code

I am trying to do a simple text summarization task and return the result in JSON format, using the local Llama-3 8B Instruct model (GGUF version) running on CPU only. The code is as follows:

from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

# Create the prompt
template = """
              Read the article and return the "release date of Llama-3" in JSON format.
              If the information is not mentioned, please do not return any answer.
              Article: {text}
              Answer:
           """

# Text for summarization (from https://en.wikipedia.org/wiki/Llama_(language_model))
text = """
Llama (acronym for Large Language Model Meta AI, and formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 3, released in April 2024.

Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis. Unauthorized copies of the model were shared via BitTorrent. In response, Meta AI issued DMCA takedown requests against repositories sharing the link on GitHub. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use. Llama models are trained at different parameter sizes, typically ranging between 7B and 70B. Originally, Llama was only available as a foundation model. Starting with Llama 2, Meta AI started releasing instruction fine-tuned versions alongside foundation models.

Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, and a standalone website. Both services use a Llama 3 model.
"""

# Set up and run the local Llama-3 model
prompt = PromptTemplate(template=template, input_variables=["text"])
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(
    model_path="model/llama/Meta-Llama-3-8B-Instruct.Q6_K.gguf",
    n_ctx=2048,  # context window in tokens; prompt + generated output must fit here
    callback_manager=callback_manager,
    verbose=True,
)
chain = prompt | llm
chain.invoke({"text": text})
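
For reference, the rendered prompt can be measured against the context window with the model's own tokenizer (a quick diagnostic sketch; get_num_tokens is available on the LlamaCpp wrapper):

# Diagnostic: count how many tokens the rendered prompt actually uses
rendered = prompt.format(text=text)
print(llm.get_num_tokens(rendered), "prompt tokens vs n_ctx = 2048")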

Error Message and Stack Trace (if applicable)

No response

Description

With this code, the model runs successfully and the output looks good:

{
  "release_date": "April 2024"
}

However, if I input more text (adding more paragraphs from the same Wikipedia page, https://en.wikipedia.org/wiki/Llama_(language_model)), the output becomes garbled and the model keeps generating text like this:

The release notes for LLaMA model can be found on the official website, Meta AI.  Release notes are typically available after you read the answer.
LLaMA.   If you cannot
    it as is in.  Read More
    LLaMA is a "Release.  Release note the "Read the article.

# Release note the "read in.  Read more and more, Read the  Release on "read a "a
      Release in "Read the "Release
.
.
.

May I know if there is any solution for inputting a long text for summarization using a local Llama-3 model?
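
A minimal workaround sketch, assuming the garbled output is caused by the input overflowing the 2048-token n_ctx: split the article into chunks that fit the window and run the same extraction prompt over each chunk (the chunk sizes below are illustrative, not tuned):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the article into overlapping chunks small enough for the context window
# (chunk_size counts characters, so keep it well below the token limit)
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
chunks = splitter.split_text(text)

# Ask the same question of every chunk and collect the answers
answers = [chain.invoke({"text": chunk}) for chunk in chunks]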

System Info

langchain==0.2.10
langchain_community==0.2.9
langchain_core==0.2.22
Python version: 3.10.12

wulifu2hao commented 1 month ago

llm = LlamaCpp(
    model_path=...,
    n_ctx=4048,
    callback_manager=callback_manager,
    verbose=True,
    temperature=0.01,
    rope_freq_base=0.0,
    rope_freq_scale=0.0,
)

Does adding rope_freq_base / rope_freq_scale help?
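
For context, a fuller sketch of what this suggestion might look like (the n_ctx value here is an assumption based on Llama-3 8B's native 8192-token context; in llama.cpp, a rope_freq_base / rope_freq_scale of 0.0 means "take the value from the model file"):

llm = LlamaCpp(
    model_path="model/llama/Meta-Llama-3-8B-Instruct.Q6_K.gguf",
    n_ctx=8192,           # assumption: Llama-3 8B natively supports 8192 tokens
    temperature=0.01,     # near-deterministic decoding for extraction tasks
    rope_freq_base=0.0,   # 0.0 = use the RoPE base stored in the GGUF metadata
    rope_freq_scale=0.0,  # 0.0 = use the RoPE scale stored in the GGUF metadata
    callback_manager=callback_manager,
    verbose=True,
)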