meta-llama / llama

Inference code for Llama models

Llama 3.1: The output text is truncated #1153

Open Gumichocopengin8 opened 4 months ago

Gumichocopengin8 commented 4 months ago

Describe the bug

Found a similar issue with Llama 2 (#717), but this one is for Llama 3.1. The output text is cut off, so I cannot see the entire result. Is there a way to extend the maximum length of the output text? What is the default maximum length?

Minimal reproducible example

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B"

pipeline = transformers.pipeline(
  "text-generation",
  model=model_id,
  model_kwargs={"torch_dtype": torch.bfloat16},
  device="cpu",
)

pipeline("Hey how are you doing today?")

Output

[{'generated_text': 'Hey how are you doing today? I’m doing good. I’m just here to talk about'}]

Runtime Environment

Additional context

lmntrx-sys commented 3 months ago

While there are many possible issues with your environment that I cannot diagnose remotely, the model configuration is a likely cause. The maximum token limit set for generation can lead to truncation: if the max_gen_len parameter is set to a low value, the output is cut off once that limit is reached. The source code sets this parameter to 64 by default.
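
Since the report uses the transformers text-generation pipeline rather than the reference llama code, the corresponding knob there is a generation kwarg such as max_new_tokens, which the pipeline forwards to generate(). A minimal sketch of the original repro with a higher limit (the value 256 is an arbitrary example, not a recommended setting):

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cpu",
)

# max_new_tokens bounds how many tokens are generated beyond the prompt;
# raising it yields longer completions at the cost of more compute.
result = pipeline(
    "Hey how are you doing today?",
    max_new_tokens=256,
)
print(result[0]["generated_text"])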

lmntrx-sys commented 3 months ago

Using a CPU may also contribute to truncated output. Running a large language model on a CPU is resource-intensive; if the system runs out of memory or CPU resources, it might truncate the output to prevent crashes or excessive lag.
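
If hardware is the concern, one small adjustment (a sketch; whether a GPU is actually available depends on your environment) is to select the device dynamically instead of hard-coding "cpu":

import torch
import transformers

model_id = "meta-llama/Meta-Llama-3.1-8B"

# Prefer a GPU when one is available; fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device=device,
)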

irtiq7 commented 2 months ago

Anyone managed to solve this?