Gumichocopengin8 opened 4 months ago
While there are many possible issues with your environment that I cannot diagnose remotely, here are two likely causes.

Model configuration: the generation settings of the language model, in particular the maximum number of tokens it is allowed to generate, can lead to truncation. If the `max_gen_len` parameter is set to a low value, the output is cut off as soon as that limit is reached, and the reference source code sets this parameter to 64 by default.
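If you are calling the reference scripts, you can pass a larger limit explicitly. Below is a minimal sketch assuming the `example_text_completion.py`-style API from the meta-llama repos; the checkpoint paths, `max_seq_len`, and `max_batch_size` values are placeholders for illustration, not verified settings.

```python
# Sketch assuming the meta-llama reference API (example_text_completion.py style);
# exact import paths and defaults may differ between the llama3 and llama-models repos.
from llama import Llama

generator = Llama.build(
    ckpt_dir="Meta-Llama-3.1-8B/",                          # path to downloaded weights (placeholder)
    tokenizer_path="Meta-Llama-3.1-8B/tokenizer.model",     # placeholder tokenizer path
    max_seq_len=2048,      # must cover prompt tokens + generated tokens
    max_batch_size=4,
)

results = generator.text_completion(
    ["Explain the theory of relativity in simple terms."],
    max_gen_len=512,       # raise from the 64-token default so the answer is not cut off
    temperature=0.6,
    top_p=0.9,
)
print(results[0]["generation"])
```

Note that `max_gen_len` has to stay below `max_seq_len`, so raise both if you need long completions.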
CPU usage: running a large language model on a CPU is resource-intensive and may also be a reason for truncated output. If the system runs out of memory or CPU resources, it may cut the output short to prevent crashes or excessive lag.
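As an alternative route, you can load the model through Hugging Face transformers, place it on a GPU, and raise the output budget there. This is a sketch under the assumption that you have Hub access to meta-llama/Meta-Llama-3.1-8B and a GPU with enough memory; it is not the reference scripts' own API.

```python
# Sketch: alternative path via Hugging Face transformers (not the meta reference code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to reduce memory pressure
    device_map="auto",            # place layers on GPU(s) instead of the CPU
)

inputs = tokenizer(
    "Explain the theory of relativity in simple terms.",
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)  # larger output budget than the 64-token default
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```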
Anyone managed to solve this?
Describe the bug
Found a similar issue with Llama 2 (#717), but this is for Llama 3.1. The output text is cut off, so I cannot see the entire result. Is there a way to extend the maximum length of the output text? What is the default maximum length?
Minimal reproducible example
Output
Runtime Environment
Model: meta-llama/Meta-Llama-3.1-8B
Additional context