simonw / llm

Access large language models from the command-line
https://llm.datasette.io
Apache License 2.0
4.44k stars 243 forks

llm response truncated for llama2 #181

Open gao-jian opened 1 year ago

gao-jian commented 1 year ago

Hi Simon, the responses from llama2 are being truncated. What is a good way for llm to handle this? See below:

% llm -m l2c "give me 20 good names for avatars" --system "you are a creator"
Sure, here are 20 creative and unique name suggestions for avatars:

  1. Aurora Sparklefeather
  2. Zephyr Starlight
  3. Nova Frostbite
  4. Luna Nightshade
  5. Astrid Stormbringer
  6. Lyra Moonwhisper
  7. Niamh Fireheart
  8. Zara Wildfire
  9. Calliope Cosmos
  10. Phoebe Cosmicray
  11. Willow Waverider
  12. Echo Nightwalker
  13. Elara Star
jefftriplett commented 1 year ago

I don't see the same issue you are seeing, but I am running into issues where I can't seem to exceed more than ~1500 or 2k tokens. I noticed this tonight while playing around with the ollama library and wanted to pass it along in case it's helpful.

I was comparing values, and one that stuck out to me was from the ollama library. See num_ctx here: https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

I can change ollama's max token limit (num_ctx=8192) and it works as expected. This makes me wonder whether the value isn't being accepted correctly (or I don't know how to use llm, which is quite possible).
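
For reference, the Ollama-side workaround looks roughly like this (a sketch based on the Modelfile docs linked above; the llama2-8k name is just an example I made up):

  # Modelfile (sketch): raise the context window for a derived model
  FROM llama2
  PARAMETER num_ctx 8192

  ollama create llama2-8k -f Modelfile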

cmungall commented 1 year ago

Can you provide more details? Is l2c aliased to a Llama model accessed via the replicate plugin, or via llm-llama-cpp?

I'm guessing the latter, since l2c is the suggested alias here: https://simonwillison.net/2023/Aug/1/llama-2-mac/
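
For what it's worth, llm aliases list will show what l2c points to on your machine, e.g. (output here is illustrative only):

  llm aliases list
  l2c : llama-2-7b-chat.ggmlv3.q8_0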

A general tip: try re-running with --no-log, to force it to generate a different solution (without this you are stuck getting the truncated response for the same prompt).
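
i.e. something like:

  llm -m l2c "give me 20 good names for avatars" --system "you are a creator" --no-log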

As a general observation, I have found the 7B model gives quite short or truncated answers. Weirdly, when I access it via the replicate plugin I sometimes get responses as short as one letter or word (and I can't seem to reproduce this on the playground).

I have had more success with 70B but YMMV

iam4x commented 1 year ago

Setting -o max_gen_len 4096 worked for me with different models.
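
Something along these lines (the model placeholder is just an example; as the next comment shows, whether max_gen_len is accepted depends on which plugin provides the model):

  llm -m <model> "give me 20 good names for avatars" -o max_gen_len 4096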

sfc-gh-jroes commented 1 year ago

Doesn't seem to be an option for me?

~ via 🌙 via 🐍 v3.11.5 took 24s 
λ llm models list --options  
[...]
LlamaModel: llama-2-7b-chat.ggmlv3.q8_0 (aliases: llama2-chat, l2c)
  verbose: boolean

~ via 🌙 via 🐍 v3.11.5 
λ llm "how do i increase the output token count for this llm" -m l2c -o max_gen_len 4096
Error: max_gen_len
  Extra inputs are not permitted