simonw / llm

Access large language models from the command-line
https://llm.datasette.io
Apache License 2.0
4.44k stars 243 forks

llm response truncated for llama2 #181

Open gao-jian opened 1 year ago

gao-jian commented 1 year ago

Hi Simon, the responses from llama2 are being truncated. What is a good way for llm to handle this? See below:

% llm -m l2c "give me 20 good names for avatars" --system "you are a creator"
Sure, here are 20 creative and unique name suggestions for avatars:

  1. Aurora Sparklefeather
  2. Zephyr Starlight
  3. Nova Frostbite
  4. Luna Nightshade
  5. Astrid Stormbringer
  6. Lyra Moonwhisper
  7. Niamh Fireheart
  8. Zara Wildfire
  9. Calliope Cosmos
  10. Phoebe Cosmicray
  11. Willow Waverider
  12. Echo Nightwalker
  13. Elara Star
jefftriplett commented 1 year ago

I don't see the same issue you are seeing, but I am running into issues where I can't seem to exceed more than ~1500 or 2k tokens. I noticed this tonight while playing around with the ollama library and wanted to pass it along in case it's helpful.

I was comparing values, and one that stuck out to me was from the ollama library. See num_ctx here: https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

I can change ollama's max token limit (num_ctx=8192) and it works as expected. This makes me wonder whether the value isn't being accepted correctly (or I don't know how to use llm, which is quite possible).
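
For reference, the Ollama-side workaround looks roughly like this (a sketch based on the Modelfile docs linked above; the llama2-8k name is just an example I made up):

  # Modelfile (sketch): raise the context window for a derived model
  FROM llama2
  PARAMETER num_ctx 8192

  ollama create llama2-8k -f Modelfile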

cmungall commented 1 year ago

Can you provide more details? Is l2c aliased to a Llama model accessed via the replicate plugin, or via llm-llama-cpp?

I'm guessing the latter, since l2c is the suggested alias here: https://simonwillison.net/2023/Aug/1/llama-2-mac/
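
For what it's worth, llm aliases list will show what l2c points to on your machine, e.g. (output here is illustrative only):

  llm aliases list
  l2c : llama-2-7b-chat.ggmlv3.q8_0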

A general tip: try re-running with --no-log, to force it to generate a different solution (without this you are stuck getting the truncated response for the same prompt).
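
i.e. something like:

  llm -m l2c "give me 20 good names for avatars" --system "you are a creator" --no-log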

As a general observation, I have found the 7B model gives quite short or truncated answers. Weirdly, when I access it via the replicate plugin I sometimes get responses as short as one letter or word (and I can't seem to reproduce this on the playground).

I have had more success with 70B but YMMV

iam4x commented 1 year ago

Setting -o max_gen_len 4096 worked for me with different models.
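
Something along these lines (the model placeholder is just an example; as the next comment shows, whether max_gen_len is accepted depends on which plugin provides the model):

  llm -m <model> "give me 20 good names for avatars" -o max_gen_len 4096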

sfc-gh-jroes commented 1 year ago

Doesn't seem to be an option for me?

~ via 🌙 via 🐍 v3.11.5 took 24s 
λ llm models list --options  
[...]
LlamaModel: llama-2-7b-chat.ggmlv3.q8_0 (aliases: llama2-chat, l2c)
  verbose: boolean

~ via 🌙 via 🐍 v3.11.5 
λ llm "how do i increase the output token count for this llm" -m l2c -o max_gen_len 4096
Error: max_gen_len
  Extra inputs are not permitted