simonw / llm-llama-cpp

LLM plugin for running models using llama.cpp

Responses are truncated too early #6

Open simonw opened 1 year ago

simonw commented 1 year ago

https://twitter.com/mullinsms/status/1686480711211945984

The solution may be the max_tokens parameter.
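
For reference, here's a minimal standalone sketch of that parameter in llama-cpp-python. The model path is illustrative, and the 4000s mirror the n_ctx value the plugin already passes:

from llama_cpp import Llama

# Illustrative path - point this at a real local model file
llm = Llama(model_path="models/llama-2-7b-chat.bin", n_ctx=4000)

# llama-cpp-python caps each completion at a small default max_tokens,
# so long answers get cut off unless the cap is raised explicitly
for chunk in llm("400 names for a cat", stream=True, max_tokens=4000):
    print(chunk["choices"][0]["text"], end="", flush=True)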

wbonack commented 1 year ago

max_tokens seems like the right approach; this change gets me better results:

diff --git a/llm_llama_cpp.py b/llm_llama_cpp.py
index f2fc977..09c00f6 100644
--- a/llm_llama_cpp.py
+++ b/llm_llama_cpp.py
@@ -234,7 +234,7 @@ class LlamaModel(llm.Model):
                 response._prompt_json = {"prompt_bits": prompt_bits}
             else:
                 prompt_text = prompt.prompt
-            stream = llm_model(prompt_text, stream=True)
+            stream = llm_model(prompt_text, stream=True, max_tokens=4000)
             for item in stream:
                 # Each item looks like this:
                 # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [

Using llm -m l2c '400 names for a cat', I get the following:

Before

I'm glad you're interested in finding a unique name for your feline friend! Here are 400 creative and fun name suggestions for cats:

  1. Luna
  2. Felix
  3. Whiskers
  4. Fluffy
  5. Patches
  6. Mittens
  7. Snowball
  8. Muffin
  9. Cookie
  10. Cuddles
  11. Sparky
  12. Bubbles
  13. Tiger
  14. Simba
  15. Lola
  16. D

After

I'm glad you're interested in finding a unique name for your feline friend! Here are 400 creative and fun name suggestions for cats:

  1. Luna
  2. Felix
  3. Whiskers
  4. Fluffy
  5. Patches
  6. Mittens
  7. Snowball
  8. Muffin
  9. Cookie
  10. Cuddles
  11. Sparky
  12. Bubbles
  13. Tiger
  14. Simba ...
  15. Sassy Sensation Sparkles
  16. Tiger Tantrum Tango
  17. Lola LaRue Lollipop
  18. Chip off the Old Block Party
  19. Whiskerdoodle Wizard
  20. Purrfectly Pawsome Puddles
  21. Snuggles Sensation Socks
  22. Fluffy Fantastic Furball
  23. Daisy Darling Delight
  24. Luna Lovegood Lollipop
  25. Patches the Brave Purrfectly
  26. Mittens McFluffinator Magic
  27. Ginger the Great Glitter
  28. Sir Whiskers the Magnificent Muffin
  29. Peanut Butter Pandemonium Pawsitive
  30. Sassy Sensation Sparkles Socks
  31. Tiger Tantrum Tango Toes
  32. Lola LaRue Lollipop Legs
  33. Chip off the Old Block Party Pants
  34. Whiskerdoodle Wizard Wings
  35. Purrfectly Pawsome Puddles Paws
  36. Snuggles Sensation Socks Shoes
  37. Fluffy Fantastic Furball Flip Flops I hope these name suggestions help you find the purrfect name for your new furry friend!

Only 98 names for some reason from Llama 2, but many more tokens than the ~110 I was getting before.
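
For context on those numbers: llama-cpp-python's completion call has historically defaulted max_tokens to 128, and the cap counts generated tokens only, which would explain completions stopping around 110 tokens before this change. That's an inference from the library's defaults of that era, not something verified against this exact install.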

jaanli commented 1 year ago

Also running into this -- responses getting truncated; this is an amazing tool already though :)

simonw commented 1 year ago

This seems to help:

diff --git a/llm_llama_cpp.py b/llm_llama_cpp.py
index f2fc977..62f716b 100644
--- a/llm_llama_cpp.py
+++ b/llm_llama_cpp.py
@@ -226,7 +226,9 @@ class LlamaModel(llm.Model):
     def execute(self, prompt, stream, response, conversation):
         with SuppressOutput(verbose=prompt.options.verbose):
             llm_model = Llama(
-                model_path=self.path, verbose=prompt.options.verbose, n_ctx=4000
+                model_path=self.path,
+                verbose=prompt.options.verbose,
+                n_ctx=4000,
             )
             if self.is_llama2_chat:
                 prompt_bits = self.build_llama2_chat_prompt(prompt, conversation)
@@ -234,7 +236,7 @@ class LlamaModel(llm.Model):
                 response._prompt_json = {"prompt_bits": prompt_bits}
             else:
                 prompt_text = prompt.prompt
-            stream = llm_model(prompt_text, stream=True)
+            stream = llm_model(prompt_text, stream=True, max_tokens=4000)
             for item in stream:
                 # Each item looks like this:
                 # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
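
As an aside, rather than hard-coding the two 4000s independently, llama-cpp-python (in the versions I've checked; worth verifying against your install) treats a non-positive max_tokens as "generate until the context window is exhausted", which keeps the completion cap tied to n_ctx. A minimal sketch, again with an illustrative model path:

from llama_cpp import Llama

# n_ctx bounds prompt + completion together; max_tokens bounds the
# completion alone. max_tokens=-1 generates until n_ctx runs out
# instead of stopping at a fixed cap.
llm = Llama(model_path="models/llama-2-7b-chat.bin", n_ctx=4000)
result = llm("400 names for a cat", max_tokens=-1)
print(result["choices"][0]["text"])
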
brandonrobertz commented 1 year ago

I ran into this problem immediately with local models and this fixed it FWIW.