simonw opened 1 year ago
`max_tokens` seems like the right approach, this change gets me better results:
```diff
diff --git a/llm_llama_cpp.py b/llm_llama_cpp.py
index f2fc977..09c00f6 100644
--- a/llm_llama_cpp.py
+++ b/llm_llama_cpp.py
@@ -234,7 +234,7 @@ class LlamaModel(llm.Model):
                 response._prompt_json = {"prompt_bits": prompt_bits}
             else:
                 prompt_text = prompt.prompt
-            stream = llm_model(prompt_text, stream=True)
+            stream = llm_model(prompt_text, stream=True, max_tokens=4000)
             for item in stream:
                 # Each item looks like this:
                 # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
```
Using `llm -m l2c '400 names for a cat'` I get the following:
> I'm glad you're interested in finding a unique name for your feline friend! Here are 400 creative and fun name suggestions for cats:
Only 98 names for some reason from Llama 2, but many more tokens than the ~110 I was getting before.
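For anyone experimenting outside the plugin, here's a minimal standalone sketch of the llama-cpp-python call being patched above. The model path and prompt are placeholders; llama-cpp-python's default `max_tokens` is small, which is consistent with the ~110-token cutoff:

```python
from llama_cpp import Llama

# Placeholder model path -- point this at any local GGML model file.
llm_model = Llama(model_path="/path/to/llama-2-7b-chat.bin", n_ctx=4000, verbose=False)

# Without max_tokens, generation stops at the library's small default
# budget, which is what truncates long completions like a 400-item list.
stream = llm_model("400 names for a cat", stream=True, max_tokens=4000)

for item in stream:
    # Each item is a dict shaped like the comment in the diff above:
    # {'id': 'cmpl-...', 'object': 'text_completion', ..., 'choices': [{'text': ...}]}
    print(item["choices"][0]["text"], end="", flush=True)
```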
Also running into this -- responses getting truncated; this is an amazing tool already though :)
This seems to help:
```diff
diff --git a/llm_llama_cpp.py b/llm_llama_cpp.py
index f2fc977..62f716b 100644
--- a/llm_llama_cpp.py
+++ b/llm_llama_cpp.py
@@ -226,7 +226,9 @@ class LlamaModel(llm.Model):
     def execute(self, prompt, stream, response, conversation):
         with SuppressOutput(verbose=prompt.options.verbose):
             llm_model = Llama(
-                model_path=self.path, verbose=prompt.options.verbose, n_ctx=4000
+                model_path=self.path,
+                verbose=prompt.options.verbose,
+                n_ctx=4000,
             )
             if self.is_llama2_chat:
                 prompt_bits = self.build_llama2_chat_prompt(prompt, conversation)
@@ -234,7 +236,7 @@ class LlamaModel(llm.Model):
                 response._prompt_json = {"prompt_bits": prompt_bits}
             else:
                 prompt_text = prompt.prompt
-            stream = llm_model(prompt_text, stream=True)
+            stream = llm_model(prompt_text, stream=True, max_tokens=4000)
             for item in stream:
                 # Each item looks like this:
                 # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
```
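One caveat worth noting (this is general llama.cpp behavior, not something the plugin enforces): `n_ctx` covers the prompt and the completion together, so a hard-coded `max_tokens=4000` can still run out of room on long prompts. A rough sketch of budgeting the completion against the prompt length, using llama-cpp-python's `tokenize()`:

```python
from llama_cpp import Llama

N_CTX = 4000
llm_model = Llama(model_path="/path/to/model.bin", n_ctx=N_CTX, verbose=False)
prompt_text = "400 names for a cat"

# tokenize() takes bytes and returns token ids; counting the prompt's
# tokens keeps prompt + completion inside the shared context window.
prompt_tokens = len(llm_model.tokenize(prompt_text.encode("utf-8")))

stream = llm_model(
    prompt_text, stream=True, max_tokens=max(N_CTX - prompt_tokens, 1)
)
for item in stream:
    print(item["choices"][0]["text"], end="", flush=True)
```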
I ran into this problem immediately with local models and this fixed it FWIW.
https://twitter.com/mullinsms/status/1686480711211945984
Solution may be the `max_tokens` parameter.
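Rather than hard-coding 4000, it might be cleaner to surface this as a plugin option. Here's a rough, untested sketch building on the plugin's existing pydantic `Options` class (the `max_tokens` field doesn't exist there yet, and the default is my own guess):

```python
import llm
from pydantic import Field


class LlamaModel(llm.Model):
    class Options(llm.Options):
        # Proposed option (not in the plugin yet); would allow:
        #   llm -m l2c -o max_tokens 4000 '400 names for a cat'
        max_tokens: int = Field(
            default=4000,
            description="Maximum number of tokens to generate",
        )

    # ...and execute() would pass it through, e.g.:
    #   stream = llm_model(prompt_text, stream=True, max_tokens=prompt.options.max_tokens)
```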