noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Output ending prematurely #34

Closed jpeig closed 4 months ago

jpeig commented 7 months ago

Output frequently (about 20% of runs) stops prematurely and returns invalid JSON, even ignoring closing brackets. I already tried to resolve this by playing around with the constants that limit consecutive whitespace, but it did not seem to resolve anything.

I am using the vllm integration.

{
"index": 0,
"text":  { \"event_body\": \"As you continue to perform in the bustling market square, you notice a group of children gathered around a colorful circus wagon. They're captivated by the vibrant marionette show being performed by a skilled marionettist. The children's laughter and excitement remind you of the joy your circus act once brought to others. You decide to approach the marionettist and learn from his masterful performance, hoping to incorporate this new skill into your act and rekindle the magic of your performances. The marionettist, a kind-hearted man named Rembrandt, welcomes you with open arms and shares his knowledge of the art form. As you practice together, you realize that combining your circus skills with Rembrandt's marionette expertise could create a truly enchanting show that captivates audiences of all ages. The bustling market square becomes the perfect stage for this unique collaboration, where the vibrant colors and lively atmosphere complement your combined talents. As you perform, you can't help but feel a renewed sense of purpose and excitement for the future of your artistic journey. The children's laughter and applause serve as a testament to the success of your newfound partnership, inspiring you to continue pushing the boundaries of Baroque-era entertainment in this era of grandeur and drama.  Analysis",
"logprobs": null,
"finish_reason": "stop"
}

It tends to happen with more complicated schemas. No errors are raised.

Are there situations where the parser can give a response back without hitting END_OBJECT?
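For reference, my setup is the standard vLLM logits-processor wiring, roughly like the sketch below (a minimal sketch only: the lmformatenforcer function names mirror the llama.cpp integration that appears later in this thread and should be treated as assumptions, and the model name, prompt and schema are placeholders):

import vllm
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.vllm import (
    build_vllm_logits_processor,
    build_vllm_token_enforcer_tokenizer_data,
)

schema = {"type": "object"}         # stands in for the full schema further below
llm = vllm.LLM(model="some-model")  # placeholder model id

tokenizer_data = build_vllm_token_enforcer_tokenizer_data(llm)
logits_processor = build_vllm_logits_processor(tokenizer_data, JsonSchemaParser(schema))

sampling_params = vllm.SamplingParams(
    max_tokens=10 * 1024,                  # generous budget (see later comments)
    logits_processors=[logits_processor],  # enforce the schema during decoding
)
outputs = llm.generate("<prompt goes here>", sampling_params)
print(outputs[0].outputs[0].text)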

jpeig commented 7 months ago

Debugged it some more. This may happen if an attribute is listed in "required" but is not defined in the schema's "properties".

What I would expect instead: an invalid JSON error.
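To make the situation concrete, here is a minimal sketch with a hypothetical schema (not one from my actual runs) where "required" names a property that "properties" never defines:

from lmformatenforcer import JsonSchemaParser

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    # "email" is required but has no definition under "properties"
    "required": ["name", "email"],
}

# Expectation: using this schema should surface an invalid-schema / invalid-JSON
# error, rather than the generated output silently ending partway through.
parser = JsonSchemaParser(schema)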

noamgat commented 7 months ago

Thanks for the clarification! I am going to do a bug squash in a few days, and I hope to address as many issues as possible.

noamgat commented 7 months ago

Can you give an example schema + prompt where this happens? If the token limit is reached before the schema completes, lm-format-enforcer can't do anything about it. That may be the case here.
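One way to tell the two cases apart is from the completion object itself (a small sketch, assuming the choice dict shape from the first post, with "text" and "finish_reason" fields):

import json

def diagnose(choice: dict) -> str:
    """Classify a single completion choice: token-limit truncation vs. premature stop."""
    if choice.get("finish_reason") == "length":
        # Generation ran out of max_tokens before the schema could complete;
        # lm-format-enforcer cannot do anything about this case.
        return "token limit reached"
    try:
        json.loads(choice["text"])
        return "valid JSON"
    except json.JSONDecodeError:
        # finish_reason was "stop" but the JSON is incomplete -- this is the
        # premature-ending behaviour reported in this issue.
        return "premature stop with invalid JSON"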

jpeig commented 6 months ago

This still seems to occur very frequently, @noamgat, even when the required attributes are all in place. I am using the latest branches of vLLM and LM Format Enforcer, with no custom code.

Ok so here is the full report:

Prompt:

    ### System:
    You are a event writer for grand strategy games.
    The theme of the game is the (Dutch) Golden Age of the Baroque era.
    You are tasked with generating a player event for a mission in JSON format that is consistent with the current gamestate.

    ### Character (needed for hooks and ideas):

    Your name: Jorrit (this is you - don't refer to yourself from the third perspective!)
    Your age: 30
    Your occupation: Entrepreneur
    Your current location: At a resting place.
    Your worldview: Mercantilism: Your aspirations mirror the economic pursuits of this age, emphasizing commerce and enterprise. Commerce and maritime dominance drive you, but don't let the sparkle of gilders lead you into the grasp of greed.
    Your social class: Foreign Merchants & Diplomats: Navigating a complex web of trade and politics, you bring diverse perspectives and have a keen understanding of global dynamics.
    Your personality: The Logician (INTP): Innovative and abstract, you excel in solving complex problems. Your analytical skills are top-notch, though you may be perceived as absentminded.
    Your traits: Entrepreneurial, Tactful, Innovative, Greedy, Aloof, Absentminded
    Your communication style: Persuasive, Measured, Analytical, Direct, Cultured, Abstract
    Your notoriety: You're practically invisible to the world; no one knows your name.
Your standing: Despite being relatively unknown outside of your inner circle, your reputation precedes you amongst those who appreciate innovation and calculated risk-taking. Some view you with suspicion, while others see potential in your unique perspective and willingness to challenge established norms.

Your significant people:
Jacob van der Meer: As a fellow entrepreneur like you, Jacob has faced similar challenges and successes in his endeavors. He is known for his innovative ideas and willingness to take risks, making him a valuable ally or fierce competitor depending on your relationship. Their disposition towards me is: 'neutral'
Maria de Wijk: A wealthy widow who shares your love for art and culture. She frequently hosts lavish parties where you can showcase your latest acquisitions and network with other influential individuals. However, her affinity for extravagance could also lead to dangerous debt traps. Their disposition towards me is: 'friendly'
Jan van den Boek: An ambitious young artist seeking recognition and patronage. His talent and passion inspire you, but he is also prone to reckless behavior and poor decision-making. Whether you choose to mentor him or exploit his talents remains to be seen. Their disposition towards me is: 'respectful'
Adriaen van der Stok: A powerful merchant rival who controls much of the trade in spices and exotic goods. He is cunning and ruthless, but also highly respected for his business acumen. Keep a close eye on him, lest he steal away your hard-earned profits. Their disposition towards me is: 'hostile'
Sophie van Delft: A renowned historian and scholar who studies the rise and fall of empires throughout history. Her insights into politics, economics, and society provide valuable lessons for navigating the ever-changing landscape of power dynamics. She is also rumored to possess knowledge about ancient artifacts hidden across Europe. Their disposition towards me is: 'curious'
Lena van Leeuwen: A skilled physician and scientist who works tirelessly to advance medical knowledge during a time when superstition still holds sway. Despite being ostracized by some for her unconventional views, she continues to push boundaries and challenge traditional beliefs. Their disposition towards me is: 'inquisitive'

    Your significant places:
De Vijverhof: This is your modest yet charming residence located near the picturesque lake of De Vijver. It serves as both your sanctuary and base of operations.
Het Spiekerhuisje: A small cottage nestled amidst the rolling hills of your estate. This humble abode provides solace after long days spent trading and exploring new lands.
De Gouda Waag: As a prominent merchant, this iconic building represents the heart of your business dealings. Its towering clock tower stands tall over the bustling marketplace below, reminding you of the ticking clock of opportunity.
De Keizersgracht: One of Amsterdam's most famous waterways, it connects you to the city's vibrant center and its many opportunities. As you sail along these historic waters, imagine the countless ships that have traversed here before you.
Your significant objects:
Your journal: As a logician, you meticulously document your thoughts, observations, and plans in this journal. It serves as a repository of your ideas and a tool for organizing your thoughts.
Your compass: A trusted companion on your travels, this compass helps guide you through unfamiliar territories and ensures you never get lost. With it, you can navigate even the most treacherous landscapes.
Your map collection: Your extensive collection of maps serves as a visual representation of the vast world beyond your immediate surroundings. They offer insight into unexplored regions and potential routes for future adventures.
Your financial health: Although you've accumulated some wealth, your lifestyle is tempered by the monthly reminders of significant debt.
Your lifestyle: Now settled in Amsterdam, you spend your days navigating the intricate web of politics, commerce, and diplomacy that defined the era. Your home, De Vijverhof, serves as both your refuge and hub of activity, filled with books, maps, and curiosities collected from your travels.

    ### Mission:
    The current active mission (important):
The Scholar's Secret:  Sophie van Delft approaches you after one of her lectures at Het Spiekerhuisje, expressing interest in collaborating on a project involving ancient artifacts hidden throughout Europe. She believes these relics hold clues about past civilizations and their rise and fall - knowledge that could prove valuable during this era of exploration and expansion. However, working closely together may expose secrets better left untold...
Your choices:
- Accept Sophie's proposal: Delve into history alongside Sophie as you search for these mysterious artifacts across Europe.
- Decline politely but remain open for future collaborations: Respectfully turn down Sophie's offer while leaving room for future opportunities should they arise.
- Present yourself as an expert historian without committing fully: Pretend familiarity with ancient relics while keeping distance from any potentially dangerous discoveries.

    ### Journal:
You have no event entries related to your mission yet.

    ### Instruction:
    - Always write in English in the SECOND perspective - as if written for the player. E.g. "You have.." or "You are.." or "You need to.." or "Your x.." or "You are facing..". So do not say "Jorrit's father", instead say "your father".

    - Write the event_body directly from on the proceeds and analysis from the last event in the "### Journal".
    - Important: do not repeat the last event. Instead, build on it.
    - The event should align with the current mission.
    - Write the event_body briefly and succinctly (max 100 words), and keep it natural and easy to read.
    - Write the event based on the input provided by the "### Journal".
    - Take inspiration in writing the event / event_body from the "What may happen next" under "### Journal"

    # Other rules
    - Decide on the event location. Ensure it is not too far away from the current location.
    - Decide on the time of day it should trigger. This can be either morning, afternoon, evening or night.
    - Align the event for the set time of day and location, with the aim of further progressing the current mission.
    - Compose the event so that it requires the player to carry out a simple task, involving minimal thought or effort. For example, making a familiar meal.
    - Write the player_options in the order of the difficulty level, from easiest to hardest. The player must pick only 1 option.
    - For  "challenges" list the skill checks the player should perform, or a payment to perform the action. Do not mention this anywhere else. Use the following info to determine whether or not the player needs to perform a skill check:
    'Insight' revolves around the realm of deep intellectual exploration and esoteric wisdom. Rooted in the synthesis of intuitive perception with structured thought, it captures the essence of understanding phenomena that often transcend conventional boundaries.        
    'Force' is the confluence of physical might with the ideals of principled leadership. It emphasizes the exertion of authority driven by both inner strength and a commitment to honorable action.
    'Diplomacy', at its core, is the art of navigating and harmonizing interpersonal relationships. It accentuates the importance of building bridges, fostering communal bonds, and skillfully managing social dynamics.
    - Do not use the above gameplay jargon in the event_body. Keep it natural and focus on storytelling.
    - For the event_body / narrative effects of each option: write in the present tense and second person (e.g. "You managed to..."). Always directly branch off the options from the event_body.
    - For "gameplay" list which player's stats would change. Do not mention this anywhere else.
    - For each option/decision write a short line of internal dialogue for the player, that is consistent with the player communication style, character and the consequences of the option. Write in the present tense and first person.

    ### Response:

Schema:

{
    "type": "object",
    "properties": {
        "prop1_title": {"type": "string"},
        "prop2_location": {"type": "string"},
        "prop3_trigger_time_of_day": {
            "type": "string",
            "enum": ["morning", "afternoon", "evening", "night"]
        },
        "prop4_trigger_date": {
            "type": "string",
            "enum": ["now", "today", "tomorrow", "this_week"]
        },
        "prop5_event_body": {
            "type": "string",
            "description": "keep the player event succinct."
        },

        "option-1": {
            "type": "object",
            "properties": {
                "prop1_player_option": {"type": "string", "description": "write in the second perspective ('you') and write in the active form"},
                "prop2_challenges": {
                    "description": "List the checks the player is required to pass in order to determine success or failure for this action. Pick from 'payment', 'insight', 'force' and 'diplomacy'",
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["payment", "insight", "force", "diplomacy"]
                    },
                    "minItems": 1,
                    "maxItems": 3,
                    "uniqueItems": true
                },
                "prop3_internal_dialogue": {
                    "type": "string",
                    "description": "write a line of internal dialogue where the player ponders the consequences of the action in accordance with the communication style and character of the player"
                }
            },
            "required": ["prop1_player_option", "prop2_challenges", "prop3_internal_dialogue"]
        }
        ,

        "option-2": {
            "type": "object",
            "properties": {
                "prop1_player_option": {"type": "string", "description": "write in the second perspective ('you') and write in the active form"},
                "prop2_challenges": {
                    "description": "List the checks the player is required to pass in order to determine success or failure for this action. Pick from 'payment', 'insight', 'force' and 'diplomacy'",
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["payment", "insight", "force", "diplomacy"]
                    },
                    "minItems": 1,
                    "maxItems": 3,
                    "uniqueItems": true
                },
                "prop3_internal_dialogue": {
                    "type": "string",
                    "description": "write a line of internal dialogue where the player ponders the consequences of the action in accordance with the communication style and character of the player"
                }
            },
            "required": ["prop1_player_option", "prop2_challenges", "prop3_internal_dialogue"]
        }
        ,

        "option-3": {
            "type": "object",
            "properties": {
                "prop1_player_option": {"type": "string", "description": "write in the second perspective ('you') and write in the active form"},
                "prop2_challenges": {
                    "description": "List the checks the player is required to pass in order to determine success or failure for this action. Pick from 'payment', 'insight', 'force' and 'diplomacy'",
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["payment", "insight", "force", "diplomacy"]
                    },
                    "minItems": 1,
                    "maxItems": 3,
                    "uniqueItems": true
                },
                "prop3_internal_dialogue": {
                    "type": "string",
                    "description": "write a line of internal dialogue where the player ponders the consequences of the action in accordance with the communication style and character of the player"
                }
            },
            "required": ["prop1_player_option", "prop2_challenges", "prop3_internal_dialogue"]
        }

    },
    "required": [
        "prop1_title",
        "prop2_location",
        "prop3_trigger_time_of_day",
        "prop4_trigger_date",
        "prop5_event_body",

        "option-1",

        "option-2",

        "option-3"

    ]
}

Response:

'\n    {\n        "prop1_title": "The Scholar\'s Secret",\n        "prop2_location": "De Vijverhof",\n        "prop3_trigger_time_of_day": "afternoon",\n        "prop4_trigger_date": "now",\n        "prop5_event_body": "As you pore over your journal, you come across a note about an upcoming meeting with Sophie van Delft. You\'re eager to discuss the ancient artifacts and their potential secrets. You can\'t help but wonder what dangers might lie ahead in this quest for knowledge. Will you embrace the challenge and uncover the mysteries of the past?",\n        "option-1": {\n           "prop1_player_option": "Accept Sophie\'s proposal",\n           "prop2_challenges": [\n\n\n\n\n\n\n\n\n\n\n\n"diplomacy"\n],\n           "prop3_internal_dialogue": "I\'m ready to dive into history with Sophie and uncover the secrets of these artifacts. The risks are worth the potential rewards for my understanding of the world and its past civilizations. Let\'s do this! \t\t\t\t\t\t\t\t\t\t\t\t'

jpeig commented 6 months ago

Max_tokens is set to 10k and the context window is 16k, so these should not be the issue.

Furthermore, the parser did not throw a pydantic error. I am using AIDC-ai-business_Marcoroni-7B-v3 as the model (no AWQ or quantization), running on an RTX 3090.

noamgat commented 6 months ago

Does this still happen after the latest \' fix? I see that the response included that string several times, which may have caused the parser to enter an erroneous state.

motaatmo commented 5 months ago

This happens for me as well. I'm using llama.cpp and Capybara-34B. When asked for this schema:

{'properties': {'name': {'title': 'Name', 'type': 'string'}},
              'required': ['name'], 'title': 'person', 'type': 'object'}

with this question:

What is the name of Michael Jackson?

my last results were:

{"name":"Michael

{
  "
{
 "name

In other words, the output never returned a complete JSON object.

remixer-dec commented 5 months ago

I'm also facing this issue

noamgat commented 4 months ago

Output ending prematurely usually means that there was an exception; in that case, LMFE returns a ForceStopParser, which is what causes the premature ending. When this happens, there should be a log with more details. Can you check the logs to see if anything suspicious is output? If you can attach a notebook / Python file with a reproducing case, that would be the best way for me to fix it.

motaatmo commented 4 months ago

This is my test code:

from llama_cpp import Llama, LogitsProcessorList  # type: ignore
import lmformatenforcer as lme # type: ignore
import lmformatenforcer.integrations.llamacpp as lmellama  # type: ignore
import json
import os
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
for name in logging.root.manager.loggerDict:
    logging.getLogger(name).setLevel(logging.DEBUG)
formatter = logging.Formatter("%(name)s - %(levelname)s - %(message)s")
for handler in logging.getLogger().handlers:
    handler.setFormatter(formatter)

def test_lmenforcer_call():
    """
    Just for testing

    """
    llm = Llama(
        os.path.join(
            os.environ["TRANSFORMERS_CACHE"], 
            "Nous-Capybara-34B-GGUF/nous-capybara-34b.Q4_0.gguf"
        ),
        **{"n_gpu_layers": 15000, "n_ctx": 10 * 1024})
    tokenizer_data = lmellama.build_token_enforcer_tokenizer_data(llm)
    schema = {'properties': {'name': {'title': 'Name', 'type': 'string'}},
              'required': ['name'], 'title': 'person', 'type': 'object'}
    character_level_parser = lme.JsonSchemaParser(schema)
    logits_processors = LogitsProcessorList(
        [lmellama.build_llamacpp_logits_processor(
            tokenizer_data, character_level_parser)])
    prompt = ("What is the name of Michael Jackson? Answer in JSON "
            + "according to this schema: " + json.dumps(schema) + "\n")
    print("\n\nPrompt: " + prompt)
    output_1 = llm(prompt + "{", None) 
    print(output_1["choices"][0]["text"])
    output_2 = llm(prompt, 
            logits_processor=logits_processors)
    print(output_2["choices"][0]["text"])
    print("Raw output:")
    print(output_2)
    print("Test call finished")

test_lmenforcer_call()

However, it did not produce much logging information from Python. The full output is:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
llama_model_loader: loaded meta data with 22 key-value pairs and 543 tensors from /data/cache/huggingface/models/Nous-Capybara-34B-GGUF/nous-capybara-34b.Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = nousresearch_nous-capybara-34b
llama_model_loader: - kv   2:                       llama.context_length u32              = 200000
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 7168
llama_model_loader: - kv   4:                          llama.block_count u32              = 60
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 20480
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 56
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 5000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,64000]   = ["<unk>", "<|startoftext|>", "<|endof...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,64000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,64000]   = [2, 3, 3, 3, 3, 3, 1, 1, 1, 3, 3, 3, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q4_0:  421 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: mismatch in special tokens definition ( 498/64000 vs 267/64000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 64000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 200000
llm_load_print_meta: n_embd           = 7168
llm_load_print_meta: n_head           = 56
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 60
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 20480
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 200000
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 30B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 34.39 B
llm_load_print_meta: model size       = 18.13 GiB (4.53 BPW) 
llm_load_print_meta: general.name     = nousresearch_nous-capybara-34b
llm_load_print_meta: BOS token        = 1 '<|startoftext|>'
llm_load_print_meta: EOS token        = 2 '<|endoftext|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 315 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.41 MiB
llm_load_tensors: offloading 60 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 61/61 layers to GPU
llm_load_tensors:        CPU buffer size =   246.09 MiB
llm_load_tensors:      CUDA0 buffer size = 18317.20 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 10240
llama_new_context_with_model: freq_base  = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  2400.00 MiB
llama_new_context_with_model: KV self size  = 2400.00 MiB, K (f16): 1200.00 MiB, V (f16): 1200.00 MiB
llama_new_context_with_model:  CUDA_Host input buffer size   =    34.04 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =  1300.20 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    15.40 MiB
llama_new_context_with_model: graph splits (measure): 3
AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
Model metadata: {'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '5000000.000000', 'llama.context_length': '200000', 'general.name': 'nousresearch_nous-capybara-34b', 'tokenizer.ggml.add_bos_token': 'false', 'llama.embedding_length': '7168', 'llama.feed_forward_length': '20480', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '56', 'llama.block_count': '60', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '2'}

llama_print_timings:        load time =     231.02 ms
llama_print_timings:      sample time =       8.65 ms /    14 runs   (    0.62 ms per token,  1618.31 tokens per second)
llama_print_timings: prompt eval time =     230.78 ms /    65 tokens (    3.55 ms per token,   281.66 tokens per second)
llama_print_timings:        eval time =     329.81 ms /    13 runs   (   25.37 ms per token,    39.42 tokens per second)
llama_print_timings:       total time =    1644.01 ms /    78 tokens
Llama.generate: prefix-match hit
root - DEBUG - Received an invalid character '"', switching to ForceStopParser

llama_print_timings:        load time =     231.02 ms
llama_print_timings:      sample time =       4.31 ms /     7 runs   (    0.62 ms per token,  1623.00 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =     177.81 ms /     7 runs   (   25.40 ms per token,    39.37 tokens per second)
llama_print_timings:       total time =     213.71 ms /     8 tokens

Prompt: What is the name of Michael Jackson? Answer in JSON according to this schema: {"properties": {"name": {"title": "Name", "type": "string"}}, "required": ["name"], "title": "person", "type": "object"}

  "name": "Michael Jackson"
}</s>
{
  ""name
Raw output:
{'id': 'cmpl-81862638-5769-4d73-914e-4bf71ae333b5', 'object': 'text_completion', 'created': 1707750170, 'model': '/data/cache/huggingface/models/Nous-Capybara-34B-GGUF/nous-capybara-34b.Q4_0.gguf', 'choices': [{'text': '{\n  ""name', 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 64, 'completion_tokens': 6, 'total_tokens': 70}}
Test call finished

isamu-isozaki commented 4 months ago

@motaatmo I'm not sure if this fixes it, but can you try llamacpp 0.2.37? That fixed my issue.
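(For reference, assuming the Python bindings come from the llama-cpp-python package on PyPI, pinning that version looks like this:)

pip install llama-cpp-python==0.2.37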

motaatmo commented 4 months ago

Yes - works for me as well, at least for the test case, thank you!

noamgat commented 4 months ago

It seems that in a later version, the default max_tokens was reduced, so I increased it in the sample notebook. llama.cpp now functions correctly, including in the sample notebook.
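For anyone hitting the same truncation outside the notebook, the equivalent adjustment is to pass max_tokens explicitly in the llama-cpp-python call (a sketch based on the reproduction script above; 512 is an arbitrary budget):

# Same `llm`, `prompt` and `logits_processors` objects as in the reproduction script above.
output = llm(
    prompt,
    max_tokens=512,                      # explicit budget so the JSON object can finish
    logits_processor=logits_processors,  # the LMFE LogitsProcessorList
)
print(output["choices"][0]["text"])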