I tried your first example on local-ai:v2.15.0-hipblas-ffmpeg
and initially ran into an issue...
{
"created": 1715667028,
"object": "chat.completion",
"id": "52e16621-7a92-496b-b9f3-46e832775765",
"model": "llama-3-8b-instruct-coder",
"choices": [
{
"index": 0,
"finish_reason": "tool_calls",
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"index": 0,
"id": "52e16621-7a92-496b-b9f3-46e832775765",
"type": "function",
"function": {
"name": "archival_memory_insert",
"arguments": "{\"content\":\"Hello, world!\",\"request_heartbeat\":true}"
}
}
]
}
}
],
"usage": {
"prompt_tokens": 1236,
"completion_tokens": 28,
"total_tokens": 1264
}
}
I had to add this to the yaml file...
function:
  # set to true to allow the model to call multiple functions in parallel
  parallel_calls: true
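For context, the `function` block sits at the top level of the model YAML. A minimal sketch of where it goes (the `name`, `context_size`, and gguf filename here are just the values visible in the logs below; the rest of my config is omitted):

```yaml
# Minimal sketch of the model config; the function block is the actual fix,
# the other fields are illustrative values taken from the logs below.
name: llama-3-8b-instruct-coder
context_size: 8192
parameters:
  model: Llama-3-8B-Instruct-Coder-Q4_K_M.gguf
function:
  # set to true to allow the model to call multiple functions in parallel
  parallel_calls: true
```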
that helped a little, but then I ran into another issue...
{
"created": 1715679270,
"object": "chat.completion",
"id": "a262c700-2333-4bf9-9cb8-5d35964b385e",
"model": "llama-3-8b-instruct-coder",
"choices": [
{
"index": 0,
"finish_reason": "",
"message": {
"role": "assistant",
"content": "```\n{\n \"name\": \"archival_memory_insert\",\n \"arguments\": {\n \"content\": \"Hello, world!\",\n \"request_heartbeat\": true\n }\n}\n```\nResponse:\n```\n{\"result\": \"Archival memory inserted successfully\"}\n```\nPlease note that the actual response format and content may vary depending on the function being called."
}
}
],
"usage": {
"prompt_tokens": 1236,
"completion_tokens": 28,
"total_tokens": 1264
}
}
I didn't really know what this meant either, so I checked the logs; seems it's throwing an error now...
localai-api-1 | 9:35AM INF [llama-cpp] Loads OK
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679307,"level":"INFO","function":"launch_slot_with_data","line":887,"message":"slot is processing task","slot_id":0,"task_id":0}
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679307,"level":"INFO","function":"update_slots","line":1787,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679317,"level":"INFO","function":"print_timings","line":334,"message":"prompt eval time = 7660.65 ms / 1236 tokens ( 6.20 ms per token, 161.34 tokens per second)","slot_id":0,"task_id":0,"t_prompt_processing":7660.655,"num_prompt_tokens_processed":1236,"t_token":6.197940938511326,"n_tokens_second":161.34390597148678}
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679317,"level":"INFO","function":"print_timings","line":348,"message":"generation eval time = 2694.03 ms / 28 runs ( 96.22 ms per token, 10.39 tokens per second)","slot_id":0,"task_id":0,"t_token_generation":2694.032,"n_decoded":28,"t_token":96.21542857142857,"n_tokens_second":10.39334350891155}
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679317,"level":"INFO","function":"print_timings","line":357,"message":" total time = 10354.69 ms","slot_id":0,"task_id":0,"t_prompt_processing":7660.655,"t_token_generation":2694.032,"t_total":10354.687}
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679317,"level":"INFO","function":"update_slots","line":1602,"message":"slot released","slot_id":0,"task_id":0,"n_ctx":8192,"n_past":1263,"n_system_tokens":0,"n_cache_tokens":1264,"truncated":false}
localai-api-1 | 9:35AM DBG GRPC(Llama-3-8B-Instruct-Coder-Q4_K_M.gguf-127.0.0.1:39859): stdout {"timestamp":1715679317,"level":"INFO","function":"update_slots","line":1547,"message":"all slots are idle and system prompt is empty, clear the KV cache"}
localai-api-1 | 9:35AM ERR multiple results: unable to unmarshal llm result error="json: cannot unmarshal object into Go value of type []map[string]interface {}" escapedLLMResult="{\"arguments\": {\"content\": \"Hello, world!\", \"request_heartbeat\": true}, \"function\": \"archival_memory_insert\"}"
localai-api-1 | 9:35AM DBG Function return: {"arguments": {"content": "Hello, world!", "request_heartbeat": true}, "function": "archival_memory_insert"} []
localai-api-1 | 9:35AM DBG nothing to do, computing a reply
localai-api-1 | 9:35AM DBG handleQuestion: function result did not contain a valid JSON object
localai-api-1 | 9:35AM DBG No action received from LLM, without a message, computing a reply
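The `unable to unmarshal llm result` error is a plain Go type mismatch: the model emitted a single JSON object, but the parser tried to decode it into a slice. Here's a standalone sketch of my own (not LocalAI code) that reproduces it:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The exact payload from the log line above.
	llmResult := `{"arguments": {"content": "Hello, world!", "request_heartbeat": true}, "function": "archival_memory_insert"}`

	// Decoding a single object into a slice fails with the same message as the log:
	// "json: cannot unmarshal object into Go value of type []map[string]interface {}"
	var many []map[string]interface{}
	if err := json.Unmarshal([]byte(llmResult), &many); err != nil {
		fmt.Println("array decode failed:", err)
	}

	// Decoding into a single map succeeds, which is why a
	// try-array-then-object fallback (suggested later in this thread) helps.
	var one map[string]interface{}
	if err := json.Unmarshal([]byte(llmResult), &one); err == nil {
		fmt.Println("object decode succeeded, function =", one["function"])
	}
}
```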
It would seem that function calls are still not working correctly. edit: AFAIK this model supports functions, but I can also try something like Meta-Llama-3-8B-Instruct-function-calling
to see if it still has issues...
Cheers
Thanks for testing @bunder2015. @mudler merged some changes that enable better grammar management (#2328), and I've been testing them. However, I'm running into some issues, so I'm documenting them here.
Model: huggingface://TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF/neuralhermes-2.5-mistral-7b.Q8_0.gguf Model File: https://github.com/lenaxia/home-ops-prod/blob/0df710aa61fca15ebfedb5d8448efb2a83a68e29/cluster/apps/home/localai/app/models/neuralhermes-2.5-mistral-7b.yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar uncomment the lines below
  # Warning: this is relying only on the capability of the
  # LLM model to generate the correct function call.
  #no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
    "Processing user message.": ""
    "“": "\""
    ": True": ": \"True\""
    ": False": ": \"False\""
The first issue is around unmarshalling the returned JSON object; the parsing seems to be a bit fragile:
5:54AM DBG LLM result: Processing user message.
[
{
'name': 'send_message',
'arguments': {
'message': 'Hello. As instructed, I am refraining from utilizing contractions in this response. How may I serve you today, Chad?'
}
}
]
5:54AM DBG Replacing <tool_call> with
5:54AM DBG Replacing ' with "
5:54AM DBG Replacing Processing user message. with
5:54AM DBG LLM result(processed):
[
{
"name": "send_message",
"arguments": {
"message": "Hello. As instructed, I am refraining from utilizing contractions in this response. How may I serve you today, Chad?"
}
}
]
5:54AM WRN unable to unmarshal llm result error="json: cannot unmarshal array into Go value of type map[string]interface {}" escapedLLMResult="\n[\n {\n \"name\": \"send_message\",\n \"arguments\": {\n \"message\": \"Hello. As instructed, I am refraining from utilizing contractions in this response. How may I serve you today, Chad?\"\n }\n }\n]"
5:54AM DBG Function return:
[
{
"name": "send_message",
"arguments": {
"message": "Hello. As instructed, I am refraining from utilizing contractions in this response. How may I serve you today, Chad?"
}
}
] map[]
5:54AM DBG nothing function results but we had a message from the LLM
5:54AM DBG Response: {"created":1715925016,"object":"chat.completion","id":"6473e1fd-081b-4815-a612-e124ee515ebd","model":"neuralhermes-2.5-7b","choices":[{"index":0,"finish_reason":"","message":{"role":"assistant","content":"Processing user message.\n[\n {\n 'name': 'send_message',\n 'arguments': {\n 'message': 'Hello. As instructed, I am refraining from utilizing contractions in this response. How may I serve you today, Chad?'\n }\n }\n]"}}],"usage":{"prompt_tokens":3127,"completion_tokens":73,"total_tokens":3200}}
The second issue is that in your commit you suggest using the replacement `"\'": "\""`. However, this poses a problem when strings are returned with contractions in them, as in the example below:
6:17AM DBG Replacing ' with "
6:17AM DBG Replacing Processing user message. with
6:17AM DBG Replacing " with "
6:17AM DBG Replacing : True with : "True"
6:17AM DBG Replacing : False with : "False"
6:17AM DBG Replacing <tool_call> with
6:17AM DBG LLM result(processed):
[
{
"index": 1,
"id": "1b96676c-16fd-4766-a4ef-52420",
"type": "function",
"function": {
"name": "conversation_search",
"arguments": "{\n \"query\": \"Hi\",\n \"request_heartbeat\": true\n}"
}
},
{
"index": 2,
"id": "c1ad50d6-33a2-472f-ace5-d4e48",
"type": "function",
"function": {
"name": "send_message",
"arguments": "{\n \"message\": \"Hello Chad, it"s nice to "...
file
6:17AM WRN unable to unmarshal llm result error="invalid character 's' after object key:value pair" escapedLLMResult="\n[\n {\n \"index\": 1,\n \"id\": \"1b96676c-16fd-4766-a4ef-52420\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"conversation_search\",\n \"arguments\": \"{\\n \\\"query\\\": \\\"Hi\\\",\\n \\\"request_heartbeat\\\": true\\n}\"\n }\n },\n {\n \"index\": 2,\n \"id\": \"c1ad50d6-33a2-472f-ace5-d4e48\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"send_message\",\n \"arguments\": \"{\\n \\\"message\\\": \\\"Hello Chad, it\"s nice to \"...\nfile"
6:17AM DBG Function return:
[
{
"index": 1,
"id": "1b96676c-16fd-4766-a4ef-52420",
"type": "function",
"function": {
"name": "conversation_search",
"arguments": "{\n \"query\": \"Hi\",\n \"request_heartbeat\": true\n}"
}
},
{
"index": 2,
"id": "c1ad50d6-33a2-472f-ace5-d4e48",
"type": "function",
"function": {
"name": "send_message",
"arguments": "{\n \"message\": \"Hello Chad, it"s nice to "...
file map[]
6:17AM DBG nothing function results but we had a message from the LLM
6:17AM DBG Response: {"created":1715926617,"object":"chat.completion","id":"8b66f810-66e1-4fec-8734-993472c95f88","model":"neuralhermes-2.5-7b","choices":[{"index":0,"finish_reason":"","message":{"role":"assistant","content":"Processing user message.\n[\n {\n 'index': 1,\n 'id': '1b96676c-16fd-4766-a4ef-52420',\n 'type': 'function',\n 'function': {\n 'name': 'conversation_search',\n 'arguments': '{\\n \\\"query\\\": \\\"Hi\\\",\\n \\\"request_heartbeat\\\": true\\n}'\n }\n },\n {\n 'index': 2,\n 'id': 'c1ad50d6-33a2-472f-ace5-d4e48',\n 'type': 'function',\n 'function': {\n 'name': 'send_message',\n 'arguments': '{\\n \\\"message\\\": \\\"Hello Chad, it's nice to '...\nfile"}}],"usage":{"prompt_tokens":3116,"completion_tokens":205,"total_tokens":3321}}
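To make the contraction failure concrete, here's a standalone repro of my own (not project code): the blanket `'` → `"` swap turns an apostrophe inside a value into an unbalanced quote, and the result is no longer valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	// Single-quoted output like the model produces above.
	out := `{'message': 'Hello Chad, it's nice to meet you'}`

	fixed := strings.ReplaceAll(out, "'", `"`)
	fmt.Println(fixed)                     // {"message": "Hello Chad, it"s nice to meet you"}
	fmt.Println(json.Valid([]byte(fixed))) // false: the apostrophe became a stray quote
}
```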
Claude Sonnet suggests this change to the grammar to support this:
Here's the updated JSON grammar that ensures double-quoted key-value pairs and proper escaping of special characters within value strings:
root ::= object
object ::= "{" ws ( dquoted_string ":" ws value ("," ws dquoted_string ":" ws value)* )? "}" ws
value ::= dquoted_string | object | array | number | ("true" | "false" | "null") ws
array ::= "[" ws ( value ("," ws value)* )? "]" ws
dquoted_string ::= "\"" (
  [^"\\] |
  "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
)* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
ws ::= ([ \t\n] ws)?
The main changes are:
- The `object` rule now expects `dquoted_string` for keys, followed by a colon `:` and a `value`.
- The `value` rule includes `dquoted_string` as a possible value type, ensuring that all string values are double-quoted.
- A new `dquoted_string` rule has been added, which defines a double-quoted string with properly escaped characters inside.

The `dquoted_string` rule is defined as follows:

dquoted_string ::= "\"" (
  [^"\\] |
  "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
)* "\"" ws
This rule ensures that:
- The string starts and ends with a double quote `"`.
- Within the string, any character other than double quote `"` and backslash `\` is allowed.
- If a backslash `\` is encountered, it can be followed by one of the escape sequences:
  - `\/` for forward slash
  - `\"` for double quote
  - `\'` for single quote
  - `\b` for backspace
  - `\f` for form feed
  - `\n` for newline
  - `\r` for carriage return
  - `\t` for tab
  - `\u` followed by four hexadecimal digits for a Unicode escape sequence

With these grammar rules, the JSON output will have all keys and values double-quoted, and any double quotes or other special characters within a value string will be properly escaped using the backslash `\` escape sequence.

For example, a valid JSON output according to this grammar would be:

{
  "key1": "value with \"double quotes\"",
  "key2": "value with 'single quotes'",
  "key3": "value with \\ backslash",
  "key4": "value with \u0026 unicode escape"
}
I believe the above grammar will also solve the last error I'm seeing:
6:15AM DBG LLM result(processed):
[
{
"index": 1,
"id": "e51d7275-ce6b-4e7c-b556-d447d",
"type": "function",
"function": {
"name": "conversation_search",
"arguments": {
"page": 0,
"query": "Hi",
"request_heartbeat": True
}
}
},
{
"index": 2,
"id": "3428a0a9-edbb-4eb0-8cb5-612c7d",
"type": "function",
"function": {
"name": "send_message",
"arguments": {
"message": "I’m glad we caught up 😊. My name is MemGPT, and I will try to make this conversation feel as close to talking to a human friend as possible. The initial couple of responses from me might be somewhat mechanical, but give me some time and I’ll get there. Did you have something specific you wanted to discuss today?"
}
}
}
]
6:15AM WRN unable to unmarshal llm result error="invalid character 'T' looking for beginning of value" escapedLLMResult="\n[\n {\n \"index\": 1,\n \"id\": \"e51d7275-ce6b-4e7c-b556-d447d\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"conversation_search\",\n \"arguments\": {\n \"page\": 0,\n \"query\": \"Hi\",\n \"request_heartbeat\": True\n }\n }\n },\n {\n \"index\": 2,\n \"id\": \"3428a0a9-edbb-4eb0-8cb5-612c7d\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"send_message\",\n \"arguments\": {\n \"message\": \"I’m glad we caught up 😊. My name is MemGPT, and I will try to make this conversation feel as close to talking to a human friend as possible. The initial couple of responses from me might be somewhat mechanical, but give me some time and I’ll get there. Did you have something specific you wanted to discuss today?\"\n }\n }\n }\n]"
I'm working around this issue right now with these replace strings:
": True": ": \"True\""
": False": ": \"False\""
Here's a suggestion from Claude Sonnet around catching the unmarshalling error and trying to unmarshal the result into an array instead. This doesn't solve the problem of being tolerant towards mildly malformed JSON; I think that would require another library like github.com/francoispqvi/gojay, which I'm not sure you're interested in doing.
=====================
To make the code more robust and handle cases where the LLM result can be a single object or an array of objects, we can modify the `ParseFunctionCall` function in `parse.go`. Here's the updated version:
func ParseFunctionCall(llmresult string, functionConfig FunctionsConfig) []FuncCallResults {
log.Debug().Msgf("LLM result: %s", llmresult)
for k, v := range functionConfig.ReplaceResults {
log.Debug().Msgf("Replacing %s with %s", k, v)
llmresult = strings.ReplaceAll(llmresult, k, v)
}
log.Debug().Msgf("LLM result(processed): %s", llmresult)
// NOTE: functionConfig.ParallelCalls is no longer read here; the grammar
// path below handles both a single object and an array of objects.
useGrammars := !functionConfig.NoGrammar
functionNameKey := "function"
if functionConfig.FunctionName {
functionNameKey = "name"
}
results := []FuncCallResults{}
returnResult := func(s string) (name, arguments string, e error) {
// As we have to change the result before processing, we can't stream the answer token-by-token (yet?)
var ss map[string]interface{}
// This prevents newlines from breaking JSON parsing for clients
s = utils.EscapeNewLines(s)
err := json.Unmarshal([]byte(s), &ss)
if err != nil {
log.Warn().Err(err).Str("escapedLLMResult", s).Msg("unable to unmarshal llm result")
return "", "", err
}
log.Debug().Msgf("Function return: %s %+v", s, ss)
// The grammar defines the function name as "function", while OpenAI returns "name"
func_name, ok := ss[functionNameKey]
if !ok {
return "", "", fmt.Errorf("unable to find function name in result")
}
// Similarly, while here arguments is a map[string]interface{}, OpenAI actually wants a stringified object
args, ok := ss["arguments"] // arguments needs to be a string, but we return an object from the grammar result (TODO: fix)
if !ok {
return "", "", fmt.Errorf("unable to find arguments in result")
}
d, _ := json.Marshal(args)
funcName, ok := func_name.(string)
if !ok {
return "", "", fmt.Errorf("unable to cast function name to string")
}
return funcName, string(d), nil
}
// if no grammar is used, we have to extract function and arguments from the result
if !useGrammars {
// the response is a string that we have to parse
result := make(map[string]string)
if functionConfig.ResponseRegex != "" {
// We use named regexes here to extract the function name and arguments
// obviously, this expects the LLM to be stable and return correctly formatted JSON
// TODO: optimize this and pre-compile it
var respRegex = regexp.MustCompile(functionConfig.ResponseRegex)
match := respRegex.FindStringSubmatch(llmresult)
for i, name := range respRegex.SubexpNames() {
if i != 0 && name != "" && len(match) > i {
result[name] = match[i]
}
}
// TODO: open point about multiple results and/or mixed with chat messages
// This is not handled as for now, we only expect one function call per response
functionName := result[functionNameKey]
if functionName == "" {
return results
}
} else if functionConfig.JSONRegexMatch != "" {
//re := regexp.MustCompile(`(?s)<tool_call>(.*?)</tool_call>`)
//m:= re.FindStringSubmatch(`<tool_call>{ foo barr }</tool_call>`)
// We use a regex to extract the JSON object from the response
var respRegex = regexp.MustCompile(functionConfig.JSONRegexMatch)
match := respRegex.FindStringSubmatch(llmresult)
if len(match) < 2 {
return results
}
funcName, args, err := returnResult(match[1])
if err != nil {
return results
}
return append(results, FuncCallResults{Name: funcName, Arguments: args})
} else {
funcName, args, err := returnResult(llmresult)
if err != nil {
return results
}
return append(results, FuncCallResults{Name: funcName, Arguments: args})
}
return append(results, FuncCallResults{Name: result[functionNameKey], Arguments: result["arguments"]})
}
// with grammars
// Handle the case where the LLM result is a single object or an array of objects
var ss []map[string]interface{}
s := utils.EscapeNewLines(llmresult)
err := json.Unmarshal([]byte(s), &ss)
if err != nil {
// If the LLM result is a single object, try unmarshaling it into a single map
var singleObj map[string]interface{}
err = json.Unmarshal([]byte(s), &singleObj)
if err != nil {
log.Warn().Err(err).Str("escapedLLMResult", s).Msg("unable to unmarshal llm result")
return results
}
ss = []map[string]interface{}{singleObj}
}
for _, s := range ss {
func_name, ok := s[functionNameKey]
if !ok {
continue
}
args, ok := s["arguments"]
if !ok {
continue
}
d, _ := json.Marshal(args)
funcName, ok := func_name.(string)
if !ok {
continue
}
results = append(results, FuncCallResults{Name: funcName, Arguments: string(d)})
}
return results
}
The main changes are:
- In the `returnResult` function, `ss` is a single `map[string]interface{}`, so it handles one object at a time.
- When grammars are used, we first try to unmarshal the LLM result into a `[]map[string]interface{}` (`ss`). If that fails, we assume it's a single object and try to unmarshal it into a single `map[string]interface{}` (`singleObj`). If both fail, we log a warning and return an empty slice.
- We then iterate over `ss` (which could hold one or more objects) and process each object to extract the function name and arguments.

With these changes, the code should be more robust and able to handle cases where the LLM result is a single object or an array of objects.
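A hypothetical usage sketch exercising both shapes that failed earlier in this thread (field names as in the snippet above, `fmt` assumed imported; not a test from the actual repo):

```go
func exampleParse() {
	// FunctionName: true makes the parser look up the OpenAI-style "name" key.
	cfg := FunctionsConfig{FunctionName: true}

	// Single object: previously died with "cannot unmarshal object into ... []map".
	single := `{"name": "send_message", "arguments": {"message": "hi"}}`

	// Array of objects: previously died with "cannot unmarshal array into ... map".
	multi := `[{"name": "conversation_search", "arguments": {"query": "Hi"}}, {"name": "send_message", "arguments": {"message": "hello"}}]`

	for _, r := range ParseFunctionCall(single, cfg) {
		fmt.Printf("%s(%s)\n", r.Name, r.Arguments)
	}
	for _, r := range ParseFunctionCall(multi, cfg) {
		fmt.Printf("%s(%s)\n", r.Name, r.Arguments)
	}
}
```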
Taking another look at this; I haven't tried the new changes yet.
I was able to cobble together a small coder chatbot in C++ and started adding functions to it. When I tested sending the functions in the JSON request, I got a blank response to my first prompt, and the second prompt ran the function for no reason, stuffing the description of the function call into the arguments. :rofl: (edit: or maybe that's why the first one returned blank?)
{"messages":[{"content":"I am Pascal, a friendly expert in programming versed in many languages and technologies.\nWe work for Banana Technologies, we specialize in developing Linux desktop applications running on Gentoo Linux. We also specialize in AI technologies such as multimodal large language models, vision, and AI automation of computer-based tasks.\nWhen writing new code or making changes, I will print the entire file. When printing the file, I will write the file name in a comment at the top of the file. I will not omit code by using ellipses.\n(Program Requirements: None)","role":"system"},{"content":"Can you show me a hello world example in C++?\n","role":"user"}],"model":"llama-3-8b-instruct-coder","tool_choice":"auto","tools":[{"function":{"description":"Update the requirements for the project","name":"updateRequirements","parameters":{"properties":{"requirements":{"description":"The new requirements for the project","type":"string"}},"required":["requirements"],"type":"object"}},"type":"function"}]}
$ ./ai-agent
Enter a multi-line prompt (type 'SEND' to send the prompt to LocalAI, or type 'EXIT' to exit the program):
Can you tell me what our program requirements are?
SEND
Enter a multi-line prompt (type 'SEND' to send the prompt to LocalAI, or type 'EXIT' to exit the program):
Can you tell me what our program requirements are?
SEND
Our program requirements are:
`{"requirements":"The new requirements for the project"}`
Give me a few more days and I might pull git head and see if it's any better. Cheers
Fix was to add `response_format: {"type": "json_object"}` to the completion request.
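For reference, that means adding the field at the top level of the same request body shown earlier, e.g.:

```json
{
  "model": "llama-3-8b-instruct-coder",
  "response_format": {"type": "json_object"},
  "tool_choice": "auto",
  "tools": [{"type": "function", "function": {"name": "updateRequirements", "description": "Update the requirements for the project", "parameters": {"type": "object", "properties": {"requirements": {"type": "string", "description": "The new requirements for the project"}}, "required": ["requirements"]}}}],
  "messages": [{"role": "user", "content": "Can you show me a hello world example in C++?"}]
}
```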
LocalAI version: v2.13.0-cublas-cuda12-ffmpeg
Environment, CPU architecture, OS, and Version: kubernetes helm release: https://github.com/lenaxia/home-ops-prod/blob/bdb6695ba22777c8f4233caaddfc9bfd90b91372/cluster/apps/home/localai/app/helm-release.yaml
Describe the bug
Regular chatting poses no problem; however, any time a function call is defined, the chat very quickly goes into a bad state where the LLM just repeats back to me what I type in, or very rarely stops responding entirely. I have tried this with Llama3, Hermes 2 Pro, Neural Hermes, lunademo, and several other models, and the behavior is more or less consistent.
To Reproduce
env vars:
Deploy helm release: https://github.com/lenaxia/home-ops-prod/blob/bdb6695ba22777c8f4233caaddfc9bfd90b91372/cluster/apps/home/localai/app/helm-release.yaml
Load either Neural Hermes: https://github.com/lenaxia/home-ops-prod/blob/bdb6695ba22777c8f4233caaddfc9bfd90b91372/cluster/apps/home/localai/app/models/NeuralHermes-2.5-Mistral-7b.yaml
Or Llama3 Instruct: https://github.com/lenaxia/home-ops-prod/blob/2377dd84e434a6cbfea51118cea8202d5c209d13/cluster/apps/home/localai/app/models/llama3-instruct.yaml
Or lunademo from the model gallery
Run this curl command and the LLM will make a function call when it should not: https://gist.github.com/lenaxia/388082e0e98beb91f2447073d0d6cd63
Expected behavior
I expect the model to answer properly and to maintain the conversation on an ongoing basis.
Logs
Verbose debug logs with several conversations that resulted in issues: https://gist.github.com/lenaxia/4b02a9cdd72470370b37a33b998b4b42
For easier reproduction I've extracted several raw requests that caused issues (including the one above in the how to reproduce section).
Example requests that caused issues. You can run these by putting them into a JSON file and running
curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d @<filename> | jq
Neural Hermes through MemGPT #1, makes a function call when it should not
Lunademo through Home Assistant Assist, this example results in LLM generating an entire conversation stream.
Neural Hermes through MemGPT #2, results in LLM just repeating back what the user sent