mudler / LocalAI


Invalid JSON crashes the server / JSON schema support incomplete #2938

Open vaaale opened 1 month ago

vaaale commented 1 month ago

LocalAI version: Latest

Environment, CPU architecture, OS, and Version: Not relevant

Describe the bug This bug has two parts. First, the server crashes (a Go panic) when it receives a JSON schema it cannot handle. Second, only the 2019-09 JSON Schema draft is supported, not drafts 06/07 (see https://opis.io/json-schema/2.x/definitions.html). The relevant difference is that the keyword "definitions" from the older drafts was renamed to "$defs" in 2019-09. Unfortunately, Pydantic v1, which libraries such as LlamaIndex use to build function-calling schemas, still emits the draft-06/07 form. The crash happens when the resolveReference function in grammar_json_schema.go fails to find the #/$defs/ prefix in a reference and panics. (See the stack trace below.)
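A tolerant resolver would accept both spellings and report an error instead of panicking. A minimal sketch of that idea (a hypothetical helper written for this report, not the actual code in grammar_json_schema.go):

```go
package functions

import (
	"fmt"
	"strings"
)

// resolveRef resolves a local JSON Schema reference against the decoded
// root document. It accepts both the draft-06/07 spelling
// ("#/definitions/Name") and the 2019-09 spelling ("#/$defs/Name"), and
// returns an error instead of panicking on anything it cannot resolve.
// Hypothetical helper for illustration; not LocalAI's actual code.
func resolveRef(root map[string]interface{}, ref string) (map[string]interface{}, error) {
	var container, name string
	switch {
	case strings.HasPrefix(ref, "#/$defs/"):
		container, name = "$defs", strings.TrimPrefix(ref, "#/$defs/")
	case strings.HasPrefix(ref, "#/definitions/"):
		container, name = "definitions", strings.TrimPrefix(ref, "#/definitions/")
	default:
		return nil, fmt.Errorf("unsupported reference format: %s", ref)
	}
	defs, ok := root[container].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("schema has no %q section for reference %s", container, ref)
	}
	def, ok := defs[name].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("definition %q not found for reference %s", name, ref)
	}
	return def, nil
}
```

Returning an error would let the chat endpoint answer with an HTTP error instead of killing the process.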

To Reproduce Either run the LlamaIndex QueryPlanner example from https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent_query_plan/

or execute the following curl request:

curl -X POST -H "Content-Type: application/json" -d @msg.json http://localhost:5000/v1/chat/completions

where msg.json (attached) has the following content:

{
  "model": "gpt-3.5-turbo",
  "language": "",
  "translate": false,
  "n": 0,
  "top_p": null,
  "top_k": null,
  "temperature": 0.1,
  "max_tokens": null,
  "echo": false,
  "batch": 0,
  "ignore_eos": false,
  "repeat_penalty": 0,
  "repeat_last_n": 0,
  "n_keep": 0,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "tfz": null,
  "typical_p": null,
  "seed": null,
  "negative_prompt": "",
  "rope_freq_base": 0,
  "rope_freq_scale": 0,
  "negative_prompt_scale": 0,
  "use_fast_tokenizer": false,
  "clip_skip": 0,
  "tokenizer": "",
  "file": "",
  "size": "",
  "prompt": null,
  "instruction": "",
  "input": null,
  "stop": null,
  "messages": [
    {
      "role": "user",
      "content": "What were the risk factors in sept 2022?"
    }
  ],
  "functions": null,
  "function_call": null,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query_plan_tool",
        "description": "        This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n        Tool Name: sept_2022\nTool Description: Provides information about Uber quarterly financials ending September 2022 \n\nTool Name: june_2022\nTool Description: Provides information about Uber quarterly financials ending June 2022 \n\nTool Name: march_2022\nTool Description: Provides information about Uber quarterly financials ending March 2022 \n        ",
        "parameters": {
          "definitions": {
            "QueryNode": {
              "description": "Query node.\n\nA query node represents a query (query_str) that must be answered.\nIt can either be answered by a tool (tool_name), or by a list of child nodes\n(child_nodes).\nThe tool_name and child_nodes fields are mutually exclusive.",
              "properties": {
                "dependencies": {
                  "description": "List of sub-questions that need to be answered in order to answer the question given by `query_str`.Should be blank if there are no sub-questions to be specified, in which case `tool_name` is specified.",
                  "items": {
                    "type": "integer"
                  },
                  "title": "Dependencies",
                  "type": "array"
                },
                "id": {
                  "description": "ID of the query node.",
                  "title": "Id",
                  "type": "integer"
                },
                "query_str": {
                  "description": "Question we are asking. This is the query string that will be executed. ",
                  "title": "Query Str",
                  "type": "string"
                },
                "tool_name": {
                  "description": "Name of the tool to execute the `query_str`.",
                  "title": "Tool Name",
                  "type": "string"
                }
              },
              "required": [
                "id",
                "query_str"
              ],
              "title": "QueryNode",
              "type": "object"
            }
          },
          "properties": {
            "nodes": {
              "description": "The original question we are asking.",
              "items": {
                "$ref": "#/definitions/QueryNode"
              },
              "title": "Nodes",
              "type": "array"
            }
          },
          "required": [
            "nodes"
          ],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto",
  "stream": false,
  "mode": 0,
  "step": 0,
  "grammar": "",
  "grammar_json_functions": null,
  "grammar_json_name": null,
  "backend": "",
  "model_base_name": ""
}
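Note the schema above uses "definitions" and "#/definitions/QueryNode", the draft-06/07 form. As a client-side workaround until both drafts are supported, the decoded schema can be rewritten into the 2019-09 shape before it is sent. A rough Go sketch of that normalization (illustrative names, not part of any library):

```go
package main

import "strings"

// normalizeSchema rewrites a draft-06/07 JSON Schema in place into the
// 2019-09 shape: the "definitions" container becomes "$defs", and every
// "$ref" of the form "#/definitions/Name" becomes "#/$defs/Name".
// Illustrative workaround; apply to the decoded schema before sending.
func normalizeSchema(node interface{}) {
	switch v := node.(type) {
	case map[string]interface{}:
		if defs, ok := v["definitions"]; ok {
			v["$defs"] = defs
			delete(v, "definitions")
		}
		if ref, ok := v["$ref"].(string); ok && strings.HasPrefix(ref, "#/definitions/") {
			v["$ref"] = "#/$defs/" + strings.TrimPrefix(ref, "#/definitions/")
		}
		for _, child := range v {
			normalizeSchema(child)
		}
	case []interface{}:
		for _, child := range v {
			normalizeSchema(child)
		}
	}
}
```

Pydantic v2 emits "$defs" natively, so upgrading the client stack may also sidestep the problem, but that is not always an option for LlamaIndex users.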

Expected behavior The server should resolve draft-06/07 "#/definitions/..." references, or at least reject the unsupported schema with an error response, instead of panicking and crashing the process.

Logs

```
10:59AM DBG Request received: {"model":"gpt-3.5-turbo","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":0.1,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"What were the risk factors in sept 2022?"}],"functions":null,"function_call":null,"tools":[{"type":"function","function":{"name":"query_plan_tool","description":" This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n Tool Name: sept_2022\nTool Description: Provides information about Uber quarterly financials ending September 2022 \n\nTool Name: june_2022\nTool Description: Provides information about Uber quarterly financials ending June 2022 \n\nTool Name: march_2022\nTool Description: Provides information about Uber quarterly financials ending March 2022 \n ","parameters":{"definitions":{"QueryNode":{"description":"Query node.\n\nA query node represents a query (query_str) that must be answered.\nIt can either be answered by a tool (tool_name), or by a list of child nodes\n(child_nodes).\nThe tool_name and child_nodes fields are mutually exclusive.","properties":{"dependencies":{"description":"List of sub-questions that need to be answered in order to answer the question given by `query_str`.Should be blank if there are no sub-questions to be specified, in which case `tool_name` is specified.","items":{"type":"integer"},"title":"Dependencies","type":"array"},"id":{"description":"ID of the query node.","title":"Id","type":"integer"},"query_str":{"description":"Question we are asking. This is the query string that will be executed. ","title":"Query Str","type":"string"},"tool_name":{"description":"Name of the tool to execute the `query_str`.","title":"Tool Name","type":"string"}},"required":["id","query_str"],"title":"QueryNode","type":"object"}},"properties":{"nodes":{"description":"The original question we are asking.","items":{"$ref":"#/definitions/QueryNode"},"title":"Nodes","type":"array"}},"required":["nodes"],"type":"object"}}}],"tool_choice":"auto","stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
10:59AM DBG guessDefaultsFromFile: template already set name=gpt-3.5-turbo
10:59AM DBG Configuration read: &{PredictionOptions:{Model:00c61256fb64047d5f0ffecaaffec50a Language: Translate:false N:0 TopP:0xc00003c7a0 TopK:0xc00003c7a8 Temperature:0xc000712e00 Maxtokens:0xc00003c7e0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00003c7d8 TypicalP:0xc00003c7d0 Seed:0xc00003c7f8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-3.5-turbo F16:0xc00003c780 Threads:0xc00003c790 Debug:0xc000713140 Roles:map[] Embeddings:0xc00003c7f1 Backend: TemplateConfig:{Chat:{{.Input -}} <|im_start|>assistant ChatMessage:<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}} {{- if .FunctionCall }}
{{- else if eq .RoleName "tool" }} {{- end }} {{- if .Content}} {{.Content }} {{- end }} {{- if .FunctionCall}} {{toJson .FunctionCall}} {{- end }} {{- if .FunctionCall }}
{{- else if eq .RoleName "tool" }} {{- end }}<|im_end|> Completion:{{.Input}} Edit: Functions:<|im_start|>system You are a function calling AI model. You are provided with function signatures within XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: {{range .Functions}}{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}}}}{{end}} Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']} Use the 'answer' function when you are ready to write your final answer to the users request. For each function call return a json object with function name and arguments within XML tags as follows:
{'arguments': , 'name': } <|im_end|> {{.Input -}} <|im_start|>assistant UseTokenizerTemplate:false JoinChatMessagesByCharacter:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionName:false} FeatureFlag:map[usage:0xc00003c746] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00003c7c8 MirostatTAU:0xc00003c7c0 Mirostat:0xc00003c7b8 NGPULayers:0xc00003c7e8 MMap:0xc00003c745 MMlock:0xc00003c7f1 LowVRAM:0xc00003c7f1 Grammar: StopWords:[<|begin_of_text|> <|end_of_text|> <|im_end|> <|eot_id|> assistant:] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00003c770 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:curl http://localhost:5000/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }' }
10:59AM DBG Response needs to process functions

panic: Invalid reference format: #/definitions/QueryNode

goroutine 42 [running]:
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).resolveReference(0x18b6f60?, {0xc000710b28?, 0x1b616e4?}, 0x4?)
	/build/pkg/functions/grammar_json_schema.go:314 +0x239
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).visit(0xc00071cb20, 0xc000562b70, {0xc000a726c0, 0x1b}, 0xc000562810)
	/build/pkg/functions/grammar_json_schema.go:245 +0x26b
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).visit(0xc00071cb20, 0xc000562ae0, {0xc000710b58, 0x16}, 0xc000562810)
	/build/pkg/functions/grammar_json_schema.go:298 +0x4b9
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).visit(0xc00071cb20, 0xc0005629f0, {0xc00081c6e0, 0x10}, 0xc000562810)
	/build/pkg/functions/grammar_json_schema.go:286 +0xa6c
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).visit(0xc00071cb20, 0xc000562870, {0xc00081c6ca, 0x6}, 0xc000562810)
	/build/pkg/functions/grammar_json_schema.go:286 +0xa6c
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).visit(0xc00071cb20, 0xc000562810, {0x0, 0x0}, 0xc000562810)
	/build/pkg/functions/grammar_json_schema.go:232 +0x1278
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).Grammar(0xc00071cb20, 0xc000562810, {0xc000fa68f0, 0x1, 0x1})
	/build/pkg/functions/grammar_json_schema.go:336 +0x85
github.com/mudler/LocalAI/pkg/functions.(*JSONSchemaConverter).GrammarFromBytes(0xc00071cb20, {0xc0012761e0, 0x1c4, 0x1e0}, {0xc000fa68f0, 0x1, 0x1})
	/build/pkg/functions/grammar_json_schema.go:343 +0x89
github.com/mudler/LocalAI/pkg/functions.JSONFunctionStructureFunction.Grammar({{0xc000768150, 0x2, 0x2}, {0x0, 0x0, 0x0}, 0x0}, {0xc000fa68f0, 0x1, 0x1})
	/build/pkg/functions/grammar_json_schema.go:405 +0xf8
github.com/mudler/LocalAI/core/http/endpoints/openai.ChatEndpoint.func3(0xc0002dc308)
	/build/core/http/endpoints/openai/chat.go:234 +0xe9d
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc0007685b0?)
	/root/go/pkg/mod/github.com/gofiber/fiber/v2@v2.52.5/ctx.go:1031 +0x3d
github.com/mudler/LocalAI/core/http.App.func5(0xc0002055c0?)
	/build/core/http/app.go:143 +0x1d4
github.com/gofiber/fiber/v2.(*App).next(0xc000248a08, 0xc0002dc308)
	/root/go/pkg/mod/github.com/gofiber/fiber/v2@v2.52.5/router.go:145 +0x1be
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc0002dc308?)
	/root/go/pkg/mod/github.com/gofiber/fiber/v2@v2.52.5/ctx.go:1034 +0x4d
github.com/mudler/LocalAI/core/http.App.LocalAIMetricsAPIMiddleware.func8(0xc0002dc308)
	/build/core/http/endpoints/localai/metrics.go:41 +0xa5
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc0002dc308?)
	/root/go/pkg/mod/github.com/gofiber/fiber/v2@v2.52.5/ctx.go:1031 +0x3d
github.com/gofiber/contrib/fiberzerolog.New.func1(0xc0002dc308)
	/root/go/pkg/mod/github.com/gofiber/contrib/fiberzerolog@v1.0.2/zerolog.go:36 +0xb7
github.com/gofiber/fiber/v2.(*App).next(0xc000248a08, 0xc0002dc308)
	/root/go/pkg/mod/github.com/gofiber/fiber/v2@v2.52.5/router.go:145 +0x1be
github.com/gofiber/fiber/v2.(*App).handler(0xc000248a08, 0x49d6cf?)
	/root/go/pkg/mod/github.com/gofiber/fiber/v2@v2.52.5/router.go:172 +0x78
github.com/valyala/fasthttp.(*Server).serveConn(0xc0002ea488, {0x71ff2d78, 0xc000fa6010})
	/root/go/pkg/mod/github.com/valyala/fasthttp@v1.55.0/server.go:2379 +0xe70
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc00042ac80, 0xc00043e040)
	/root/go/pkg/mod/github.com/valyala/fasthttp@v1.55.0/workerpool.go:224 +0xa4
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	/root/go/pkg/mod/github.com/valyala/fasthttp@v1.55.0/workerpool.go:196 +0x32
created by github.com/valyala/fasthttp.(*workerPool).getCh in goroutine 1
	/root/go/pkg/mod/github.com/valyala/fasthttp@v1.55.0/workerpool.go:195 +0x190
```
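Separately from draft support, a client-supplied schema should not be able to take down the whole process. A generic Go recover pattern (shown for illustration, not LocalAI's actual code) that turns a converter panic into an error the endpoint can report:

```go
package main

import "fmt"

// safeGrammar runs a grammar-conversion callback and converts any panic
// inside it into an ordinary error, so one bad schema cannot crash the
// whole server. Generic Go pattern; convert stands in for the real call.
func safeGrammar(convert func() string) (grammar string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("schema-to-grammar conversion failed: %v", r)
		}
	}()
	return convert(), nil
}
```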
mudler commented 1 month ago

good catch, this definitely should be handled :+1: