gptscript-ai / knowledge

Knowledge for GPTScript
https://gptscript-ai.github.io/knowledge/
Apache License 2.0

Ingesting a valid PDF fails with - `not a PDF file: invalid header` and still returns 200. #3

Closed sangee2004 closed 4 months ago

sangee2004 commented 5 months ago

Steps to reproduce the problem:

  1. Run `make run` in a checkout of https://github.com/gptscript-ai/knowledge to launch knowledge in server mode.
  2. Copy `bin/knowledge` to a directory on the PATH (`/usr/local/bin`).
  3. Run the following script (`testknowledge.gpt`) to ingest and search a PDF file:
    
    # Set KNOW_SERVER_URL=standalone to use the client in standalone mode (i.e. without a server running)

    tools: Create Dataset, sys.find, Ingest, Retrieve

    Create a new Knowledge Base Dataset with ID ${id}
    Then, ingest ${filepath} into the dataset.
    Then, figure out ${query} from the previously ingested files.

    ---
    name: create dataset
    description: Create a new Dataset in the Knowledge Base
    args: id: ID of the Dataset

    #!knowledge client create-dataset ${id}

    ---
    name: ingest
    description: Ingest a file or all files from a directory into a Knowledge Base Dataset
    args: id: ID of the Dataset
    args: filepath: Path to the file or directory to be ingested

    #!knowledge client ingest ${id} ${filepath}

    ---
    name: retrieve
    description: Retrieve information from a Knowledge Base Dataset
    args: id: ID of the Dataset
    args: query: Query to be executed against the Knowledge Base Dataset

    #!knowledge client retrieve ${id} ${query}

gptscript --disable-cache --debug testknowledge.gpt --id 55552 --filepath /Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf --query "What are the different types of insurance that needs to be purchased by business owners?"

This fails with the following error:

2024/04/30 09:51:58 ERROR: 2024/04/30 09:51:58 API request failed: 500 Internal Server Error : exit status 1


The ingestion call itself fails, but the server still returns 200 OK. The subsequent attempt to retrieve information from this file then fails with a 500 error.
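
A minimal sketch of the behavior this report expects, written in Python for brevity: a load failure is mapped to a non-2xx status instead of falling through to 200. `LoadError`, `load_pdf`, `handle_ingest`, and the choice of 422 are illustrative assumptions, not the project's actual code.

```python
class LoadError(Exception):
    """Raised when a document cannot be parsed (hypothetical)."""


def load_pdf(content: bytes) -> list[str]:
    """Stand-in loader: accept only content that starts with the PDF magic bytes."""
    if not content.startswith(b"%PDF-"):
        raise LoadError("not a PDF file: invalid header")
    return ["<parsed page text would go here>"]


def handle_ingest(content: bytes) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for an ingest request."""
    try:
        docs = load_pdf(content)
    except LoadError as exc:
        # Surface the failure to the client instead of answering 200.
        return 422, {"error": f"failed to load PDF: {exc}"}
    return 200, {"documents": len(docs)}
```

With a response like this, the client could exit non-zero on a failed ingest, and the script would stop before it ever calls retrieve.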

Logs from knowledge:

    [GIN] 2024/04/30 - 09:51:52 | 200 | 1.32675ms | ::1 | POST "/v1/datasets/create"
    2024/04/30 09:51:55 INFO Ingesting content into dataset dataset=55552
    2024/04/30 09:51:55 DEBUG Received ingest request content_size=2348008 metadata="&{Name:Insurance_Handbook_20103.pdf AbsolutePath:/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf Size:1320753 ModifiedAt:2024-03-29 17:28:46.90522173 -0700 PDT}"
    2024/04/30 09:51:55 INFO IngestOpts opts="{Filename:0x140037e8d10 FileMetadata:0x14002b06640 IsDuplicateFuncName: IsDuplicateFunc:}"
    2024/04/30 09:51:55 DEBUG Loading data type=.pdf filename=Insurance_Handbook_20103.pdf
    2024/04/30 09:51:55 ERROR Failed to load PDF filename=Insurance_Handbook_20103.pdf error="not a PDF file: invalid header"
    2024/04/30 09:51:55 ERROR Failed to load documents error="failed to load PDF: not a PDF file: invalid header"
    [GIN] 2024/04/30 - 09:51:55 | 200 | 123.42475ms | ::1 | POST "/v1/datasets/55552/ingest"
    2024/04/30 09:51:58 INFO Retrieving content from dataset dataset=55552
    2024/04/30 09:51:58 DEBUG Retrieving content from dataset dataset=55552 query="{Prompt:What are the different types of insurance that needs to be purchased by business owners? TopK:0x14001d9c0f0}"
    2024/04/30 09:51:58 DEBUG Reduced number of documents to search for numDocuments=0
    2024/04/30 09:51:58 ERROR Failed to retrieve documents error="nResults must be > 0"
    [GIN] 2024/04/30 - 09:51:58 | 500 | 230.54625ms | ::1 | POST "/v1/datasets/55552/retrieve"
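
The end of the server log above suggests that the 500 comes from querying with a result count of zero once the dataset holds no documents (`numDocuments=0`, followed by `nResults must be > 0`). A hedged sketch of one way to guard against that, in Python; `retrieve`, `search`, `naive_search`, and `top_k` are hypothetical names, not the project's API:

```python
from typing import Callable, Sequence


def retrieve(
    documents: Sequence[str],
    query: str,
    top_k: int,
    search: Callable[[Sequence[str], str, int], list[str]],
) -> list[str]:
    """Clamp top_k to the dataset size and skip the search when nothing is stored."""
    n_results = min(top_k, len(documents))
    if n_results <= 0:
        # Empty dataset (e.g. after a failed ingest): report "no results"
        # rather than passing an invalid result count to the search backend.
        return []
    return search(documents, query, n_results)


def naive_search(documents: Sequence[str], query: str, n: int) -> list[str]:
    """Toy backend that rejects n <= 0, mimicking the error in the log."""
    if n <= 0:
        raise ValueError("nResults must be > 0")
    words = query.lower().split()
    hits = [d for d in documents if any(w in d.lower() for w in words)]
    return hits[:n]
```

Returning an empty result set is only one option; answering with a 404 or a descriptive 4xx for an empty dataset would also avoid the opaque 500.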


Debug logs from gptscript:

{ "level": "debug", "logger": "/pkg/loader", "msg": "opened testknowledge.gpt", "time": "2024-04-30T09:51:47-07:00" } { "input": "--id 55552 --filepath /Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf --query What are the diffent types of insurance that needs to be purchased by business owners?", "level": "debug", "logger": "/pkg/monitor", "msg": "Run started", "program": { "name": "testknowledge.gpt", "entryToolId": "testknowledge.gpt:1", "toolSet": { "sys.find": { "name": "sys.find", "description": "Traverse a directory looking for files that match a pattern in the style of the unix find command", "modelName": "gpt-4-turbo-preview", "internalPrompt": null, "arguments": { "properties": { "directory": { "description": "The directory to search in. The current directory \".\" will be used as the default if no argument is passed", "type": "string" }, "pattern": { "description": "The file pattern to look for. The pattern is a traditional unix glob format with matching any character and ? matching a single character", "type": "string" } }, "type": "object" }, "instructions": "#!sys.find", "id": "sys.find", "source": {} }, "testknowledge.gpt:1": { "modelName": "gpt-4-turbo-preview", "internalPrompt": null, "tools": [ "create dataset", "sys.find", "ingest", "retrieve" ], "instructions": "Create a new Knowledge Base Dataset with ID ${id}\nThen, ingest ${filepath} into the dataset.\nThen, figure out ${query} from the previously ingested files.", "id": "testknowledge.gpt:1", "toolMapping": { "create dataset": "testknowledge.gpt:10", "ingest": "testknowledge.gpt:17", "retrieve": "testknowledge.gpt:25", "sys.find": "sys.find" }, "localTools": { "": "testknowledge.gpt:1", "create dataset": "testknowledge.gpt:10", "ingest": "testknowledge.gpt:17", "retrieve": "testknowledge.gpt:25" }, "source": { "location": "testknowledge.gpt", "lineNo": 1 }, "workingDir": "." 
}, "testknowledge.gpt:10": { "name": "create dataset", "description": "Create a new Dataset in the Knowledge Base", "modelName": "gpt-4-turbo-preview", "internalPrompt": null, "arguments": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" }, "instructions": "#!knowledge client create-dataset ${id}", "id": "testknowledge.gpt:10", "localTools": { "": "testknowledge.gpt:1", "create dataset": "testknowledge.gpt:10", "ingest": "testknowledge.gpt:17", "retrieve": "testknowledge.gpt:25" }, "source": { "location": "testknowledge.gpt", "lineNo": 10 }, "workingDir": "." }, "testknowledge.gpt:17": { "name": "ingest", "description": "Ingest a file or all files from a directory into a Knowledge Base Dataset", "modelName": "gpt-4-turbo-preview", "internalPrompt": null, "arguments": { "properties": { "filepath": { "description": "Path to the file or directory to be ingested", "type": "string" }, "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" }, "instructions": "#!knowledge client ingest ${id} ${filepath}", "id": "testknowledge.gpt:17", "localTools": { "": "testknowledge.gpt:1", "create dataset": "testknowledge.gpt:10", "ingest": "testknowledge.gpt:17", "retrieve": "testknowledge.gpt:25" }, "source": { "location": "testknowledge.gpt", "lineNo": 17 }, "workingDir": "." 
}, "testknowledge.gpt:25": { "name": "retrieve", "description": "Retrieve information from a Knowledge Base Dataset", "modelName": "gpt-4-turbo-preview", "internalPrompt": null, "arguments": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" }, "query": { "description": "Query to be executed against the Knowledge Base Dataset", "type": "string" } }, "type": "object" }, "instructions": "#!knowledge client retrieve ${id} ${query}", "id": "testknowledge.gpt:25", "localTools": { "": "testknowledge.gpt:1", "create dataset": "testknowledge.gpt:10", "ingest": "testknowledge.gpt:17", "retrieve": "testknowledge.gpt:25" }, "source": { "location": "testknowledge.gpt", "lineNo": 25 }, "workingDir": "." } } }, "runID": "1", "time": "2024-04-30T09:51:47-07:00" } { "id": "1", "input": "--id 55552 --filepath /Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf --query What are the diffent types of insurance that needs to be purchased by business owners?", "level": "info", "logger": "/pkg/monitor", "msg": "started [main]", "parentID": "", "time": "2024-04-30T09:51:47-07:00", "toolID": "testknowledge.gpt:1" } { "id": "1", "level": "info", "logger": "/pkg/monitor", "msg": "sent [main]", "parentID": "", "time": "2024-04-30T09:51:47-07:00", "toolID": "testknowledge.gpt:1" } { "completionID": "1", "id": "1", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "", "request": { "model": "gpt-4-turbo-preview", "messages": [ { "role": "system", "content": "\nYou are task oriented system.\nYou receive input from a user, process the input from the given instructions, and then output the result.\nYour objective is to provide consistent and correct results.\nYou do not need to explain the steps taken, only provide the result to the given instructions.\nYou are referred to as a tool.\nYou don't move to the next step until you have a result.\n\nCreate a new Knowledge Base Dataset with ID ${id}\nThen, ingest ${filepath} into the 
dataset.\nThen, figure out ${query} from the previously ingested files." }, { "role": "user", "content": "--id 55552 --filepath /Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf --query What are the diffent types of insurance that needs to be purchased by business owners?" } ], "temperature": 0, "tools": [ { "type": "function", "function": { "name": "createDataset", "description": "Create a new Dataset in the Knowledge Base", "parameters": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "find", "description": "Traverse a directory looking for files that match a pattern in the style of the unix find command", "parameters": { "properties": { "directory": { "description": "The directory to search in. The current directory \".\" will be used as the default if no argument is passed", "type": "string" }, "pattern": { "description": "The file pattern to look for. The pattern is a traditional unix glob format with matching any character and ? matching a single character", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "ingest", "description": "Ingest a file or all files from a directory into a Knowledge Base Dataset", "parameters": { "properties": { "filepath": { "description": "Path to the file or directory to be ingested", "type": "string" }, "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "retrieve", "description": "Retrieve information from a Knowledge Base Dataset", "parameters": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" }, "query": { "description": "Query to be executed against the Knowledge Base Dataset", "type": "string" } }, "type": "object" } } } ] }, "time": "2024-04-30T09:51:47-07:00", "toolID": "testknowledge.gpt:1" } content [1] content | Waiting for model response... 
content [1] content | tool call createDataset -> {"id":"55552"} { "cached": false, "completionID": "1", "id": "1", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "", "response": { "role": "assistant", "content": [ { "toolCall": { "index": 0, "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "function": { "name": "createDataset", "arguments": "{\"id\":\"55552\"}" } } } ] }, "time": "2024-04-30T09:51:52-07:00", "toolID": "testknowledge.gpt:1" } { "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "input": "{\"id\":\"55552\"}", "level": "info", "logger": "/pkg/monitor", "msg": "started [create dataset(2)]", "parentID": "1", "time": "2024-04-30T09:51:52-07:00", "toolID": "testknowledge.gpt:10" } { "level": "debug", "logger": "/pkg/env", "msg": "Looking for knowledge in /Users/sangeethahariharan/.rd/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/go/bin:/Users/sangeethahariharan/.cargo/bin", "time": "2024-04-30T09:51:52-07:00" } { "level": "debug", "logger": "/pkg/env", "msg": "Found /usr/local/bin/knowledge for knowledge in /Users/sangeethahariharan/.rd/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/go/bin:/Users/sangeethahariharan/.cargo/bin", "time": "2024-04-30T09:51:52-07:00" } { "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "level": "info", "logger": "/pkg/monitor", "msg": "sent [create dataset(2)]", "parentID": "1", "time": "2024-04-30T09:51:52-07:00", "toolID": 
"testknowledge.gpt:10" } { "completionID": "1", "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "request": { "command": [ "/usr/local/bin/knowledge", "client", "create-dataset", "55552" ], "input": "{\"id\":\"55552\"}" }, "time": "2024-04-30T09:51:52-07:00", "toolID": "testknowledge.gpt:10" } { "cached": false, "completionID": "1", "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "response": { "err": null, "output": "Created dataset \"55552\"\n" }, "time": "2024-04-30T09:51:52-07:00", "toolID": "testknowledge.gpt:10" } { "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "level": "info", "logger": "/pkg/monitor", "msg": "ended [create dataset(2)]", "output": "Created dataset \"55552\"\n", "parentID": "1", "time": "2024-04-30T09:51:52-07:00", "toolID": "testknowledge.gpt:10" } { "id": "1", "level": "info", "logger": "/pkg/monitor", "msg": "continue [main]", "parentID": "", "time": "2024-04-30T09:51:52-07:00", "toolID": "testknowledge.gpt:1", "toolResults": 1 } { "id": "1", "level": "info", "logger": "/pkg/monitor", "msg": "sent [main]", "parentID": "", "time": "2024-04-30T09:51:53-07:00", "toolID": "testknowledge.gpt:1" } { "completionID": "2", "id": "1", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "", "request": { "model": "gpt-4-turbo-preview", "messages": [ { "role": "system", "content": "\nYou are task oriented system.\nYou receive input from a user, process the input from the given instructions, and then output the result.\nYour objective is to provide consistent and correct results.\nYou do not need to explain the steps taken, only provide the result to the given instructions.\nYou are referred to as a tool.\nYou don't move to the next step until you have a result.\n\nCreate a new Knowledge Base Dataset with ID ${id}\nThen, ingest ${filepath} into the dataset.\nThen, figure out ${query} from the previously 
ingested files." }, { "role": "user", "content": "--id 55552 --filepath /Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf --query What are the diffent types of insurance that needs to be purchased by business owners?" }, { "role": "assistant", "content": "", "tool_calls": [ { "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "type": "function", "function": { "name": "createDataset", "arguments": "{\"id\":\"55552\"}" } } ] }, { "role": "tool", "content": "Created dataset \"55552\"\n", "name": "createDataset", "tool_call_id": "call_rYL6JtFkWek5R5u7k9FL8lSk" } ], "temperature": 0, "tools": [ { "type": "function", "function": { "name": "createDataset", "description": "Create a new Dataset in the Knowledge Base", "parameters": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "find", "description": "Traverse a directory looking for files that match a pattern in the style of the unix find command", "parameters": { "properties": { "directory": { "description": "The directory to search in. The current directory \".\" will be used as the default if no argument is passed", "type": "string" }, "pattern": { "description": "The file pattern to look for. The pattern is a traditional unix glob format with matching any character and ? 
matching a single character", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "ingest", "description": "Ingest a file or all files from a directory into a Knowledge Base Dataset", "parameters": { "properties": { "filepath": { "description": "Path to the file or directory to be ingested", "type": "string" }, "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "retrieve", "description": "Retrieve information from a Knowledge Base Dataset", "parameters": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" }, "query": { "description": "Query to be executed against the Knowledge Base Dataset", "type": "string" } }, "type": "object" } } } ] }, "time": "2024-04-30T09:51:53-07:00", "toolID": "testknowledge.gpt:1" } content [1] content | Waiting for model response... content [1] content | tool call ingest -> {"filepath":"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf","id":"55552"} { "cached": false, "completionID": "2", "id": "1", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "", "response": { "role": "assistant", "content": [ { "toolCall": { "index": 0, "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "function": { "name": "ingest", "arguments": "{\"filepath\":\"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\",\"id\":\"55552\"}" } } } ] }, "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:1" } { "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "input": "{\"filepath\":\"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\",\"id\":\"55552\"}", "level": "info", "logger": "/pkg/monitor", "msg": "started [ingest(3)]", "parentID": "1", "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:17" } { "level": "debug", "logger": "/pkg/env", "msg": "Looking for knowledge in 
/Users/sangeethahariharan/.rd/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/go/bin:/Users/sangeethahariharan/.cargo/bin", "time": "2024-04-30T09:51:55-07:00" } { "level": "debug", "logger": "/pkg/env", "msg": "Found /usr/local/bin/knowledge for knowledge in /Users/sangeethahariharan/.rd/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/go/bin:/Users/sangeethahariharan/.cargo/bin", "time": "2024-04-30T09:51:55-07:00" } { "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "level": "info", "logger": "/pkg/monitor", "msg": "sent [ingest(3)]", "parentID": "1", "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:17" } { "completionID": "2", "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "request": { "command": [ "/usr/local/bin/knowledge", "client", "ingest", "55552", "/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf" ], "input": "{\"filepath\":\"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\",\"id\":\"55552\"}" }, "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:17" } { "cached": false, "completionID": "2", "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "response": { "err": null, "output": "Ingested \"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\" into dataset 
\"55552\"\n" }, "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:17" } { "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "level": "info", "logger": "/pkg/monitor", "msg": "ended [ingest(3)]", "output": "Ingested \"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\" into dataset \"55552\"\n", "parentID": "1", "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:17" } { "id": "1", "level": "info", "logger": "/pkg/monitor", "msg": "continue [main]", "parentID": "", "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:1", "toolResults": 1 } { "id": "1", "level": "info", "logger": "/pkg/monitor", "msg": "sent [main]", "parentID": "", "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:1" } { "completionID": "3", "id": "1", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "", "request": { "model": "gpt-4-turbo-preview", "messages": [ { "role": "system", "content": "\nYou are task oriented system.\nYou receive input from a user, process the input from the given instructions, and then output the result.\nYour objective is to provide consistent and correct results.\nYou do not need to explain the steps taken, only provide the result to the given instructions.\nYou are referred to as a tool.\nYou don't move to the next step until you have a result.\n\nCreate a new Knowledge Base Dataset with ID ${id}\nThen, ingest ${filepath} into the dataset.\nThen, figure out ${query} from the previously ingested files." }, { "role": "user", "content": "--id 55552 --filepath /Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf --query What are the diffent types of insurance that needs to be purchased by business owners?" 
}, { "role": "assistant", "content": "", "tool_calls": [ { "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "type": "function", "function": { "name": "createDataset", "arguments": "{\"id\":\"55552\"}" } } ] }, { "role": "tool", "content": "Created dataset \"55552\"\n", "name": "createDataset", "tool_call_id": "call_rYL6JtFkWek5R5u7k9FL8lSk" }, { "role": "assistant", "content": "", "tool_calls": [ { "id": "call_OIIcay5wP5Bk40FEHC8meQFP", "type": "function", "function": { "name": "ingest", "arguments": "{\"filepath\":\"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\",\"id\":\"55552\"}" } } ] }, { "role": "tool", "content": "Ingested \"/Users/sangeethahariharan/Downloads/Insurance_Handbook_20103.pdf\" into dataset \"55552\"\n", "name": "ingest", "tool_call_id": "call_OIIcay5wP5Bk40FEHC8meQFP" } ], "temperature": 0, "tools": [ { "type": "function", "function": { "name": "createDataset", "description": "Create a new Dataset in the Knowledge Base", "parameters": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "find", "description": "Traverse a directory looking for files that match a pattern in the style of the unix find command", "parameters": { "properties": { "directory": { "description": "The directory to search in. The current directory \".\" will be used as the default if no argument is passed", "type": "string" }, "pattern": { "description": "The file pattern to look for. The pattern is a traditional unix glob format with matching any character and ? 
matching a single character", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "ingest", "description": "Ingest a file or all files from a directory into a Knowledge Base Dataset", "parameters": { "properties": { "filepath": { "description": "Path to the file or directory to be ingested", "type": "string" }, "id": { "description": "ID of the Dataset", "type": "string" } }, "type": "object" } } }, { "type": "function", "function": { "name": "retrieve", "description": "Retrieve information from a Knowledge Base Dataset", "parameters": { "properties": { "id": { "description": "ID of the Dataset", "type": "string" }, "query": { "description": "Query to be executed against the Knowledge Base Dataset", "type": "string" } }, "type": "object" } } } ] }, "time": "2024-04-30T09:51:55-07:00", "toolID": "testknowledge.gpt:1" } content [1] content | Waiting for model response... content [1] content | tool call retrieve -> {"id":"55552","query":"What are the different types of insurance that needs to be purchased by business owners?"} { "cached": false, "completionID": "3", "id": "1", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "", "response": { "role": "assistant", "content": [ { "toolCall": { "index": 0, "id": "call_yc3jOsI5AlYhlYc9iYlwWmCg", "function": { "name": "retrieve", "arguments": "{\"id\":\"55552\",\"query\":\"What are the different types of insurance that needs to be purchased by business owners?\"}" } } } ] }, "time": "2024-04-30T09:51:57-07:00", "toolID": "testknowledge.gpt:1" } { "id": "call_yc3jOsI5AlYhlYc9iYlwWmCg", "input": "{\"id\":\"55552\",\"query\":\"What are the different types of insurance that needs to be purchased by business owners?\"}", "level": "info", "logger": "/pkg/monitor", "msg": "started [retrieve(4)]", "parentID": "1", "time": "2024-04-30T09:51:57-07:00", "toolID": "testknowledge.gpt:25" } { "level": "debug", "logger": "/pkg/env", "msg": "Looking for knowledge in 
/Users/sangeethahariharan/.rd/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/go/bin:/Users/sangeethahariharan/.cargo/bin", "time": "2024-04-30T09:51:57-07:00" } { "level": "debug", "logger": "/pkg/env", "msg": "Found /usr/local/bin/knowledge for knowledge in /Users/sangeethahariharan/.rd/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/go/bin:/Users/sangeethahariharan/.cargo/bin", "time": "2024-04-30T09:51:57-07:00" } { "id": "call_yc3jOsI5AlYhlYc9iYlwWmCg", "level": "info", "logger": "/pkg/monitor", "msg": "sent [retrieve(4)]", "parentID": "1", "time": "2024-04-30T09:51:57-07:00", "toolID": "testknowledge.gpt:25" } { "completionID": "3", "id": "call_yc3jOsI5AlYhlYc9iYlwWmCg", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "request": { "command": [ "/usr/local/bin/knowledge", "client", "retrieve", "55552", "What are the different types of insurance that needs to be purchased by business owners?" 
], "input": "{\"id\":\"55552\",\"query\":\"What are the different types of insurance that needs to be purchased by business owners?\"}" }, "time": "2024-04-30T09:51:57-07:00", "toolID": "testknowledge.gpt:25" }
2024/04/30 09:51:58 API request failed: 500 Internal Server Error
{ "level": "error", "logger": "/pkg/engine", "msg": "failed to run tool [retrieve] cmd [/usr/local/bin/knowledge client retrieve 55552 What are the different types of insurance that needs to be purchased by business owners?]: exit status 1", "time": "2024-04-30T09:51:58-07:00" }
{ "cached": false, "completionID": "3", "id": "call_yc3jOsI5AlYhlYc9iYlwWmCg", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "response": { "err": {}, "output": "" }, "time": "2024-04-30T09:51:58-07:00", "toolID": "testknowledge.gpt:25" }
{ "err": "ERROR: 2024/04/30 09:51:58 API request failed: 500 Internal Server Error\n: exit status 1", "level": "debug", "logger": "/pkg/monitor", "msg": "Run stopped", "output": "", "runID": "1", "time": "2024-04-30T09:51:58-07:00" }
2024/04/30 09:51:58 ERROR: 2024/04/30 09:51:58 API request failed: 500 Internal Server Error : exit status 1

sangee2004 commented 5 months ago

PDF file used - Insurance_Handbook_20103.pdf

Note: I see this problem with every PDF file I try.
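
One observation from the server log above may explain why every PDF fails. The ingest request reports `content_size=2348008` for a file whose metadata says `Size:1320753`, a ratio of about 16/9. That is the growth expected from Base64-encoding the file twice, which would leave Base64 text rather than `%PDF-` at the start of the content after a single decode. This is a hypothesis drawn from the two numbers only; the Python sketch below checks the arithmetic and is not taken from the knowledge code base.

```python
import base64


def b64_len(n: int) -> int:
    """Length of the padded Base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)


file_size = 1320753      # Size from the ingest request metadata
content_size = 2348008   # content_size from the same log line

once = b64_len(file_size)
twice = b64_len(once)

# Small demonstration with a fake PDF header: decoding doubly encoded
# content once yields Base64 text, not the "%PDF-" magic bytes.
sample = b"%PDF-1.7\n"
decoded_once = base64.b64decode(base64.b64encode(base64.b64encode(sample)))

print(once, twice, twice == content_size)   # 1761004 2348008 True
print(decoded_once.startswith(b"%PDF-"))    # False
```

The lengths match exactly: 1320753 bytes become 1761004 after one encoding pass and 2348008 after two. If that is what happened, the failure would not depend on the file's content, which fits the note above.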

iwilltry42 commented 4 months ago

This is working for me with v0.1.1:

knowledge ingest ~/Downloads/Insurance_Handbook_20103.pdf 
2024/05/03 18:58:50 INFO IngestOpts opts="{Filename:0xc00109d9b0 FileMetadata:0xc001093ac0 IsDuplicateFuncName: IsDuplicateFunc:0x1487a60}"
2024/05/03 18:59:23 INFO Ingested document filename=Insurance_Handbook_20103.pdf count=759 absolute_path=/home/thklein/Downloads/Insurance_Handbook_20103.pdf
Ingested 1 files from "/home/thklein/Downloads/Insurance_Handbook_20103.pdf" into dataset "default"

sangee2004 commented 4 months ago

I am now able to ingest the PDF file successfully, but I cannot retrieve information from it; see https://github.com/gptscript-ai/knowledge/issues/8.