"corpus cannot be empty" error when chatting with assistant that has workspace tool and knowledge files added to thread.

sangee2004 commented 2 months ago

Electron build - 9d4616691

Steps to reproduce the problem:

Create an assistant with the workspace tool.
Chat with this assistant and add a knowledge file (Reunion-Under-The-Stars.pdf).
Ask a question about this knowledge file.

Following errors are presented to the user:

cjellick commented 2 months ago

Is this reproducible?

sangee2004 commented 2 months ago

Yes. This issue is reproducible for me.

If I run Restart Script after this error is seen, I don't see the knowledge tool getting used.

Stack trace:

Chatted with WorkspaceAssistant
Input
"Who are the main characters in Reunion under the starts story?"

Messages
"Reunion Under the Stars" is a fictional story, and without additional context or a specific source, I can't provide the exact main characters. However, if you have a text file or document containing the story, you can upload it, and I can help identify the main characters for you. Would you like to do that?

Calls
Loaded context from Workspace Context
Input
""

Messages
The workspace directory is "/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/threads/7caehg/workspace" and contains no files. Always use absolute paths to interact with files in the workspace. Prefer creating new files in the workspace if the user does not imply a specific directory. If the user refers to '.' that is the current working directory and not the workspace directory.

Loaded provider from GPTScript Gateway Provider
Input
""

Messages
http://127.0.0.1:10664

iwilltry42 commented 2 months ago

I wasn't able to replicate this issue, but I added some error handling, so that it won't surface to the user. Also I'm adding some logging so we can debug this from the traces in the future.

EDIT: I hadn't refreshed the page, so I didn't see your last addition. This looks like the initial ingestion failed, which is similar to @cjellick's case where a dataset only partially existed. Here the dataset was initialized completely, but the data was never ingested at all. That is the only way to end up with this error: the dataset contains no text content. I'm trying to find a way to replicate it, but it's really hard to debug.
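A minimal sketch of the failure mode described above, assuming a BM25-style retriever that needs at least one document to compute its corpus statistics. The function name `bm25_scores` and the whitespace tokenization are hypothetical and not taken from the knowledge tool. The `k1` and `b` defaults mirror the values in the debug trace later in this thread. The guard returns an empty result for an empty corpus instead of raising an error.

```python
import math
from collections import Counter


def bm25_scores(corpus: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document in `corpus` against `query` with plain BM25.

    Hypothetical helper for illustration only; not the knowledge tool's code.
    """
    # Guard: a dataset that was created but never ingested has no text.
    # Report "no results" instead of failing on the empty corpus.
    if not corpus:
        return []

    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    # Fall back to 1.0 so documents that are all empty do not divide by zero.
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With this shape, a query against a dataset that was initialized but never ingested yields an empty list that the caller can handle, rather than an error shown to the user.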

iwilltry42 commented 2 months ago

Mitigations and debugging support landed here: https://github.com/gptscript-ai/knowledge/releases/tag/v0.4.13
Waiting for https://github.com/gptscript-ai/desktop/pull/450 to be merged.

sangee2004 commented 2 months ago

@iwilltry42 I am testing with the 0.4.13 version of the knowledge tool.

When I follow the exact steps mentioned in the issue, I don't see the "corpus cannot be empty" error anymore.

But I am not able to get the knowledge tool to provide any information about the knowledge file I have added.

https://github.com/user-attachments/assets/84fdd274-f7e7-49e9-9b67-85ca34b7dd04

sangee2004 commented 2 months ago

When I tried the same steps using the Tildy assistant, I was able to get information for the same query. Why would the behavior differ between Tildy and an assistant with the workspace context tool?

iwilltry42 commented 2 months ago

Did you reload the assistant before trying again? I assume it was a pre-existing assistant (or thread?) and needed to pull the updated tool first 🤔

sangee2004 commented 2 months ago

This is a new assistant that I created just now. I had cleared all configs and cache before I pulled the latest desktop code (which also pulled the new knowledge tool).

sangee2004 commented 2 months ago

The same behavior of not being able to work with thread knowledge files is also seen when testing with assistants that are created with no tools.

Even in this case I see the thread ID being used as 0 in datasets="[0 824]" in the stack trace:

Loaded context from Knowledge Retrieval Context
Input
"Who are the main characters in reunion under the stars "

Messages
2024/09/06 11:27:01 INFO Retrieving sources for query query="Who are the main characters in Reunion Under the Stars" datasets="[0 824]"

When testing with Tildy and thread knowledge, I am able to get answers as expected from thread knowledge. In this case I see the thread ID being used as expected in datasets="[ju3h07 0]" in the stack trace:


Loaded context from Knowledge Retrieval Context
Input
"who are the main characters in reunion under the stars story?"

Messages
2024/09/06 11:29:59 INFO Retrieving sources for query query="who are the main characters in reunion under the stars story?" datasets="[ju3h07 0]"
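
The two traces above differ only in the `datasets` field. Below is a small sketch for pulling that field out of a log line and flagging the failing case. It assumes, as the comparison above suggests but the thread does not confirm, that the first entry is the thread's dataset ID and that `0` means it was not set. `parse_datasets` and `thread_id_missing` are hypothetical helper names.

```python
import re

# Matches the key=value field datasets="[a b c]" as it appears in the traces.
_DATASETS_RE = re.compile(r'datasets="\[([^\]]*)\]"')


def parse_datasets(log_line: str) -> list[str]:
    """Return the dataset IDs listed in a log line, or [] if the field is absent."""
    match = _DATASETS_RE.search(log_line)
    return match.group(1).split() if match else []


def thread_id_missing(log_line: str) -> bool:
    """Assumption: the first dataset ID is the thread's, and "0" means unset."""
    ids = parse_datasets(log_line)
    return bool(ids) and ids[0] == "0"
```

Run against the two lines above, this check would flag the `[0 824]` trace and pass the `[ju3h07 0]` trace.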

iwilltry42 commented 2 months ago

Tested both variants in desktop commit 2be2f66 with https://github.com/gptscript-ai/knowledge/releases/tag/v0.4.14-rc.1 and didn't see the issue occur. However, I also do not see the thread ID being set as 0. Can you confirm that the issue persists for you?

sangee2004 commented 2 months ago

Tested with latest build of desktop - a66cd29ecd which uses knowledge from https://github.com/gptscript-ai/knowledge/releases/download/v0.4.14-rc.1/knowledge-darwin-amd64

I am still not able to work with thread knowledge.

Create an assistant (Add no tools)
Chat with this assistant.
Add a knowledge file using "Add Knowledge"
Ask any question relating to the file added. No information relating to the added knowledge file is fetched.

The stack trace shows the thread ID being used as 0 in datasets="[0 862]":

Input
"Who are the main characters in reunion under star?"

Messages
<tool call> knowledgeRetrieval -> {"query":"main characters in Reunion Under the Star"}

I couldn't find specific information about the main characters in "Reunion Under the Star." If you have any other questions or need information on a different topic, feel free to ask!

Calls
Loaded context from Knowledge Retrieval Context
Input
"Who are the main characters in reunion under star?"

Messages
2024/09/09 10:53:23 INFO Retrieving sources for query query="Who are the main characters in reunion under star?" datasets="[0 862]"
2024/09/09 10:53:23 DEBUG Using default DSN dsn="sqlite:///Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/knowledge.db"
2024/09/09 10:53:23 DEBUG Using default VectorDBPath vectordbPath="/Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/vector.db"
2024/09/09 10:53:23 DEBUG Using embedding model provider provider=openai config="{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4 EmbeddingModel:text-embedding-ada-002 EmbeddingEndpoint:/embeddings APIVersion:2024-02-01 APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}"
2024/09/09 10:53:23 DEBUG Loading retrieval flows from config flows_file=blueprint:context dataset="[0 862]"
2024/09/09 10:53:23 DEBUG Query Modifier custom configuration name=enhance config="{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}}}"
2024/09/09 10:53:23 DEBUG Retriever custom configuration name=subquery config="{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}} Limit:3 TopK:10}"
2024/09/09 10:53:23 DEBUG Retriever custom configuration name=bm25 config="{TopN:10 K1:1.2 B:0.75 CleanStopWords:[auto]}"
2024/09/09 10:53:23 DEBUG Retriever custom configuration name=merge config="{TopK:10 Retrievers:[{Name:subquery Weight:0xc0012c3818 Options:map[limit:3 model:map[openai:map[apiKey:39108f5fd5151b50:d4038456a23d4db2614d5f233ff73f3dbb30ac9f0e92acc6cda1481ba344da3b apiType:OPEN_AI baseURL:https://gateway-api.gptscript.ai/llm model:gpt-4o]] topK:10]} {Name:bm25 Weight:0xc0012c3848 Options:map[b:0.75 cleanStopWords:[auto] k1:1.2 topN:10]}] retrievers:[]}"
2024/09/09 10:53:23 DEBUG Postprocessor custom configuration name=similarity config="{Threshold:0.4 KeepMin:5}"
2024/09/09 10:53:23 DEBUG Postprocessor custom configuration name=reduce config={TopK:10}
2024/09/09 10:53:23 DEBUG Loaded retrieval flow from config flows_file=blueprint:context dataset="[0 862]"
2024/09/09 10:53:23 DEBUG Retrieving content from dataset dataset="[0 862]" query="Who are the main characters in reunion under star?"
2024/09/09 10:53:23 DEBUG Prompting LLM prompt="The following query will be used for a vector similarity search.\nPlease enhance it to improve the semantic similarity search.\nQuery: \"Who are the main characters in reunion under star?\"\nReply only with the JSON {\"result\": \"<enhanced-query>\"}.\nDo not include anything else in your response and don't use markdown highlighting or formatting, just raw JSON."
2024/09/09 10:53:24 DEBUG Modified queries before="[Who are the main characters in reunion under star?]" queryModifier=enhance after="[List the primary characters in the book 'Reunion Under the Star'.]"
2024/09/09 10:53:24 DEBUG Updated query set query="Who are the main characters in reunion under star?" modified_query_set="[List the primary characters in the book 'Reunion Under the Star'.]"
2024/09/09 10:53:24 DEBUG Retrieving documents from merging retriever query="List the primary characters in the book 'Reunion Under the Star'." datasetIDs="[0 862]" where=map[] whereDocument=[]
2024/09/09 10:53:24 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=subquery
2024/09/09 10:53:24 DEBUG Prompting LLM prompt="The following query will be used for a vector similarity search.\nIf it is too complex or covering multiple topics or entities, please split it into multiple subqueries.\nI.e. a comparative query like \"What are the differences between cats and dogs?\" could be split into subqueries concerning cats and dogs separately.\nThe resulting subqueries will then be used for separate vector similarity searches.\nJust changing the phrasing of the input question often won't change the semantic meaning, so those may not be good candidates.\nLimit the number of subqueries to a maximum of 3 (less is ok).\nQuery: \"List the primary characters in the book 'Reunion Under the Star'.\"\nReply with all subqueries in a json list like the following and don't reply with anything else (also don't use any markdown syntax).\nResponse schema: {\"results\": [\"<subquery-1>\", \"<subquery-2>\"]}"
2024/09/09 10:53:24 DEBUG SubqueryQueryRetriever generated subqueries queries="Who are the primary characters in the book 'Reunion Under the Star'?"
2024/09/09 10:53:24 DEBUG Retrieved documents from retriever retriever=subquery numDocs=0
2024/09/09 10:53:24 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=bm25
2024/09/09 10:53:24 INFO No documents found for BM25 retrieval datasets="[0 862]"
2024/09/09 10:53:24 DEBUG Retrieved documents from retriever retriever=bm25 numDocs=0
2024/09/09 10:53:24 DEBUG MergingRetriever topK=0 numDocs=0
2024/09/09 10:53:24 DEBUG Retrieved documents num_documents=0 query="List the primary characters in the book 'Reunion Under the Star'." datasets="[0 862]" retriever=merge
2024/09/09 10:53:24 DEBUG Postprocessed RetrievalResponse num_responses=1 original_query="Who are the main characters in reunion under star?"
Retrieved the following 1 source collections for the original query "Who are the main characters in reunion under star?": {"List the primary characters in the book 'Reunion Under the Star'.":null}

Calls
Loaded input from QueryRelevancy
Input
"{\"input\":\"Who are the main characters in reunion under star?\"}"

Messages
Who are the main characters in reunion under star?

Calls
Loaded context from LastUserInputOverview
Input
""

Messages
<USER_MESSAGES> </USER_MESSAGES>

Loaded output from KnowledgeInstructions
Input
"{\"chat\":false,\"continuation\":false,\"output\":\"2024/09/09 10:53:23 INFO Retrieving sources for query query=\\\"Who are the main characters in reunion under star?\\\" datasets=\\\"[0 862]\\\"\\n2024/09/09 10:53:23 DEBUG Using default DSN dsn=\\\"sqlite:///Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/knowledge.db\\\"\\n2024/09/09 10:53:23 DEBUG Using default VectorDBPath vectordbPath=\\\"/Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/vector.db\\\"\\n2024/09/09 10:53:23 DEBUG Using embedding model provider provider=openai config=\\\"{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4 EmbeddingModel:text-embedding-ada-002 EmbeddingEndpoint:/embeddings APIVersion:2024-02-01 APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}\\\"\\n2024/09/09 10:53:23 DEBUG Loading retrieval flows from config flows_file=blueprint:context dataset=\\\"[0 862]\\\"\\n2024/09/09 10:53:23 DEBUG Query Modifier custom configuration name=enhance config=\\\"{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}}}\\\"\\n2024/09/09 10:53:23 DEBUG Retriever custom configuration name=subquery config=\\\"{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}} Limit:3 TopK:10}\\\"\\n2024/09/09 10:53:23 DEBUG Retriever custom configuration name=bm25 config=\\\"{TopN:10 K1:1.2 B:0.75 CleanStopWords:[auto]}\\\"\\n2024/09/09 10:53:23 DEBUG Retriever custom configuration name=merge config=\\\"{TopK:10 Retrievers:[{Name:subquery Weight:0xc0012c3818 Options:map[limit:3 model:map[openai:map[apiKey:39108f5fd5151b50:d4038456a23d4db2614d5f233ff73f3dbb30ac9f0e92acc6cda1481ba344da3b apiType:OPEN_AI baseURL:https://gateway-api.gptscript.ai/llm model:gpt-4o]] topK:10]} {Name:bm25 
Weight:0xc0012c3848 Options:map[b:0.75 cleanStopWords:[auto] k1:1.2 topN:10]}] retrievers:[]}\\\"\\n2024/09/09 10:53:23 DEBUG Postprocessor custom configuration name=similarity config=\\\"{Threshold:0.4 KeepMin:5}\\\"\\n2024/09/09 10:53:23 DEBUG Postprocessor custom configuration name=reduce config={TopK:10}\\n2024/09/09 10:53:23 DEBUG Loaded retrieval flow from config flows_file=blueprint:context dataset=\\\"[0 862]\\\"\\n2024/09/09 10:53:23 DEBUG Retrieving content from dataset dataset=\\\"[0 862]\\\" query=\\\"Who are the main characters in reunion under star?\\\"\\n2024/09/09 10:53:23 DEBUG Prompting LLM prompt=\\\"The following query will be used for a vector similarity search.\\\\nPlease enhance it to improve the semantic similarity search.\\\\nQuery: \\\\\\\"Who are the main characters in reunion under star?\\\\\\\"\\\\nReply only with the JSON {\\\\\\\"result\\\\\\\": \\\\\\\"\\u003cenhanced-query\\u003e\\\\\\\"}.\\\\nDo not include anything else in your response and don't use markdown highlighting or formatting, just raw JSON.\\\"\\n2024/09/09 10:53:24 DEBUG Modified queries before=\\\"[Who are the main characters in reunion under star?]\\\" queryModifier=enhance after=\\\"[List the primary characters in the book 'Reunion Under the Star'.]\\\"\\n2024/09/09 10:53:24 DEBUG Updated query set query=\\\"Who are the main characters in reunion under star?\\\" modified_query_set=\\\"[List the primary characters in the book 'Reunion Under the Star'.]\\\"\\n2024/09/09 10:53:24 DEBUG Retrieving documents from merging retriever query=\\\"List the primary characters in the book 'Reunion Under the Star'.\\\" datasetIDs=\\\"[0 862]\\\" where=map[] whereDocument=[]\\n2024/09/09 10:53:24 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=subquery\\n2024/09/09 10:53:24 DEBUG Prompting LLM prompt=\\\"The following query will be used for a vector similarity search.\\\\nIf it is too complex or covering multiple topics or entities, please split it 
into multiple subqueries.\\\\nI.e. a comparative query like \\\\\\\"What are the differences between cats and dogs?\\\\\\\" could be split into subqueries concerning cats and dogs separately.\\\\nThe resulting subqueries will then be used for separate vector similarity searches.\\\\nJust changing the phrasing of the input question often won't change the semantic meaning, so those may not be good candidates.\\\\nLimit the number of subqueries to a maximum of 3 (less is ok).\\\\nQuery: \\\\\\\"List the primary characters in the book 'Reunion Under the Star'.\\\\\\\"\\\\nReply with all subqueries in a json list like the following and don't reply with anything else (also don't use any markdown syntax).\\\\nResponse schema: {\\\\\\\"results\\\\\\\": [\\\\\\\"\\u003csubquery-1\\u003e\\\\\\\", \\\\\\\"\\u003csubquery-2\\u003e\\\\\\\"]}\\\"\\n2024/09/09 10:53:24 DEBUG SubqueryQueryRetriever generated subqueries queries=\\\"Who are the primary characters in the book 'Reunion Under the Star'?\\\"\\n2024/09/09 10:53:24 DEBUG Retrieved documents from retriever retriever=subquery numDocs=0\\n2024/09/09 10:53:24 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=bm25\\n2024/09/09 10:53:24 INFO No documents found for BM25 retrieval datasets=\\\"[0 862]\\\"\\n2024/09/09 10:53:24 DEBUG Retrieved documents from retriever retriever=bm25 numDocs=0\\n2024/09/09 10:53:24 DEBUG MergingRetriever topK=0 numDocs=0\\n2024/09/09 10:53:24 DEBUG Retrieved documents num_documents=0 query=\\\"List the primary characters in the book 'Reunion Under the Star'.\\\" datasets=\\\"[0 862]\\\" retriever=merge\\n2024/09/09 10:53:24 DEBUG Postprocessed RetrievalResponse num_responses=1 original_query=\\\"Who are the main characters in reunion under star?\\\"\\nRetrieved the following 1 source collections for the original query \\\"Who are the main characters in reunion under star?\\\": {\\\"List the primary characters in the book 'Reunion Under the Star'.\\\":null}\\n\"}"

Messages
Use the content within the following <KNOWLEDGE></KNOWLEDGE> tags as your learned knowledge. <KNOWLEDGE> Retrieved the following 1 source collections for the original query "Who are the main characters in reunion under star?": {"List the primary characters in the book 'Reunion Under the Star'.":null} </KNOWLEDGE> If this knowledge seems irrelevant to the user query, ignore it. Avoid mentioning that you retrieved the information from the context or the knowledge tool. Only provide citations if explicitly asked for it and if the source references are available in the knowledge. Answer in the language that the user asked the question in.

Loaded context from Knowledge
Input
""

Messages
You have access to a RAG tool named "Knowledge Retrieval". It will work with files previously uploaded by the user. Use it to answer questions from the user. Only consider this tool if the knowledge from your context doesn't already hold the answer to the user query. Give citations or source references only if asked for it and that doesn't conflict with any other instructions you've received in your system prompt. If the answers that the knowledge tool returns seem irrelevant, you may use another tool.

Loaded context from Knowledge Retrieval Context
Input
""

Messages
2024/09/09 10:53:28 INFO Retrieving sources for query query="Who are the main characters in reunion under star?" datasets="[0 862]"
2024/09/09 10:53:28 DEBUG Using default DSN dsn="sqlite:///Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/knowledge.db"
2024/09/09 10:53:28 DEBUG Using default VectorDBPath vectordbPath="/Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/vector.db"
2024/09/09 10:53:28 DEBUG Using embedding model provider provider=openai config="{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4 EmbeddingModel:text-embedding-ada-002 EmbeddingEndpoint:/embeddings APIVersion:2024-02-01 APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}"
2024/09/09 10:53:28 DEBUG Loading retrieval flows from config flows_file=blueprint:context dataset="[0 862]"
2024/09/09 10:53:28 DEBUG Query Modifier custom configuration name=enhance config="{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}}}"
2024/09/09 10:53:28 DEBUG Retriever custom configuration name=subquery config="{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}} Limit:3 TopK:10}"
2024/09/09 10:53:28 DEBUG Retriever custom configuration name=bm25 config="{TopN:10 K1:1.2 B:0.75 CleanStopWords:[auto]}"
2024/09/09 10:53:28 DEBUG Retriever custom configuration name=merge config="{TopK:10 Retrievers:[{Name:subquery Weight:0xc0012a9828 Options:map[limit:3 model:map[openai:map[apiKey:39108f5fd5151b50:d4038456a23d4db2614d5f233ff73f3dbb30ac9f0e92acc6cda1481ba344da3b apiType:OPEN_AI baseURL:https://gateway-api.gptscript.ai/llm model:gpt-4o]] topK:10]} {Name:bm25 Weight:0xc0012a9858 Options:map[b:0.75 cleanStopWords:[auto] k1:1.2 topN:10]}] retrievers:[]}"
2024/09/09 10:53:28 DEBUG Postprocessor custom configuration name=similarity config="{Threshold:0.4 KeepMin:5}"
2024/09/09 10:53:28 DEBUG Postprocessor custom configuration name=reduce config={TopK:10}
2024/09/09 10:53:28 DEBUG Loaded retrieval flow from config flows_file=blueprint:context dataset="[0 862]"
2024/09/09 10:53:28 DEBUG Retrieving content from dataset dataset="[0 862]" query="Who are the main characters in reunion under star?"
2024/09/09 10:53:28 DEBUG Prompting LLM prompt="The following query will be used for a vector similarity search.\nPlease enhance it to improve the semantic similarity search.\nQuery: \"Who are the main characters in reunion under star?\"\nReply only with the JSON {\"result\": \"<enhanced-query>\"}.\nDo not include anything else in your response and don't use markdown highlighting or formatting, just raw JSON."
2024/09/09 10:53:29 DEBUG Modified queries before="[Who are the main characters in reunion under star?]" queryModifier=enhance after="[main characters in the book Reunion Under the Star]"
2024/09/09 10:53:29 DEBUG Updated query set query="Who are the main characters in reunion under star?" modified_query_set="[main characters in the book Reunion Under the Star]"
2024/09/09 10:53:29 DEBUG Retrieving documents from merging retriever query="main characters in the book Reunion Under the Star" datasetIDs="[0 862]" where=map[] whereDocument=[]
2024/09/09 10:53:29 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=subquery
2024/09/09 10:53:29 DEBUG Prompting LLM prompt="The following query will be used for a vector similarity search.\nIf it is too complex or covering multiple topics or entities, please split it into multiple subqueries.\nI.e. a comparative query like \"What are the differences between cats and dogs?\" could be split into subqueries concerning cats and dogs separately.\nThe resulting subqueries will then be used for separate vector similarity searches.\nJust changing the phrasing of the input question often won't change the semantic meaning, so those may not be good candidates.\nLimit the number of subqueries to a maximum of 3 (less is ok).\nQuery: \"main characters in the book Reunion Under the Star\"\nReply with all subqueries in a json list like the following and don't reply with anything else (also don't use any markdown syntax).\nResponse schema: {\"results\": [\"<subquery-1>\", \"<subquery-2>\"]}"
2024/09/09 10:53:30 DEBUG SubqueryQueryRetriever generated subqueries queries="main characters in the book Reunion Under the Star"
2024/09/09 10:53:30 DEBUG Retrieved documents from retriever retriever=subquery numDocs=0
2024/09/09 10:53:30 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=bm25
2024/09/09 10:53:30 INFO No documents found for BM25 retrieval datasets="[0 862]"
2024/09/09 10:53:30 DEBUG Retrieved documents from retriever retriever=bm25 numDocs=0
2024/09/09 10:53:30 DEBUG MergingRetriever topK=0 numDocs=0
2024/09/09 10:53:30 DEBUG Retrieved documents num_documents=0 query="main characters in the book Reunion Under the Star" datasets="[0 862]" retriever=merge
2024/09/09 10:53:30 DEBUG Postprocessed RetrievalResponse num_responses=1 original_query="Who are the main characters in reunion under star?"
Retrieved the following 1 source collections for the original query "Who are the main characters in reunion under star?": {"main characters in the book Reunion Under the Star":null}

Calls
Loaded input from QueryRelevancy
Input
"{\"input\":\"\"}"

Messages
Who are the main characters in reunion under star?

Calls
Loaded context from LastUserInputOverview
Input
""

Messages
<USER_MESSAGES> [User Message #1] Who are the main characters in reunion under star? </USER_MESSAGES>

Calls
Loaded context from sys.chat.current
Input
""

Messages
{"id":"1725903464","tool":{"name":"sys.chat.current","description":"Retrieves the current chat dialog","modelName":"gpt-4o","internalPrompt":null,"arguments":{"type":"object"},"instructions":"#!sys.chat.current","id":"sys.chat.current","source":{}},"completion":{"model":"gpt-4o","internalSystemPrompt":false,"tools":[{"function":{"toolID":"https://raw.githubusercontent.com/gptscript-ai/knowledge/f54ce7c597eab1fa3d85b109690b37bb21496d7c/gateway/tool.gpt:Knowledge Retrieval","name":"knowledgeRetrieval","description":"Retrieve information from files uploaded by the user.","parameters":{"properties":{"debug":{"description":"(OPTIONAL) Set to \"true\" to enable debug mode - only use if you are explicitly asked to do so.","type":"string"},"query":{"description":"The query to search for in the knowledge base. It will be used for semantic similarity search, so enhance it accordingly to yield good results.","type":"string"}},"type":"object"}}}],"messages":[{"role":"system","content":[{"text":"\n\nYou have access to a RAG tool named \"Knowledge Retrieval\".\nIt will work with files previously uploaded by the user.\nUse it to answer questions from the user.\nOnly consider this tool if the knowledge from your context doesn't already hold the answer to the user query.\nGive citations or source references only if asked for it and that doesn't conflict with any other instructions you've received in your system prompt.\nIf the answers that the knowledge tool returns seem irrelevant, you may use another tool.\n\nUse the content within the following \u003cKNOWLEDGE\u003e\u003c/KNOWLEDGE\u003e tags as your learned knowledge.\n\u003cKNOWLEDGE\u003e\nRetrieved the following 1 source collections for the original query \"Who are the main characters in reunion under star?\": {\"List the primary characters in the book 'Reunion Under the Star'.\":null}\n\n\u003c/KNOWLEDGE\u003e\nIf this knowledge seems irrelevant to the user query, ignore it.\nAvoid mentioning that you retrieved the 
information from the context or the knowledge tool.\nOnly provide citations if explicitly asked for it and if the source references are available in the knowledge.\nAnswer in the language that the user asked the question in.\n\n\nYou are a helpful assistant named New Assistant. When you first start, just introduce yourself and wait for the user's next message."}],"usage":{}},{"role":"assistant","content":[{"text":"Hello, I'm New Assistant. How can I help you today?"}],"usage":{}},{"role":"user","content":[{"text":"Who are the main characters in reunion under star?"}],"usage":{}},{"role":"assistant","content":[{"toolCall":{"index":0,"id":"call_gLF6paQEVLToTWy3PScMnQaY","function":{"name":"knowledgeRetrieval","arguments":"{\"query\":\"main characters in Reunion Under the Star\"}"}}}],"usage":{}}],"chat":true}}

Loaded output from KnowledgeInstructions
Input
"{\"chat\":false,\"continuation\":false,\"output\":\"2024/09/09 10:53:28 INFO Retrieving sources for query query=\\\"Who are the main characters in reunion under star?\\\" datasets=\\\"[0 862]\\\"\\n2024/09/09 10:53:28 DEBUG Using default DSN dsn=\\\"sqlite:///Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/knowledge.db\\\"\\n2024/09/09 10:53:28 DEBUG Using default VectorDBPath vectordbPath=\\\"/Users/sangeethahariharan/Library/Application Support/gptscript/knowledge/vector.db\\\"\\n2024/09/09 10:53:28 DEBUG Using embedding model provider provider=openai config=\\\"{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4 EmbeddingModel:text-embedding-ada-002 EmbeddingEndpoint:/embeddings APIVersion:2024-02-01 APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}\\\"\\n2024/09/09 10:53:28 DEBUG Loading retrieval flows from config flows_file=blueprint:context dataset=\\\"[0 862]\\\"\\n2024/09/09 10:53:28 DEBUG Query Modifier custom configuration name=enhance config=\\\"{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}}}\\\"\\n2024/09/09 10:53:28 DEBUG Retriever custom configuration name=subquery config=\\\"{Model:{OpenAI:{BaseURL:https://gateway-api.gptscript.ai/llm APIKey:REDACTED Model:gpt-4o EmbeddingModel: EmbeddingEndpoint: APIVersion: APIType:OPEN_AI AzureOpenAIConfig:{Deployment:}}} Limit:3 TopK:10}\\\"\\n2024/09/09 10:53:28 DEBUG Retriever custom configuration name=bm25 config=\\\"{TopN:10 K1:1.2 B:0.75 CleanStopWords:[auto]}\\\"\\n2024/09/09 10:53:28 DEBUG Retriever custom configuration name=merge config=\\\"{TopK:10 Retrievers:[{Name:subquery Weight:0xc0012a9828 Options:map[limit:3 model:map[openai:map[apiKey:39108f5fd5151b50:d4038456a23d4db2614d5f233ff73f3dbb30ac9f0e92acc6cda1481ba344da3b apiType:OPEN_AI baseURL:https://gateway-api.gptscript.ai/llm model:gpt-4o]] topK:10]} {Name:bm25 
Weight:0xc0012a9858 Options:map[b:0.75 cleanStopWords:[auto] k1:1.2 topN:10]}] retrievers:[]}\\\"\\n2024/09/09 10:53:28 DEBUG Postprocessor custom configuration name=similarity config=\\\"{Threshold:0.4 KeepMin:5}\\\"\\n2024/09/09 10:53:28 DEBUG Postprocessor custom configuration name=reduce config={TopK:10}\\n2024/09/09 10:53:28 DEBUG Loaded retrieval flow from config flows_file=blueprint:context dataset=\\\"[0 862]\\\"\\n2024/09/09 10:53:28 DEBUG Retrieving content from dataset dataset=\\\"[0 862]\\\" query=\\\"Who are the main characters in reunion under star?\\\"\\n2024/09/09 10:53:28 DEBUG Prompting LLM prompt=\\\"The following query will be used for a vector similarity search.\\\\nPlease enhance it to improve the semantic similarity search.\\\\nQuery: \\\\\\\"Who are the main characters in reunion under star?\\\\\\\"\\\\nReply only with the JSON {\\\\\\\"result\\\\\\\": \\\\\\\"\\u003cenhanced-query\\u003e\\\\\\\"}.\\\\nDo not include anything else in your response and don't use markdown highlighting or formatting, just raw JSON.\\\"\\n2024/09/09 10:53:29 DEBUG Modified queries before=\\\"[Who are the main characters in reunion under star?]\\\" queryModifier=enhance after=\\\"[main characters in the book Reunion Under the Star]\\\"\\n2024/09/09 10:53:29 DEBUG Updated query set query=\\\"Who are the main characters in reunion under star?\\\" modified_query_set=\\\"[main characters in the book Reunion Under the Star]\\\"\\n2024/09/09 10:53:29 DEBUG Retrieving documents from merging retriever query=\\\"main characters in the book Reunion Under the Star\\\" datasetIDs=\\\"[0 862]\\\" where=map[] whereDocument=[]\\n2024/09/09 10:53:29 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=subquery\\n2024/09/09 10:53:29 DEBUG Prompting LLM prompt=\\\"The following query will be used for a vector similarity search.\\\\nIf it is too complex or covering multiple topics or entities, please split it into multiple subqueries.\\\\nI.e. 
a comparative query like \\\\\\\"What are the differences between cats and dogs?\\\\\\\" could be split into subqueries concerning cats and dogs separately.\\\\nThe resulting subqueries will then be used for separate vector similarity searches.\\\\nJust changing the phrasing of the input question often won't change the semantic meaning, so those may not be good candidates.\\\\nLimit the number of subqueries to a maximum of 3 (less is ok).\\\\nQuery: \\\\\\\"main characters in the book Reunion Under the Star\\\\\\\"\\\\nReply with all subqueries in a json list like the following and don't reply with anything else (also don't use any markdown syntax).\\\\nResponse schema: {\\\\\\\"results\\\\\\\": [\\\\\\\"\\u003csubquery-1\\u003e\\\\\\\", \\\\\\\"\\u003csubquery-2\\u003e\\\\\\\"]}\\\"\\n2024/09/09 10:53:30 DEBUG SubqueryQueryRetriever generated subqueries queries=\\\"main characters in the book Reunion Under the Star\\\"\\n2024/09/09 10:53:30 DEBUG Retrieved documents from retriever retriever=subquery numDocs=0\\n2024/09/09 10:53:30 DEBUG Retrieving documents from retriever component=MergingRetriever retriever=bm25\\n2024/09/09 10:53:30 INFO No documents found for BM25 retrieval datasets=\\\"[0 862]\\\"\\n2024/09/09 10:53:30 DEBUG Retrieved documents from retriever retriever=bm25 numDocs=0\\n2024/09/09 10:53:30 DEBUG MergingRetriever topK=0 numDocs=0\\n2024/09/09 10:53:30 DEBUG Retrieved documents num_documents=0 query=\\\"main characters in the book Reunion Under the Star\\\" datasets=\\\"[0 862]\\\" retriever=merge\\n2024/09/09 10:53:30 DEBUG Postprocessed RetrievalResponse num_responses=1 original_query=\\\"Who are the main characters in reunion under star?\\\"\\nRetrieved the following 1 source collections for the original query \\\"Who are the main characters in reunion under star?\\\": {\\\"main characters in the book Reunion Under the Star\\\":null}\\n\"}"

Messages
Use the content within the following <KNOWLEDGE></KNOWLEDGE> tags as your learned knowledge. <KNOWLEDGE> Retrieved the following 1 source collections for the original query "Who are the main characters in reunion under star?": {"main characters in the book Reunion Under the Star":null} </KNOWLEDGE> If this knowledge seems irrelevant to the user query, ignore it. Avoid mentioning that you retrieved the information from the context or the knowledge tool. Only provide citations if explicitly asked for it and if the source references are available in the knowledge. Answer in the language that the user asked the question in.

Ran Knowledge Retrieval
Input
"{\"query\":\"main characters in Reunion Under the Star\"}"

Messages
2024/09/09 10:53:26 INFO Retrieving sources for query query="{\"query\":\"main characters in Reunion Under the Star\"}" datasets="[0 862]"
Retrieved the following 1 source collections for the original query "{\"query\":\"main characters in Reunion Under the Star\"}": {"{\"query\":\"main characters in Reunion Under the Star, plot highlights, character development, and important themes\"}":null}

Loaded provider from GPTScript Gateway Provider
Input
""

Messages
http://127.0.0.1:11228
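
When reading a trace like the one above, a quick signal is how many documents each retriever returned. The sketch below assumes the log text keeps the `retriever=<name> numDocs=<n>` pairs shown above. `docs_per_retriever` is a hypothetical helper, not part of the knowledge tool.

```python
import re

# Matches pairs such as "retriever=bm25 numDocs=0" from the debug output.
_NUM_DOCS_RE = re.compile(r"retriever=(\w+) numDocs=(\d+)")


def docs_per_retriever(trace: str) -> dict[str, int]:
    """Map each retriever name to the last numDocs value logged for it."""
    return {name: int(count) for name, count in _NUM_DOCS_RE.findall(trace)}
```

If every retriever reports zero documents, as in this trace, that suggests the queried datasets hold no content for this thread, rather than a problem with the query wording.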

iwilltry42 commented 2 months ago

Should be fixed by https://github.com/gptscript-ai/desktop/pull/470

sangee2004 commented 2 months ago

Tested with latest build from 0c81151c2

This issue is not seen anymore.

gptscript-ai / desktop

"corpus cannot be empty" error when chatting with assistant that has workspace tool and knowledge files added to thread. #425