Closed sangee2004 closed 6 months ago
This issue is no longer seen. I can ingest a smaller CSV file successfully and retrieve data from it.
Ingesting CSV files seems to take a very long time. The CSV file used in this issue is 42 MB, and its ingestion had not finished even after 6 minutes:
/usr/local/bin/knowledge ingest -d testnewcvs /Users/sangeethahariharan/Downloads/Electric_Vehicle_Population_Data.csv
2024/05/03 13:24:28 INFO IngestOpts opts="{Filename:0x1400709a000 FileMetadata:0x1400705e000 IsDuplicateFuncName: IsDuplicateFunc:0x105378920}"
^C2024/05/03 13:30:28 ERROR Failed to add documents error="couldn't add document '10a2c04c-7a9a-43fd-9c3b-be85d1e226b8': couldn't create embedding of document: couldn't send request: Post \"https://api.openai.com/v1/embeddings\": context canceled"
Even ingestion of a relatively small file (36 kB) takes about 15 seconds:
/usr/local/bin/knowledge ingest -d testnewcvs /Users/sangeethahariharan/Downloads/industry_sic.csv
2024/05/03 13:31:00 INFO IngestOpts opts="{Filename:0x1400e4da780 FileMetadata:0x1400c69c340 IsDuplicateFuncName: IsDuplicateFunc:0x101ce8920}"
2024/05/03 13:31:15 INFO Ingested document filename=industry_sic.csv count=731 absolute_path=/Users/sangeethahariharan/Downloads/industry_sic.csv
Steps to reproduce the problem:
Run make run from https://github.com/gptscript-ai/knowledge to launch knowledge in server mode.
Add bin/knowledge to the path (/usr/local/bin).
tools: Create Dataset, sys.find, Ingest, Retrieve
Create a new Knowledge Base Dataset with ID ${id} Then, ingest ${filepath} into the dataset. Then, figure out ${query} from the previously ingested files.
name: create dataset
description: Create a new Dataset in the Knowledge Base
args: id: ID of the Dataset
!knowledge client create-dataset ${id}
name: ingest
description: Ingest a file or all files from a directory into a Knowledge Base Dataset
args: id: ID of the Dataset
args: filepath: Path to the file or directory to be ingested
!knowledge client ingest ${id} ${filepath}
name: retrieve
description: Retrieve information from a Knowledge Base Dataset
args: id: ID of the Dataset
args: query: Query to be executed against the Knowledge Base Dataset
!knowledge client retrieve ${id} ${query}
gptscript --disable-cache --debug testknowledge.gpt --id 66664 --filepath /Users/sangeethahariharan/Downloads/Electric_Vehicle_Population_Data.csv --query "Tell me what are VIN number from vehicles are made By TESLA in AZ"
{ "completionID": "3", "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "request": { "command": [ "/usr/local/bin/knowledge", "client", "retrieve", "66664", "Tell me what are VIN number from vehicles are made By TESLA in AZ" ], "input": "{\"id\":\"66664\",\"query\":\"Tell me what are VIN number from vehicles are made By TESLA in AZ\"}" }, "time": "2024-04-30T12:00:57-07:00", "toolID": "testknowledge.gpt:25" }
2024/04/30 12:00:57 API request failed: 500 Internal Server Error
{ "level": "error", "logger": "/pkg/engine", "msg": "failed to run tool [retrieve] cmd [/usr/local/bin/knowledge client retrieve 66664 Tell me what are VIN number from vehicles are made By TESLA in AZ]: exit status 1", "time": "2024-04-30T12:00:57-07:00" }
{ "cached": false, "completionID": "3", "id": "call_rYL6JtFkWek5R5u7k9FL8lSk", "level": "debug", "logger": "/pkg/monitor", "msg": "debug", "parentID": "1", "response": { "err": {}, "output": "" }, "time": "2024-04-30T12:00:57-07:00", "toolID": "testknowledge.gpt:25" }
{ "err": "ERROR: 2024/04/30 12:00:57 API request failed: 500 Internal Server Error\n: exit status 1", "level": "debug", "logger": "/pkg/monitor", "msg": "Run stopped", "output": "", "runID": "1", "time": "2024-04-30T12:00:57-07:00" }
2024/04/30 12:00:57 ERROR: 2024/04/30 12:00:57 API request failed: 500 Internal Server Error : exit status 1
2024/04/30 12:00:47 INFO Creating dataset id=66664
[GIN] 2024/04/30 - 12:00:47 | 200 | 1.043792ms | ::1 | POST "/v1/datasets/create"
2024/04/30 12:00:49 INFO Ingesting content into dataset dataset=66664
2024/04/30 12:00:53 DEBUG Received ingest request content_size=74720880 metadata="&{Name:Electric_Vehicle_Population_Data.csv AbsolutePath:/Users/sangeethahariharan/Downloads/Electric_Vehicle_Population_Data.csv Size:42030494 ModifiedAt:2024-04-29 16:46:18.893437515 -0700 PDT}"
2024/04/30 12:00:53 INFO IngestOpts opts="{Filename:0x14000996060 FileMetadata:0x140037f21c0 IsDuplicateFuncName: IsDuplicateFunc:}"
2024/04/30 12:00:53 DEBUG Loading data type=.csv filename=Electric_Vehicle_Population_Data.csv
2024/04/30 12:00:53 ERROR No documents found
[GIN] 2024/04/30 - 12:00:53 | 200 | 3.797541083s | ::1 | POST "/v1/datasets/66664/ingest"
2024/04/30 12:00:57 INFO Retrieving content from dataset dataset=66664
2024/04/30 12:00:57 DEBUG Retrieving content from dataset dataset=66664 query="{Prompt:Tell me what are VIN number from vehicles are made By TESLA in AZ TopK:0x140086f80d8}"
2024/04/30 12:00:57 DEBUG Reduced number of documents to search for numDocuments=0
2024/04/30 12:00:57 ERROR Failed to retrieve documents error="nResults must be > 0"
[GIN] 2024/04/30 - 12:00:57 | 500 | 160.57725ms | ::1 | POST "/v1/datasets/66664/retrieve"