gptscript-ai / desktop

Knowledge - Logs indicate ingestion of files when no changes were made to them. #525

Open sangee2004 opened 1 day ago

sangee2004 commented 1 day ago

Desktop build - 576ef7a6fd

The following are some workflows where the logs indicate ingestion of files even though no changes were made to them. Steps to reproduce the problem:

  1. Create an assistant with 1 knowledge file from Notion.
  2. While still on the edit assistant page, add another knowledge file from Notion. The logs show ingestion of 2 files (the existing one and the new one):
    2024-09-18T23:42:10.075Z [server] [INFO] logs for ingesting dataset 979:  Ingested 2 files from "./data" into dataset "979"
    2024/09/18 16:42:09 INFO Pruned files count=0 basePath=./data
    2024/09/18 16:42:09 INFO Ingested document filename="Thirsty Crow.md" count=3 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Thirsty Crow.md"
    2024/09/18 16:42:10 INFO Ingested document filename="Monkey and Crocodile.md" count=6 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Monkey and Crocodile.md"
  3. Quit the assistant edit mode, then enter the edit mode of an assistant with 2 knowledge files. The logs report ingestion of 2 files, but the 2 files are not listed, which may mean the files are not actually being ingested in this case:
    2024-09-18T23:45:52.419Z [server] [INFO] logs for ingesting dataset 979:  Ingested 2 files from "./data" into dataset "979"
    2024/09/18 16:45:52 INFO Pruned files count=0 basePath=./data
  4. When the knowledge files are from Notion and I do "Sync files", I see ingestion of all existing knowledge files in the assistant even when no changes were made to these files:
    2024-09-18T23:48:13.321Z [server] [INFO] logs for ingesting dataset 979:  Ingested 2 files from "./data" into dataset "979"
    2024/09/18 16:48:12 INFO Pruned files count=0 basePath=./data
    2024/09/18 16:48:13 INFO Ingested document filename="Thirsty Crow.md" count=3 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Thirsty Crow.md"
    2024/09/18 16:48:13 INFO Ingested document filename="Monkey and Crocodile.md" count=6 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Monkey and Crocodile.md"

Note

  1. When testing with local knowledge files, the "Ingested document filename" logs seen in step 2 do not appear.
  2. When testing with knowledge files from OneDrive, the "Ingested document filename" logs seen in step 2 and step 4 do not appear.

This issue seems to be specific to Notion, if we can assume that no ingestion of files is happening in step 3.
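
To compare runs across the steps above, a small helper can list which files each log excerpt reports as ingested or skipped. This is only a sketch: it assumes the `key=value` line format shown in the logs in this issue, and `classify` and `summarize` are hypothetical names, not part of desktop or knowledge.

```python
import re

# Matches key=value pairs; the value may be double-quoted and contain spaces.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def classify(line):
    """Return (status, filename) for a per-document log line, else None."""
    if "Ingested document" in line:
        status = "ingested"
    elif "Ignoring duplicate document" in line:
        status = "skipped"
    else:
        # Summary lines ("Ingested 2 files ...") and prune lines carry no filename.
        return None
    fields = {key: value.strip('"') for key, value in PAIR.findall(line)}
    return status, fields.get("filename")


def summarize(log_text):
    """Group filenames from a log excerpt by ingestion status."""
    result = {"ingested": [], "skipped": []}
    for line in log_text.splitlines():
        hit = classify(line)
        if hit is not None:
            result[hit[0]].append(hit[1])
    return result
```

Pasting the excerpt from a step into `summarize` returns a dict with an `ingested` list and a `skipped` list. An excerpt that has only the "Ingested 2 files" summary line, as in step 3, yields two empty lists.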

sangee2004 commented 14 hours ago

Tested with the latest build of desktop, which uses v0.4.14-rc.11 of knowledge.

The following messages show up when entering the edit mode of an assistant with 2 knowledge files:

2024-09-19T20:50:23.168Z [server] [INFO] logs for ingesting dataset 969:  Ingested 2 files from "./data" into dataset "969"
2024/09/19 13:50:23 INFO Pruned files count=0 basePath=./data
2024/09/19 13:50:23 INFO Ignoring duplicate document flow=ingestion rootPath=./data filepath="data/notion/Monkey and Crocodile.md" phase=store filename="Monkey and Crocodile.md" filetype=.md status=skipped reason=duplicate
2024/09/19 13:50:23 INFO Ignoring duplicate document flow=ingestion rootPath=./data filepath="data/notion/Thirsty Crow.md" phase=store filename="Thirsty Crow.md" filetype=.md status=skipped reason=duplicate

So in the case of step 3, I can confirm that ingestion is not happening.

Ingestion is still happening for step 2 and step 4.
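
For illustration, the `status=skipped reason=duplicate` outcome seen in step 3 is the kind of result one would expect for unchanged files in steps 2 and 4 as well. Below is a minimal sketch of a content-hash check that produces it. It is an assumption about one possible approach, not how knowledge actually detects duplicates; `content_digest`, `plan_ingestion`, and the `seen` mapping are hypothetical names.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a stable fingerprint of a file's content."""
    return hashlib.sha256(data).hexdigest()


def plan_ingestion(files: dict[str, bytes], seen: dict[str, str]):
    """Split files into those that need ingestion and those that can be skipped.

    files: path -> current content
    seen:  path -> digest recorded at the last ingestion (updated in place)
    """
    to_ingest, skipped = [], []
    for path, data in sorted(files.items()):
        digest = content_digest(data)
        if seen.get(path) == digest:
            # Content unchanged since the last run: skip as a duplicate.
            skipped.append(path)
        else:
            # New or modified file: ingest it and remember its digest.
            to_ingest.append(path)
            seen[path] = digest
    return to_ingest, skipped
```

With a check like this, adding a second Notion file (step 2) or running "Sync files" with no upstream edits (step 4) would ingest only the new or modified file, provided the content written to disk is byte-identical between runs.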