gptscript-ai / knowledge

Knowledge for GPTScript
https://gptscript-ai.github.io/knowledge/
Apache License 2.0

feat(breaking): add structured json log + make filename mandatory for ingestion #122

Closed by iwilltry42 2 months ago

iwilltry42 commented 2 months ago

Ref #116

Here's a breakdown of the keys used in the logs:
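The keys can also be read off the log lines themselves. A minimal Python sketch, assuming each log record is one JSON object per line and that any other line (such as the plain-text summary at the end of the output) should be skipped. The helper name `key_counts` and the `SAMPLE` list are hypothetical; the two "Created dataset" lines are copied from the example log in this description.

```python
import json
from collections import Counter

# Two JSON lines copied from the example log, plus the plain-text summary
# line that the command prints at the end of its output.
SAMPLE = [
    '{"time":"2024-09-13T18:18:19.379835505+02:00","level":"INFO","msg":"Created dataset","id":"default"}',
    '{"time":"2024-09-13T18:18:19.381749417+02:00","level":"INFO","msg":"Created dataset","id":"foobar"}',
    'Ingested 1 files from ".local/testdata/2023q4-alphabet-earnings-release.pdf" into dataset "foobar"',
]


def key_counts(lines):
    """Count how often each top-level key appears across the JSON log lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log record, e.g. the final summary line
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


for key, count in sorted(key_counts(SAMPLE).items()):
    print(f"{key}: {count}")
```

Running the same function over a captured log file (`key_counts(open(path))`, with `path` being wherever the output was saved) gives a quick inventory of which keys a given run actually emitted.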

Here's an example log:

$ knowledge ingest .local/testdata/2023q4-alphabet-earnings-release.pdf -d foobar --json
{"time":"2024-09-13T18:18:19.379835505+02:00","level":"INFO","msg":"Created dataset","id":"default"}
{"time":"2024-09-13T18:18:19.381749417+02:00","level":"INFO","msg":"Created dataset","id":"foobar"}
{"time":"2024-09-13T18:18:19.384188814+02:00","level":"INFO","msg":"Starting document loader","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"documentloader","status":"starting"}
{"time":"2024-09-13T18:18:19.404059404+02:00","level":"INFO","msg":"Loaded documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"documentloader","status":"completed","num_documents":11}
{"time":"2024-09-13T18:18:19.40437493+02:00","level":"INFO","msg":"Starting text splitter","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"textsplitter","num_documents":11,"status":"starting"}
{"time":"2024-09-13T18:18:19.512585031+02:00","level":"INFO","msg":"Split documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"textsplitter","num_documents":11,"status":"completed","new_num_documents":23}
{"time":"2024-09-13T18:18:19.512606353+02:00","level":"INFO","msg":"Starting document transformers","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"transformer","num_documents":23,"num_transformers":1,"status":"starting"}
{"time":"2024-09-13T18:18:19.512615555+02:00","level":"INFO","msg":"Running transformer","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","transformer":"extra_metadata","progress":"1/1","progress_unit":"transformations"}
{"time":"2024-09-13T18:18:19.512626619+02:00","level":"INFO","msg":"Transformed documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","transformer":"extra_metadata","progress":"1/1","progress_unit":"transformations","status":"completed","num_documents":23}
{"time":"2024-09-13T18:18:19.512629977+02:00","level":"INFO","msg":"Transformed documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"transformer","num_documents":23,"num_transformers":1,"status":"completed","new_num_documents":23}
{"time":"2024-09-13T18:18:19.512643215+02:00","level":"INFO","msg":"Adding documents to collection (generating embeddings)","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","stage":"vectorstore","vectorstore":"chromem-go","status":"starting"}
{"time":"2024-09-13T18:18:20.323637363+02:00","level":"INFO","msg":"Added documents to collection (generated embeddings)","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","stage":"vectorstore","vectorstore":"chromem-go","status":"completed"}
{"time":"2024-09-13T18:18:20.323687658+02:00","level":"INFO","msg":"Inserting file and documents into index","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"store","filename":"2023q4-alphabet-earnings-release.pdf","filetype":".pdf","component":"index"}
{"time":"2024-09-13T18:18:20.330230905+02:00","level":"INFO","msg":"Ingested document","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"store","filename":"2023q4-alphabet-earnings-release.pdf","filetype":".pdf","status":"completed","num_documents":23,"absolute_path":"/home/thklein/git/github.com/gptscript-ai/knowledge/.local/testdata/2023q4-alphabet-earnings-release.pdf"}
Ingested 1 files from ".local/testdata/2023q4-alphabet-earnings-release.pdf" into dataset "foobar"
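
One thing the structured log makes possible is timing each ingestion stage by pairing the `"status":"starting"` and `"status":"completed"` records that share a `stage` value. The following is a minimal Python sketch of that idea, not part of this PR. The names `parse_time`, `stage_durations` and `SAMPLE` are hypothetical, and the sample lines are abbreviated copies of lines from the log above with `rootPath` and `filepath` omitted. Timestamps are truncated to microseconds because Python's `datetime` does not store nanoseconds.

```python
import json
import re
from datetime import datetime

# Abbreviated copies of lines from the example log (rootPath and filepath
# omitted for brevity), plus the plain-text summary line that ends the output.
SAMPLE = [
    '{"time":"2024-09-13T18:18:19.384188814+02:00","level":"INFO","msg":"Starting document loader","flow":"ingestion","phase":"parse","stage":"documentloader","status":"starting"}',
    '{"time":"2024-09-13T18:18:19.404059404+02:00","level":"INFO","msg":"Loaded documents","flow":"ingestion","phase":"parse","stage":"documentloader","status":"completed","num_documents":11}',
    '{"time":"2024-09-13T18:18:19.512643215+02:00","level":"INFO","msg":"Adding documents to collection (generating embeddings)","flow":"ingestion","stage":"vectorstore","vectorstore":"chromem-go","status":"starting"}',
    '{"time":"2024-09-13T18:18:20.323637363+02:00","level":"INFO","msg":"Added documents to collection (generated embeddings)","flow":"ingestion","stage":"vectorstore","vectorstore":"chromem-go","status":"completed"}',
    'Ingested 1 files from ".local/testdata/2023q4-alphabet-earnings-release.pdf" into dataset "foobar"',
]

# Date-time, optional fractional seconds of any length, then a UTC offset.
_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})")


def parse_time(ts):
    """Parse a log timestamp, truncating the fraction to microseconds."""
    match = _TS.fullmatch(ts)
    if match is None:
        raise ValueError(f"unexpected timestamp: {ts!r}")
    base, frac, tz = match.groups()
    micros = (frac or "")[:6].ljust(6, "0")
    offset = "+0000" if tz == "Z" else tz.replace(":", "")
    return datetime.strptime(f"{base}.{micros}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z")


def stage_durations(lines):
    """Return {stage: timedelta} for every stage with a starting/completed pair."""
    started = {}
    durations = {}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output such as the final summary line
        if not isinstance(record, dict) or "time" not in record:
            continue
        stage, status = record.get("stage"), record.get("status")
        if stage is None:
            continue  # e.g. per-transformer progress records carry no stage
        if status == "starting":
            started[stage] = parse_time(record["time"])
        elif status == "completed" and stage in started:
            durations[stage] = parse_time(record["time"]) - started.pop(stage)
    return durations


for stage, duration in stage_durations(SAMPLE).items():
    print(f"{stage}: {duration.total_seconds():.6f}s")
```

The sketch keys the pairs by `stage` alone, which is enough for a single-file run like the one above. For a run that ingests several files, keying by `(filepath, stage)` instead would keep the records of different files apart.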