Closed iwilltry42 closed 2 months ago
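Ref #116

- `--json`: root level flag to switch log output to JSON

Here's a breakdown of the keys used in the logs:

- `flow`: `ingestion` or `retrieval` (currently only implemented for `ingestion`)
- `phase`: phase within a flow
  - the `ingestion` flow has these phases: `open` -> `parse` -> `store`
- `stage`: stage within the phase
  - `open`: `read`
  - `parse`: `documentloader` -> `textsplitter` -> `transformers` (logged as `transformer` in the example log)
- global fields are
  - `status` with the following possible values:
    - `starting`
    - `completed`
    - `skipped` -> additional field `reason`
    - `failed` -> additional field `error`
- there may be a `progress` field indicating the progress within a stage, or within a step of a stage, if it's measurable
  - the `progress` value is in the format `<int>/<int>`, meaning `<current>/<total>`, e.g. `3/5`
  - `progressUnit` describes what the above means, e.g. `transformations`, so together it would be `3/5 transformations` (the example log emits this key as `progress_unit`)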
Here's an example log:
    $ knowledge ingest .local/testdata/2023q4-alphabet-earnings-release.pdf -d foobar --json
    {"time":"2024-09-13T18:18:19.379835505+02:00","level":"INFO","msg":"Created dataset","id":"default"}
    {"time":"2024-09-13T18:18:19.381749417+02:00","level":"INFO","msg":"Created dataset","id":"foobar"}
    {"time":"2024-09-13T18:18:19.384188814+02:00","level":"INFO","msg":"Starting document loader","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"documentloader","status":"starting"}
    {"time":"2024-09-13T18:18:19.404059404+02:00","level":"INFO","msg":"Loaded documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"documentloader","status":"completed","num_documents":11}
    {"time":"2024-09-13T18:18:19.40437493+02:00","level":"INFO","msg":"Starting text splitter","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"textsplitter","num_documents":11,"status":"starting"}
    {"time":"2024-09-13T18:18:19.512585031+02:00","level":"INFO","msg":"Split documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"textsplitter","num_documents":11,"status":"completed","new_num_documents":23}
    {"time":"2024-09-13T18:18:19.512606353+02:00","level":"INFO","msg":"Starting document transformers","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"transformer","num_documents":23,"num_transformers":1,"status":"starting"}
    {"time":"2024-09-13T18:18:19.512615555+02:00","level":"INFO","msg":"Running transformer","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","transformer":"extra_metadata","progress":"1/1","progress_unit":"transformations"}
    {"time":"2024-09-13T18:18:19.512626619+02:00","level":"INFO","msg":"Transformed documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","transformer":"extra_metadata","progress":"1/1","progress_unit":"transformations","status":"completed","num_documents":23}
    {"time":"2024-09-13T18:18:19.512629977+02:00","level":"INFO","msg":"Transformed documents","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"parse","stage":"transformer","num_documents":23,"num_transformers":1,"status":"completed","new_num_documents":23}
    {"time":"2024-09-13T18:18:19.512643215+02:00","level":"INFO","msg":"Adding documents to collection (generating embeddings)","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","stage":"vectorstore","vectorstore":"chromem-go","status":"starting"}
    {"time":"2024-09-13T18:18:20.323637363+02:00","level":"INFO","msg":"Added documents to collection (generated embeddings)","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","stage":"vectorstore","vectorstore":"chromem-go","status":"completed"}
    {"time":"2024-09-13T18:18:20.323687658+02:00","level":"INFO","msg":"Inserting file and documents into index","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"store","filename":"2023q4-alphabet-earnings-release.pdf","filetype":".pdf","component":"index"}
    {"time":"2024-09-13T18:18:20.330230905+02:00","level":"INFO","msg":"Ingested document","flow":"ingestion","rootPath":".local/testdata/2023q4-alphabet-earnings-release.pdf","filepath":".local/testdata/2023q4-alphabet-earnings-release.pdf","phase":"store","filename":"2023q4-alphabet-earnings-release.pdf","filetype":".pdf","status":"completed","num_documents":23,"absolute_path":"/home/thklein/git/github.com/gptscript-ai/knowledge/.local/testdata/2023q4-alphabet-earnings-release.pdf"}
    Ingested 1 files from ".local/testdata/2023q4-alphabet-earnings-release.pdf" into dataset "foobar"
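
A consumer of the output above could decode it line by line. The following Go sketch assumes one JSON object per line and uses the key names as they appear in the example log (`progress_unit`, not `progressUnit`). `logEntry` and `parseEntries` are hypothetical names for this illustration, and the sample input is a shortened copy of two log lines plus the plain-text summary line.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry holds the subset of keys described in this PR.
// Hypothetical type for illustration; unknown keys are ignored when decoding.
type logEntry struct {
	Msg          string `json:"msg"`
	Flow         string `json:"flow"`
	Phase        string `json:"phase"`
	Stage        string `json:"stage"`
	Status       string `json:"status"`
	Progress     string `json:"progress"`
	ProgressUnit string `json:"progress_unit"`
}

// parseEntries decodes every line that looks like a JSON object and
// skips everything else, such as the final plain-text summary line.
func parseEntries(output string) ([]logEntry, error) {
	var entries []logEntry
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "{") {
			continue
		}
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return nil, fmt.Errorf("decoding %q: %w", line, err)
		}
		entries = append(entries, e)
	}
	return entries, sc.Err()
}

// sample is a shortened copy of lines from the example log.
const sample = `{"level":"INFO","msg":"Loaded documents","flow":"ingestion","phase":"parse","stage":"documentloader","status":"completed","num_documents":11}
{"level":"INFO","msg":"Running transformer","flow":"ingestion","transformer":"extra_metadata","progress":"1/1","progress_unit":"transformations"}
Ingested 1 files from ".local/testdata/2023q4-alphabet-earnings-release.pdf" into dataset "foobar"`

func main() {
	entries, err := parseEntries(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("flow=%s phase=%s stage=%s status=%s progress=%s %s\n",
			e.Flow, e.Phase, e.Stage, e.Status, e.Progress, e.ProgressUnit)
	}
}
```

Fields that a given line does not carry, such as `phase` on the per-transformer lines, simply decode to empty strings, so a consumer should treat them as optional.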