enhance: unpack archives before ingestion

gptscript-ai / knowledge

Knowledge for GPTScript

https://gptscript-ai.github.io/knowledge/

Apache License 2.0


enhance: unpack archives before ingestion #100

Open iwilltry42 opened 4 weeks ago

iwilltry42 commented 4 weeks ago

Currently, handling errors that come from ingesting documents inside an archive can get pretty nasty. Also, the way we do it right now cannot be parallelized for speedup: it first collects all documents/chunks from the archive and only then does the embedding. That is a missed opportunity for concurrency and makes the whole path highly error-prone.
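To make the idea in the title concrete, here is a minimal sketch of "unpack first, then ingest each file on its own". It is written in Python purely for illustration and is not the project's code: `ingest_file` and `ingest_archive` are hypothetical names, the per-file step is a stand-in that only counts words instead of chunking and embedding, and only zip archives are handled.

```python
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ingest_file(path: Path) -> int:
    """Hypothetical stand-in for per-file ingestion (load, chunk, embed)."""
    # Raises UnicodeDecodeError for files that are not valid UTF-8 text.
    text = path.read_text(encoding="utf-8")
    return len(text.split())


def ingest_archive(archive: Path, workers: int = 4):
    """Unpack a zip archive, then ingest every extracted file independently.

    Returns (results, errors), both keyed by the file's path inside the archive.
    """
    results, errors = {}, {}
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: unpack everything before any ingestion starts.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        files = sorted(p for p in Path(tmp).rglob("*") if p.is_file())

        # Step 2: ingest the files concurrently, one task per file.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(ingest_file, p): p for p in files}
            for future, path in futures.items():
                name = path.relative_to(tmp).as_posix()
                try:
                    results[name] = future.result()
                except Exception as exc:
                    # A failure is attributed to one file and does not
                    # abort the rest of the archive.
                    errors[name] = exc
    return results, errors
```

Because each extracted file is its own task, an error can be reported against a single document, and embedding does not have to wait until every document in the archive has been collected.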