add: commands to export datasets and to import/query them

gptscript-ai / knowledge

Knowledge for GPTScript

https://gptscript-ai.github.io/knowledge/

Apache License 2.0


add: commands to export datasets and to import/query them #38

Closed: iwilltry42 closed this 2 months ago

iwilltry42 commented 2 months ago

knowledge export <dataset> --output foo.zip
knowledge retrieve -d <dataset> --archive foo.zip "some question"
knowledge list-datasets --archive foo.zip
knowledge get-dataset <dataset> --archive foo.zip
knowledge import foo.zip

Depends On (currently using fork): https://github.com/philippgille/chromem-go/pull/88

NOTE: there are still quite a few rough edges and missing pieces; for example, the server-mode version of this is not yet implemented. We'll follow up on this in the future!

iwilltry42 commented 2 months ago

So is it expected that, when using an archive, the user shouldn't ingest files, since those ingested files won't make it into the archive anyway?

Yeah, if you want to "update" an archive, you would do import -> ingest -> export. That's essentially because the archive doesn't contain information about the embedding function. I guess we could somehow serialize it with a very limited parameter set that makes sense for our setup, but I'm not sure it's worth the effort.
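For reference, the import -> ingest -> export round trip could look roughly like the sketch below. The `import` and `export` invocations mirror the commands listed in the PR description; the `ingest -d <dataset> <path>` form is an assumption (it is not shown in this thread), and `KNOWLEDGE_BIN` and `update_archive` are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical override so the binary can be swapped out (e.g. for a dry run).
KNOWLEDGE_BIN="${KNOWLEDGE_BIN:-knowledge}"

# Sketch: "update" an archive by importing it, ingesting new files into the
# dataset, and exporting the dataset back to the same archive file.
update_archive() {
  local dataset="$1" archive="$2" new_files="$3"

  # 1. Load the archived dataset(s) into the local database.
  "$KNOWLEDGE_BIN" import "$archive"

  # 2. Ingest the new files. ASSUMPTION: `ingest` accepts `-d <dataset> <path>`.
  "$KNOWLEDGE_BIN" ingest -d "$dataset" "$new_files"

  # 3. Write the updated dataset back out, overwriting the old archive.
  "$KNOWLEDGE_BIN" export "$dataset" --output "$archive"
}

# Example call (not executed here):
#   update_archive my-dataset foo.zip ./docs
```

Setting `KNOWLEDGE_BIN=echo` before calling `update_archive` prints the three commands instead of running them, which is a cheap way to check the sequence before touching a real archive.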