theopenconversationkit / tock

Tock, the open source conversational AI toolkit.
https://doc.tock.ai
Apache License 2.0

Feature - Gen AI data ingestion workflow / pipeline (gitlab based) #1706

Open Benvii opened 3 months ago

Benvii commented 3 months ago

Data ingestion is a complex task, and ingested documents need to be refreshed or renewed continuously. For now, this task can be performed using our basic Python tooling, available in tock-llm-indexing-tools.

This is currently done manually. We are going to automate it a bit more and also add testing features based on Langfuse datasets.

Our approach will be based on GitLab pipelines. This solution is simple and will let us schedule data ingestion or trigger it on demand through GitLab's API. We will also be able to keep track of each ingestion job and its state in GitLab.
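A minimal sketch of on-demand triggering and state tracking, assuming GitLab's REST pipeline trigger endpoint (`POST /projects/:id/trigger/pipeline`, form fields `token`, `ref` and `variables[KEY]`) and its pipeline endpoint (`GET /projects/:id/pipelines/:pipeline_id`). The GitLab host, project ID, tokens and the `INGESTION_SOURCE` variable are hypothetical placeholders, not part of Tock or tock-llm-indexing-tools.

```python
"""Sketch: trigger a GitLab ingestion pipeline and read back its state.

Hypothetical placeholders: GITLAB_URL, project IDs, tokens and the
INGESTION_* variable names are illustrative only.
"""
import json
import urllib.parse
import urllib.request

GITLAB_URL = "https://gitlab.example.com/api/v4"  # hypothetical GitLab host

# Pipeline statuses treated as terminal in this sketch.
FINISHED_STATES = {"success", "failed", "canceled", "skipped"}


def build_trigger_request(project_id: int, trigger_token: str, ref: str,
                          variables: dict[str, str]) -> urllib.request.Request:
    """Build a POST to the pipeline trigger endpoint.

    Pipeline variables are sent as `variables[KEY]=value` form fields,
    so the ingestion job can read them as CI/CD variables.
    """
    form = {"token": trigger_token, "ref": ref}
    for key, value in variables.items():
        form[f"variables[{key}]"] = value
    body = urllib.parse.urlencode(form).encode("utf-8")
    url = f"{GITLAB_URL}/projects/{project_id}/trigger/pipeline"
    return urllib.request.Request(url, data=body, method="POST")


def build_status_request(project_id: int, pipeline_id: int,
                         api_token: str) -> urllib.request.Request:
    """Build a GET that returns the pipeline, including its `status` field."""
    url = f"{GITLAB_URL}/projects/{project_id}/pipelines/{pipeline_id}"
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": api_token})


def is_finished(status: str) -> bool:
    """Return True once the pipeline reached a terminal state."""
    return status in FINISHED_STATES


def send(request: urllib.request.Request) -> dict:
    """Perform the HTTP call and decode the JSON response (needs network)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Typical use: `pipeline = send(build_trigger_request(...))`, then poll `send(build_status_request(project_id, pipeline["id"], token))["status"]` until `is_finished` returns True. Scheduled runs would not need this client code, since GitLab pipeline schedules can start the same pipeline.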

Related issues and PRs:

The technical design needs to be approved before any development work starts. It will also serve as documentation for future contributors.