Feature - Gen AI data ingestion workflow / pipeline (gitlab based)

Data ingestion is a complex task, and ingested documents need to be refreshed continuously. For now, this task can be performed with our basic Python tooling, available in tock-llm-indexing-tools.

This is currently done manually. We are going to automate it a bit more and also add testing features based on Langfuse datasets.

Our approach will be based on GitLab pipelines. This solution is simple and will let us schedule data ingestion runs or even trigger them through GitLab's API. We will also be able to keep track of each ingestion job and its state in GitLab.
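
To illustrate the API-triggered path, here is a minimal sketch of how an ingestion pipeline could be started through GitLab's pipeline trigger endpoint (`POST /api/v4/projects/:id/trigger/pipeline`). The host, project ID, token, and the `INGESTION_SOURCE` variable are placeholder assumptions, not values from this project, and the actual design may pass different variables.

```python
import urllib.parse
import urllib.request


def build_trigger_request(base_url, project_id, trigger_token, ref, variables=None):
    """Build the POST request for GitLab's pipeline trigger endpoint.

    `variables` is an optional dict of CI/CD variables forwarded to the
    pipeline (the variable names are hypothetical, chosen for illustration).
    """
    project = urllib.parse.quote(str(project_id), safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project}/trigger/pipeline"

    # The trigger endpoint expects form fields: token, ref, variables[KEY].
    form = {"token": trigger_token, "ref": ref}
    for key, value in (variables or {}).items():
        form[f"variables[{key}]"] = value

    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def trigger_ingestion(request):
    """Send the request and return the raw JSON response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with a real host, project and token.
    req = build_trigger_request(
        "https://gitlab.example.com",
        42,
        "TRIGGER_TOKEN",
        "main",
        {"INGESTION_SOURCE": "docs"},
    )
    print(req.get_method(), req.full_url)
    # trigger_ingestion(req)  # uncomment once real values are configured
```

Building the request separately from sending it keeps the sketch testable without network access. The same pipeline could also be run on a timer with GitLab's scheduled pipelines, in which case no API call is needed.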

Related issues and PRs:

1707 - #1713

The technical design needs to be approved before any development work starts. It will also serve as documentation for future contributors.

Feature - Gen AI data ingestion workflow / pipeline (gitlab based) #1706