second-opinion-ai / second-opinion

Develop data transformation pipeline for optimal RAG and fine-tuning performance #24

Open branhoff opened 9 months ago

branhoff commented 9 months ago

Description

The paper "RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture" outlines a rough structure for transforming data into a state usable for RAG and for fine-tuning an LLM.

This ticket seeks to develop an initial process for transforming the data we scrape (car diagnostic manuals, for instance) into a Q&A format.
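
As a starting point for discussion, here is a minimal sketch of the transformation step: split scraped text into passages and hand each passage to a pluggable Q&A generator. Every name here (`QAPair`, `split_into_passages`, `generate_qa`, `stub_generate`, the `max_chars` default) is hypothetical and not taken from the repo or the paper. The stub generator only stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class QAPair:
    question: str
    answer: str
    source: str  # hypothetical id of the passage the pair came from


def split_into_passages(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into passages of roughly max_chars."""
    passages: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            # Adding this paragraph would exceed the budget: close the passage.
            passages.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        passages.append(current)
    return passages


def generate_qa(
    doc_id: str,
    text: str,
    generate: Callable[[str], Iterable[tuple[str, str]]],
    max_chars: int = 1000,
) -> Iterator[QAPair]:
    """Run a pluggable generator (e.g. an LLM wrapper) over every passage."""
    for i, passage in enumerate(split_into_passages(text, max_chars)):
        for question, answer in generate(passage):
            yield QAPair(question, answer, source=f"{doc_id}#{i}")


def stub_generate(passage: str) -> list[tuple[str, str]]:
    """Placeholder generator; a real one would prompt an LLM with the passage."""
    first_line = passage.splitlines()[0]
    return [(f"What does the manual say about: {first_line}?", passage)]
```

Passing the generator in as a callable keeps the passage handling independent of any particular model or prompt, which should help with the repeatability requirement.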

Acceptance Criteria

Data should be in JSONL format
Data should be structured as Q&A pairs
Q&A pairs should be reviewed and filtered by LLMs according to the criteria laid out in the paper
The implementation should be generic enough that the process is easily repeatable
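
To make the criteria above concrete, here is a rough sketch of the output side: Q&A records pass through a pluggable review function and are written one JSON object per line. The field names (`question`, `answer`), the function names, and the `non_empty_judge` placeholder are all assumptions for illustration. The actual review criteria would come from the paper and be implemented as an LLM call behind the same `keep` interface.

```python
import json
from typing import Callable, Iterable, Iterator

Record = dict[str, str]  # assumed shape: {"question": ..., "answer": ...}


def filter_records(
    records: Iterable[Record], keep: Callable[[Record], bool]
) -> Iterator[Record]:
    """Yield only the records the reviewer accepts."""
    return (r for r in records if keep(r))


def write_jsonl(records: Iterable[Record], path: str) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: str) -> list[Record]:
    """Read a JSONL file back into a list of records, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def non_empty_judge(record: Record) -> bool:
    """Placeholder reviewer; a real one would ask an LLM to score the pair."""
    return bool(record["question"].strip()) and bool(record["answer"].strip())
```

A run over a new data source would then be `write_jsonl(filter_records(records, non_empty_judge), "qa.jsonl")`, with the judge swapped for an LLM-backed one once the criteria are settled.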