second-opinion-ai / second-opinion


Develop data transformation pipeline for optimal RAG and fine-tuning performance #24

Open · branhoff opened this issue 9 months ago


Description

The paper "RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture" outlines a rough structure for transforming data into a usable state for RAG and fine-tuning enhancements of an LLM.

This ticket aims to develop an initial process for transforming the data we scrape (for instance, car diagnostic manuals) into a Q&A format.
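As a rough illustration of the first stage, here is a minimal sketch, assuming scraped documents arrive as plain `(source_id, text)` tuples. It splits each document into word-bounded chunks and passes every chunk to a question/answer generator. `chunk_text`, `generate_pairs`, `QAPair`, and the `generate_qa` hook are all hypothetical names; in practice `generate_qa` would wrap an LLM prompt, which is deliberately left out here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class QAPair:
    question: str
    answer: str
    source: str  # identifier of the scraped document the pair came from


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split raw document text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def generate_pairs(
    documents: Iterable[tuple[str, str]],
    generate_qa: Callable[[str], list[tuple[str, str]]],
    max_words: int = 200,
) -> Iterator[QAPair]:
    """Run every chunk of every (source_id, text) document through generate_qa.

    generate_qa is a hypothetical hook (e.g. an LLM prompt) that returns
    (question, answer) tuples grounded in the given chunk.
    """
    for source, text in documents:
        for chunk in chunk_text(text, max_words):
            for question, answer in generate_qa(chunk):
                yield QAPair(question=question, answer=answer, source=source)
```

Injecting `generate_qa` as a callable keeps the chunking logic independent of any particular model or prompt, so you can swap generators, or use a stub like `lambda chunk: [("Q?", chunk)]` in tests, without touching the pipeline.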

Acceptance Criteria

  1. Data should be stored in JSONL format.
  2. Data should be structured as question-answer (Q&A) pairs.
  3. Q&A pairs should be reviewed and filtered by LLMs according to the criteria laid out in the paper.
  4. The implementation should be generic enough that the process is easily repeatable.
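One possible shape for criteria 1 and 3 is sketched below, assuming each Q&A pair is a plain dict with `question` and `answer` keys. The `score` callable is a hypothetical stand-in for an LLM reviewer that rates a pair against the paper's criteria, which are not reproduced here. The threshold of 0.5 is an arbitrary placeholder. Kept records are serialized as JSON Lines, with one JSON object per line.

```python
import json
from typing import Callable, Iterable


def filter_pairs(
    pairs: Iterable[dict],
    score: Callable[[dict], float],
    threshold: float = 0.5,
) -> list[dict]:
    """Keep only Q&A records whose reviewer score meets the threshold.

    score is a hypothetical hook standing in for an LLM judge; each kept
    record is copied with its score attached for later auditing.
    """
    kept = []
    for pair in pairs:
        s = score(pair)
        if s >= threshold:
            kept.append({**pair, "score": s})
    return kept


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text back into dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_jsonl(path: str, records: Iterable[dict]) -> None:
    """Write records to a UTF-8 .jsonl file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
```

Because the filter takes the scorer as a parameter and the JSONL helpers are format-only, the same code can be rerun unchanged on any new scraped source, which is in the spirit of criterion 4.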