We would like to evaluate model performance for various LLM fine-tuning approaches and compare them against standard benchmarks. An experiment we would like to try is:
Compare the full Cartesian product of fine-tuning approaches for the Granite model (medium model) across the relevant combinations:
{small, medium, large models} x {no pre-training + full supervised training, full supervised fine-tuning, LoRA, RAG, LoRA + RAG, etc.} x {synthetic data, no synthetic data}. We can omit combinations that are not relevant for our use case; a small enumeration sketch is shown below.
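A minimal sketch of how we might enumerate this experiment grid, assuming illustrative configuration names (the model identifiers, method labels, and exclusion list below are hypothetical placeholders, not our actual configuration):

```python
# Hypothetical sketch: enumerate the experiment grid described above.
# All names here are illustrative assumptions, not agreed-upon identifiers.
from itertools import product

model_sizes = ["small", "medium", "large"]
training_methods = [
    "scratch_supervised",  # no pre-training + full supervised training
    "full_sft",            # full supervised fine-tuning
    "lora",
    "rag",
    "lora+rag",
]
data_variants = ["synthetic", "no_synthetic"]

# Placeholder set of combinations we judge irrelevant for our use case.
EXCLUDED = {("large", "scratch_supervised", "synthetic")}

experiments = [
    combo
    for combo in product(model_sizes, training_methods, data_variants)
    if combo not in EXCLUDED
]

for size, method, data in experiments:
    print(f"granite-{size} | {method} | {data}")
```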
Benchmarks we can compare against (numbers obtained from ChatGPT; we should validate them against relevant published papers):
![verylarge_ragft](https://github.com/redhat-et/datascience-wg/assets/7343099/217ea681-124b-4d8d-84e9-af11d0ce1350)