Feature Overview (mandatory - Complete while in New status)
The various InstructLab experiences use different SDG pipelines, and, even when fine-tuning the same full-resolution model, the performance and quality of the resulting fine-tuned models differ depending on which pipeline produced the training data.
This card covers creating an evaluation flow that takes SDG data generated by the three default pipelines, fine-tunes a full-resolution model with each, and evaluates the performance of the resulting fine-tuned models.
Goals (mandatory - Complete while in New status)
Provide quantitative evidence of how the choice of SDG pipeline used for fine-tuning affects model performance.
Requirements (mandatory - Complete while in Refinement status):
Generate SDG data using the three default pipelines (a generation sketch follows this list):
laptop (a simplified self-instruct) (pipeline=simple)
upstream (SDG 1.0) (pipeline=full)
downstream RHEL AI (SDG 1.5) (pipeline=agentic)
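A minimal sketch of the generation step, driving the `ilab` CLI from Python. The three pipeline names are taken from this card; the `--output-dir` flag and the directory layout are assumptions to verify against the InstructLab release in use, not a definitive invocation:

```python
# sdg_generate.py -- sketch: run `ilab data generate` once per default pipeline.
# Assumptions: `ilab` is on PATH and a taxonomy is already set up; the
# "agentic" pipeline name comes from this card, not from a verified release.
import subprocess
from pathlib import Path

# laptop / upstream (SDG 1.0) / downstream RHEL AI (SDG 1.5)
PIPELINES = ["simple", "full", "agentic"]

def generate_all(base_dir: str = "sdg-runs") -> None:
    for pipeline in PIPELINES:
        out_dir = Path(base_dir) / pipeline
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ilab", "data", "generate",
             "--pipeline", pipeline,
             "--output-dir", str(out_dir)],
            check=True,  # fail fast if a pipeline run errors out
        )

if __name__ == "__main__":
    generate_all()
```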
Use the multi-stage agentic fine-tuning pipeline to produce a fine-tuned model from each SDG pipeline's output (see the train-and-evaluate sketch below)
Evaluate each of the resulting models on domain-specific knowledge (e.g. MMLU_branch)
Open question: should the task-dir produced by the agentic pipeline be used?
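A companion sketch, under the same assumptions, for the fine-tune and evaluate steps. `ilab model train` and `ilab model evaluate` are real subcommands, but the exact flags for multi-phase training and for the MMLU_branch tasks directory vary by release; the file names (`train.jsonl`, `node_datasets`) and checkpoint paths below are hypothetical placeholders to adapt:

```python
# train_and_eval.py -- sketch: fine-tune one model per SDG output, then score
# it with the MMLU_branch benchmark. Flags and paths are assumptions.
import subprocess
from pathlib import Path

def train(data_path: str, ckpt_dir: str) -> None:
    # Multi-stage fine-tuning; the exact strategy flag depends on the release.
    subprocess.run(
        ["ilab", "model", "train",
         "--data-path", data_path,
         "--ckpt-output-dir", ckpt_dir],
        check=True,
    )

def evaluate(model_dir: str, tasks_dir: str) -> None:
    # MMLU_branch needs a tasks directory emitted by the SDG run (the card
    # leaves open whether to reuse the agentic pipeline's task-dir).
    subprocess.run(
        ["ilab", "model", "evaluate",
         "--model", model_dir,
         "--benchmark", "mmlu_branch",
         "--tasks-dir", tasks_dir],
        check=True,
    )

if __name__ == "__main__":
    for pipeline in ["simple", "full", "agentic"]:
        run_dir = Path("sdg-runs") / pipeline          # matches the sketch above
        train(str(run_dir / "train.jsonl"), f"ckpts/{pipeline}")
        evaluate(f"ckpts/{pipeline}", str(run_dir / "node_datasets"))
```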
Note: Consider repeating the experiment with several distinct runs of each SDG pipeline to establish the expected range or average of the performance differences (see the aggregation sketch below).
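For the note above, a small sketch of how per-run MMLU_branch scores could be aggregated into an average and a range per pipeline; the scores dict is an empty placeholder to fill in from real evaluation output:

```python
# aggregate.py -- sketch: summarize repeated evaluation runs per pipeline.
from statistics import mean

scores: dict[str, list[float]] = {  # per-run MMLU_branch scores, one list per pipeline
    "simple": [],
    "full": [],
    "agentic": [],
}

for pipeline, runs in scores.items():
    if not runs:
        print(f"{pipeline}: no runs recorded yet")
        continue
    print(f"{pipeline}: mean={mean(runs):.3f} "
          f"range=[{min(runs):.3f}, {max(runs):.3f}] over {len(runs)} runs")
```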
Done - Acceptance Criteria (mandatory - Complete while in Refinement status):
Provide a report on the model performance differences across the three default pipelines
Provide a pipeline or scripts that users can execute on-premise should they want to replicate the evaluation for their own use cases
Tasks/Epics Tracker: