hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

Have you conducted ablation experiments with three factors: Complexity, Quality, and Diversity? Which one has the greatest impact on performance improvement? #22

Closed 447428054 closed 3 days ago

447428054 commented 5 months ago

Have you conducted ablation experiments with three factors: Complexity, Quality, and Diversity? Which one has the greatest impact on performance improvement?

VPeterV commented 5 months ago

Hi, thanks for your interest!

I believe the comparisons you're referring to can be found in Sections 2.2 to 2.5 of our paper. As for which factor has the greatest impact, our experience suggests that each factor needs to be combined with the others to reach its full potential. For example, selecting samples solely for diversity won't necessarily yield samples with the highest complexity or quality. Conversely, prioritizing complexity without considering diversity can lead to suboptimal performance because the selected subset ends up redundant.
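To make the interplay concrete, here is a minimal sketch of one way the three factors could be combined: rank samples by a combined complexity-and-quality score, then skip any candidate that is too similar to something already selected. Everything in it is an assumption made for illustration (the `select_samples` helper, the product score, the `max_similarity` threshold of 0.9, and the toy 2-D embeddings); it is not the exact procedure or code from the paper or this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_samples(samples, budget, max_similarity=0.9):
    """Hypothetical helper: pick up to `budget` samples.

    Each sample is a dict with 'complexity', 'quality', and 'embedding'.
    The product score and the threshold are illustrative assumptions.
    """
    # Rank by combined score so complex, high-quality samples come first.
    ranked = sorted(
        samples,
        key=lambda s: s["complexity"] * s["quality"],
        reverse=True,
    )
    selected = []
    for candidate in ranked:
        if len(selected) >= budget:
            break
        # Diversity check: keep the candidate only if it is not a
        # near-duplicate of any sample that was already selected.
        if all(
            cosine_similarity(candidate["embedding"], kept["embedding"]) < max_similarity
            for kept in selected
        ):
            selected.append(candidate)
    return selected


# Toy data: "b" scores almost as high as "a" but is nearly identical to it.
samples = [
    {"id": "a", "complexity": 5.0, "quality": 5.0, "embedding": [1.0, 0.0]},
    {"id": "b", "complexity": 5.0, "quality": 4.8, "embedding": [0.99, 0.05]},
    {"id": "c", "complexity": 3.0, "quality": 4.0, "embedding": [0.0, 1.0]},
    {"id": "d", "complexity": 1.0, "quality": 1.0, "embedding": [-1.0, 0.0]},
]
picked = [s["id"] for s in select_samples(samples, budget=2)]
print(picked)  # ['a', 'c']
```

In this toy example, ranking by score alone would pick "a" and "b", which are near-duplicates; the diversity check skips "b" and takes "c" instead. Filtering only for diversity, on the other hand, would give no preference to "a" over the low-scoring "d".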