hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

[Question] Regarding the order bias in sample scoring. #15

Closed liujuncn closed 5 months ago

liujuncn commented 7 months ago

Hello, thank you for your great work!

Regarding the EVOL COMPLEXITY method in the paper, where ChatGPT ranks and scores the complexity of samples: I have recently observed that many LLMs tend to assign scores that decrease with a sample's position in the prompt. For example, a sample sequence like ABCD tends to be scored from high to low, and when the order is changed (e.g., to CDAB), the scores still follow the same high-to-low trend by position rather than by content.

Have you observed a similar phenomenon, and if so, have you made any corresponding adjustments in your experiments?
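For concreteness, here is a minimal sketch of one way to test for this kind of position bias. It assumes a hypothetical `score_samples` callable (not part of the deita codebase) that sends an ordered list of samples to the LLM and returns one score per sample in that order:

```python
import random

def check_order_bias(samples, score_samples, n_trials=5, seed=0):
    """Score the same samples under several random orderings and report,
    for each prompt position, the average score given to whatever sample
    landed there. A consistent high-to-low trend across positions,
    regardless of content, suggests position bias."""
    rng = random.Random(seed)
    position_totals = [0.0] * len(samples)
    for _ in range(n_trials):
        order = list(range(len(samples)))
        rng.shuffle(order)
        # One LLM call per ordering; scores come back in prompt order.
        scores = score_samples([samples[i] for i in order])
        for pos, s in enumerate(scores):
            position_totals[pos] += s
    return [t / n_trials for t in position_totals]
```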

Zeng-WH commented 6 months ago

In our earlier experiments, we also noticed that the ordering introduces bias when scoring. Therefore, following some previous works, we randomize the order of the samples before scoring.
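A minimal sketch of this randomization idea (not the repository's actual implementation), again assuming a hypothetical `score_samples` callable: shuffle the samples before building the scoring prompt, then map the returned scores back to the original order.

```python
import random

def score_with_shuffled_order(samples, score_samples, seed=None):
    """Shuffle samples before scoring so any position-dependent bias is
    spread uniformly across samples, then restore the original order."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    shuffled_scores = score_samples([samples[i] for i in order])
    # Map each returned score back to the sample's original index.
    scores = [0.0] * len(samples)
    for pos, original_idx in enumerate(order):
        scores[original_idx] = shuffled_scores[pos]
    return scores
```

Averaging over several independent shufflings would further reduce the variance introduced by any single ordering, at the cost of extra LLM calls.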

VPeterV commented 5 months ago

I'm closing this issue for now. If you have any additional questions or concerns, please don't hesitate to reopen it.