Closed — liujuncn closed this issue 5 months ago

Hello, thank you for your great work!

Regarding the EVOL COMPLEXITY method in the paper, where ChatGPT ranks and scores the complexity of samples, I have recently observed that many LLMs tend to score samples in descending order, from high to low. For example, a sample sequence like ABCD tends to be scored from high to low, and when the order is changed (e.g., CDAB), the scoring trend remains the same.

Have you observed a similar phenomenon, and if so, did you make any corresponding adjustments in your experiments?

---

In our previous experiments we also noticed that the presentation order introduces bias into the scoring. Following prior work, we therefore randomize the order of the sequences before scoring.

I'm closing this issue for now. If you have any additional questions or concerns, please don't hesitate to reopen it.
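For anyone else running into this: a minimal sketch of the order-randomization mitigation discussed in this thread, assuming a hypothetical `score_fn` that takes an ordered batch of samples and returns one score per sample (e.g. a ChatGPT ranking call). Scoring each batch under several random orders and averaging the per-sample scores cancels out a pure position bias.

```python
import random

def score_with_order_randomization(samples, score_fn, n_rounds=3, seed=0):
    """Score `samples` under several random presentation orders and average,
    to reduce position bias in LLM-based complexity scoring.

    `score_fn` is a hypothetical callable: it takes an ordered list of
    samples and returns one numeric score per sample, in that order.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_rounds):
        order = list(range(len(samples)))
        rng.shuffle(order)  # randomize the presentation order
        scores = score_fn([samples[i] for i in order])
        for pos, idx in enumerate(order):  # map scores back to original indices
            totals[idx] += scores[pos]
    return [t / n_rounds for t in totals]

# Toy scorer that rewards earlier positions, mimicking the descending-order
# bias described above: first item gets the highest score regardless of content.
biased = lambda batch: [len(batch) - pos for pos in range(len(batch))]
print(score_with_order_randomization(["A", "B", "C", "D"], biased, n_rounds=100))
```

With enough rounds, the purely position-biased scorer gives every sample roughly the same average (2.5 here), showing that the spread in the original fixed-order scores came from position, not content.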