The code seems to be training from difficult to easy instead of from easy to difficult. It appears that the ">" and "<" signs are reversed.

ljy0ustc / LLaRA

Apache License 2.0

Issue #16

Status: Open (mfj12315 opened this issue 5 days ago)

mfj12315 commented 5 days ago

The code appears to train from difficult to easy rather than from easy to difficult; the ">" and "<" signs seem to be reversed. In LLaRA/data/data_interface.py, the mixed, more difficult prompt is used when the random number p is greater than the threshold. However, the threshold increases with the training step, so the chance that p exceeds it shrinks as training progresses. In practice, training therefore starts with the most difficult tasks and becomes easier later on. Please see the detailed explanation in the image below.

[image: 123]
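To make the argument concrete, here is a minimal sketch of the behaviour described above. It is not the code from data_interface.py: the function names, the linear schedule, and the assumption that p is drawn uniformly from [0, 1) are all hypothetical, chosen only to show how the direction of the comparison changes the curriculum.

```python
import random


def threshold(step: int, total_steps: int) -> float:
    """Hypothetical schedule (assumption): rises linearly from 0.0 to 1.0."""
    return min(1.0, step / total_steps)


def uses_hard_prompt(p: float, thr: float, as_reported: bool) -> bool:
    """Decide whether the mixed (harder) prompt is picked.

    as_reported=True  mirrors the comparison described in this issue: p > thr.
    as_reported=False flips the sign: p < thr.
    """
    return p > thr if as_reported else p < thr


def hard_prompt_rate(step: int, total_steps: int, as_reported: bool,
                     trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how often the harder prompt is chosen at a given step."""
    rng = random.Random(seed)  # p ~ Uniform[0, 1) is an assumption
    thr = threshold(step, total_steps)
    hits = sum(uses_hard_prompt(rng.random(), thr, as_reported)
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        reported = hard_prompt_rate(step, 100, as_reported=True)
        flipped = hard_prompt_rate(step, 100, as_reported=False)
        print(f"step={step:3d}  p > thr: {reported:.2f}  p < thr: {flipped:.2f}")
```

Under these assumptions, the `p > thr` variant picks the harder prompt almost always at the start and never at the end, while the flipped comparison gives the easy-to-difficult order. Whether the real schedule and sampling match this sketch would need to be checked against the repository.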

ljy0ustc commented 5 days ago

Thanks for pointing this out! We'll look into it within a week and update the code if necessary.