The code appears to train from difficult to easy rather than from easy to difficult; the `>` and `<` comparisons seem to be reversed. In LLaRA/data/data_interface.py, when the random number p is greater than the threshold, the mixed (more difficult) prompt is used. However, the threshold increases with the training step, so the chance that p exceeds it is highest at the start of training and shrinks as training proceeds. In practice, the training tasks start at their most difficult and become easier later on. Please see the detailed explanation in the image below.
![123](https://github.com/ljy0ustc/LLaRA/assets/150609095/9ece24ed-d867-4485-95ec-a87f5608a57c)
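
To make the concern concrete, here is a minimal sketch that estimates how often the more difficult prompt would be chosen at a given step. It is not the repository's code: the function name `hard_prompt_probability`, the `flipped` flag, and the linear schedule `threshold = step / max_step` are all assumptions made for illustration. The only parts taken from the description above are the `p > threshold` comparison and a threshold that grows with the training step.

```python
import random


def hard_prompt_probability(step, max_step, flipped=False, trials=20_000, seed=0):
    """Estimate how often the harder (mixed) prompt is selected at `step`.

    Hypothetical helper: the linear schedule below is an assumption,
    not the actual schedule used in LLaRA.
    """
    rng = random.Random(seed)
    threshold = step / max_step  # assumed: grows linearly from 0 to 1
    hard = 0
    for _ in range(trials):
        p = rng.random()  # uniform in [0.0, 1.0)
        if flipped:
            use_hard = p < threshold  # proposed comparison: easy to difficult
        else:
            use_hard = p > threshold  # comparison as described in this issue
        hard += use_hard
    return hard / trials


if __name__ == "__main__":
    for step in (0, 50, 100):
        as_described = hard_prompt_probability(step, 100)
        with_flip = hard_prompt_probability(step, 100, flipped=True)
        print(f"step={step:3d}  p>threshold: {as_described:.2f}  p<threshold: {with_flip:.2f}")
```

Under these assumptions, the `p > threshold` comparison selects the harder prompt almost always at step 0 and never at the final step, while the flipped comparison gives the opposite, easy-to-difficult progression.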