Open DreamsofGg opened 3 years ago
it seems that the model will tend to make the token number larger when fix threshold (hard training step) because it cannot take L1 loss into account. How to solve this problem?
it seems that the model will tend to make the token number larger when fix threshold (hard training step) because it cannot take L1 loss into account. How to solve this problem?