Eugene-hu closed this 11 months ago
To illustrate why this should not be used:
It will heavily skew results in favour of sending a lot of garbage in a short amount of time. It would be better to combine a "cut-off" (minimum tokens required) with a time-out, which together enforce a minimum token speed.
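A rough sketch of the cut-off plus time-out idea, in case it helps the discussion. Everything here is hypothetical: `SpeedGate`, `gated_reward`, and the example thresholds are illustrative names and values, not taken from this repository. The point is that speed only acts as a pass/fail gate, so emitting more tokens faster earns nothing extra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedGate:
    """Hypothetical gate; field names and values are assumptions for illustration."""
    min_tokens: int    # cut-off: fewest tokens a response must contain
    timeout_s: float   # time-out: longest allowed response time, in seconds

    @property
    def min_tokens_per_second(self) -> float:
        # The slowest speed that can still pass: exactly min_tokens at the time-out.
        return self.min_tokens / self.timeout_s

    def passes(self, n_tokens: int, elapsed_s: float) -> bool:
        # Both conditions must hold; neither is scaled by how far it is exceeded.
        return n_tokens >= self.min_tokens and elapsed_s <= self.timeout_s


def gated_reward(quality_reward: float, n_tokens: int, elapsed_s: float,
                 gate: SpeedGate) -> float:
    """Return the quality-based reward unchanged if the gate passes, else 0.0."""
    return quality_reward if gate.passes(n_tokens, elapsed_s) else 0.0


if __name__ == "__main__":
    gate = SpeedGate(min_tokens=50, timeout_s=10.0)
    print(gate.min_tokens_per_second)
    print(gated_reward(0.8, n_tokens=500, elapsed_s=1.0, gate=gate))
    print(gated_reward(0.8, n_tokens=20, elapsed_s=1.0, gate=gate))
```

With a gate like this, a response that dumps 500 tokens in one second scores the same as one that returns 120 tokens in four seconds, so any ranking between them comes from the quality reward alone.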
I agree with the concern, and we are adding safeguards to limit the effects of the system. Most of the reward will still be based on the reward models; however, this will give us a way to accurately tune how much emphasis to put on the fidelity of responses. We will continue to monitor the system after launch to ensure that the models are working as expected.