Length time reward models

opentensor / validators

Repository for bittensor validators

https://www.bittensor.com/

MIT License


Length time reward models #100

Closed Eugene-hu closed 11 months ago

Eugene-hu commented 1 year ago

Adds two additional reward models, one based on response length and one based on response time. Each reward is normalized per event type, and each event type follows its own normalization distribution.
These two reward models let us directly reward the speed of miners on the network and put emphasis on the fidelity of responses.
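
For illustration only, here is a minimal sketch of what per-event-type normalization could look like. The event-type names, the statistics tables, and the choice of a z-score squashed through a sigmoid are all assumptions made for the example, not the code in this PR.

```python
import math

# Hypothetical per-event-type statistics as (mean, std). Real values would
# have to be measured on the network; these are placeholders.
LENGTH_STATS = {"followup": (40.0, 15.0), "answer": (100.0, 30.0)}  # tokens
TIME_STATS = {"followup": (3.0, 1.0), "answer": (6.0, 2.0)}         # seconds


def _sigmoid_zscore(values, mu, sigma):
    """Squash z-scores into (0, 1); a value equal to the mean maps to 0.5."""
    sigma = sigma if sigma > 0 else 1.0  # guard against a degenerate std
    return [1.0 / (1.0 + math.exp(-(v - mu) / sigma)) for v in values]


def length_reward(token_counts, event_type):
    """Longer-than-typical responses for this event type score higher."""
    mu, sigma = LENGTH_STATS[event_type]
    return _sigmoid_zscore(token_counts, mu, sigma)


def time_reward(elapsed_seconds, event_type):
    """Faster-than-typical responses for this event type score higher."""
    mu, sigma = TIME_STATS[event_type]
    return [1.0 - r for r in _sigmoid_zscore(elapsed_seconds, mu, sigma)]
```

Keeping a separate (mean, std) pair per event type is what lets a short follow-up question and a long answer be judged against different baselines.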

mrseeker commented 1 year ago

To illustrate why this should not be used:

Average response: roughly 100 tokens.
Average speed of self-hosted AI models: 15-30 tokens/s.
Average speed of OpenAI: 100 tokens/s (https://gptforwork.com/tools/openai-api-and-other-llm-apis-response-time-tracker).

This will heavily skew results in favour of sending a lot of garbage in a small amount of time. Better to use it as a "cut-off" (minimum tokens required) and time-out as a combination to control minimum token speed.
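
To make the numbers concrete, a rough sketch of the arithmetic and of the suggested cut-off follows. The function names and the default thresholds (`min_tokens=50`, `timeout_seconds=10.0`) are hypothetical and chosen only for the example.

```python
def generation_time(tokens, tokens_per_second):
    """Seconds needed to produce `tokens` at a constant generation speed."""
    return tokens / tokens_per_second


def implied_min_speed(min_tokens, timeout_seconds):
    """Slowest tokens/s that can still deliver min_tokens before the time-out."""
    return min_tokens / timeout_seconds


def passes_cutoff(token_count, elapsed_seconds, min_tokens=50, timeout_seconds=10.0):
    """Binary gate: enough tokens, delivered in time. No bonus beyond that."""
    return token_count >= min_tokens and elapsed_seconds <= timeout_seconds


# A 100-token response at the speeds quoted above:
for speed in (15, 30, 100):
    print(f"{speed:>3} tokens/s -> {generation_time(100, speed):.2f} s")
```

With these figures a self-hosted model needs about 3.3 to 6.7 seconds for a typical response, while an OpenAI-backed miner needs about 1 second. A gate such as `passes_cutoff` only enforces a floor on speed (here 5 tokens/s) and gives no extra credit for being faster or longer than required.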

Eugene-hu commented 1 year ago

> This will heavily skew results in favour of sending a lot of garbage in a small amount of time. Better to use it as a "cut-off" (minimum tokens required) and time-out as a combination to control minimum token speed.

I agree with the concern, and we are adding safeguards to limit the effect of these two rewards. Most of the reward will still come from the existing reward models; the length and time rewards give us a way to tune how much emphasis to put on the fidelity of responses. We will continue to monitor the system after launch to ensure that the models are working as expected.
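
As a hedged sketch of how such a safeguard might look, the length and time rewards could be blended in with small, capped weights. The function name, the default weights, and the cap are assumptions for illustration, not the values used by the validators.

```python
def blended_reward(model_reward, length_reward, time_reward,
                   length_weight=0.05, time_weight=0.05, max_aux_weight=0.5):
    """Weighted sum in which the reward-model score keeps most of the weight.

    All three inputs are expected to be normalized to [0, 1]. The combined
    weight of the length and time terms must stay below max_aux_weight.
    """
    aux_weight = length_weight + time_weight
    if length_weight < 0 or time_weight < 0 or aux_weight >= max_aux_weight:
        raise ValueError("length/time weights must be non-negative and below the cap")
    model_weight = 1.0 - aux_weight
    return (model_weight * model_reward
            + length_weight * length_reward
            + time_weight * time_reward)
```

Under these example weights, a fast, long response that the reward models score at 0 earns at most 0.1 in total, so speed and length alone cannot dominate the ranking.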