nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Do we have any idea how many tokens are used to run the full benchmark on a model? #9

Closed daniellefranca96 closed 5 months ago

daniellefranca96 commented 5 months ago

I would like to run this on Gemini 1.0 Pro and Claude 3 so we have their scores, but do we have any idea of the token usage of this benchmark so we can calculate the cost for commercial models?

hsiehjackson commented 5 months ago

The token usage depends on the length you want to test. For example, if you test sequence length 128K with our default settings, the total is 131072 (tokens per sample) x 500 (samples per task) x 13 (tasks) ≈ 852M tokens. You can decrease the number of samples per task to reduce your budget.
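
For budgeting, here is a minimal back-of-the-envelope sketch of that calculation. The per-million-token price below is a placeholder, not something from this thread; check your provider's current pricing.

```python
# Rough token-budget estimate for a full RULER run at a given sequence length.
# The $/1M-token price is a hypothetical placeholder, purely for illustration.

def ruler_token_estimate(seq_len: int = 131072,
                         samples_per_task: int = 500,
                         num_tasks: int = 13) -> int:
    """Approximate total input tokens for one full RULER run."""
    return seq_len * samples_per_task * num_tasks

def estimated_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """Convert a token count into an approximate cost given $ per 1M input tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd

if __name__ == "__main__":
    total = ruler_token_estimate()                       # 131072 * 500 * 13 ≈ 852M
    print(f"~{total / 1e6:.0f}M input tokens")
    # Hypothetical rate of $3 per 1M input tokens:
    print(f"~${estimated_cost_usd(total, 3.0):,.0f} at $3 / 1M input tokens")
```

Dropping `samples_per_task` (or testing a shorter sequence length) scales the total tokens, and hence the cost, linearly.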