Purpose
This PR standardizes the training_time field to always use petaflop/s-day. Simply reporting the number of hours or days trained makes it hard to compare models, because different models are trained on different hardware. We use the methods outlined in the [OpenAI AI and Compute] post to convert training time reported in hours, days, or petaflops to petaflop/s-day. The petaflop/s-day metric is also useful because it is commonly used to report training time for newly released models.
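For reviewers, here is a minimal sketch of the hours/days conversion. It assumes the GPU-time method from the AI and Compute post: device count times peak FLOP/s per device times an assumed utilization (the post uses roughly 33% for GPUs) times training time. The function names, parameters, and the 100 TFLOP/s figure in the usage note are hypothetical and are not taken from this PR's code.

```python
SECONDS_PER_DAY = 86_400
# One petaflop/s-day: 1e15 operations per second sustained for one day.
FLOP_PER_PFS_DAY = 1e15 * SECONDS_PER_DAY  # 8.64e19 operations


def days_to_pfs_days(num_devices, peak_flops_per_device, training_days,
                     utilization=0.33):
    """Estimate petaflop/s-days from wall-clock training days.

    peak_flops_per_device is the device's peak FLOP/s (an input the caller
    must supply); utilization=0.33 is an assumed default for GPUs.
    """
    total_flop = (num_devices * peak_flops_per_device * utilization
                  * training_days * SECONDS_PER_DAY)
    return total_flop / FLOP_PER_PFS_DAY


def hours_to_pfs_days(num_devices, peak_flops_per_device, training_hours,
                      utilization=0.33):
    """Same estimate, for training time reported in hours."""
    return days_to_pfs_days(num_devices, peak_flops_per_device,
                            training_hours / 24, utilization)
```

For example, 8 devices with an illustrative peak of 100 TFLOP/s each, trained for 10 days, come out to about 2.64 petaflop/s-days under the 33% utilization assumption.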
Comments
Although it provides a standard view of training times across models, petaflop/s-day may not be meaningful to the general public. I propose using it as the standard metric in our assets and converting it to other units in the UI as needed (we can convert petaflop/s-day back into hours or days by assuming that all models were trained on a certain GPU).
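The reverse conversion for the UI could look roughly like the sketch below. It assumes a single reference GPU with a known peak FLOP/s and the same assumed 33% utilization; the function name, parameters, and the 100 TFLOP/s reference figure in the usage note are hypothetical placeholders, not a real device spec or existing code.

```python
def pfs_days_to_training_days(pfs_days, ref_peak_flops, num_devices=1,
                              utilization=0.33):
    """Convert petaflop/s-days back to wall-clock days on a reference GPU.

    ref_peak_flops is the assumed peak FLOP/s of the reference GPU, and
    utilization=0.33 is an assumed default; both are display conventions.
    """
    # Sustained throughput of the reference setup, in petaflop/s.
    effective_pflops = num_devices * ref_peak_flops * utilization / 1e15
    return pfs_days / effective_pflops


def pfs_days_to_training_hours(pfs_days, ref_peak_flops, num_devices=1,
                               utilization=0.33):
    """Same conversion, expressed in hours."""
    return 24 * pfs_days_to_training_days(pfs_days, ref_peak_flops,
                                          num_devices, utilization)
```

With 8 reference GPUs at an illustrative 100 TFLOP/s peak, 2.64 petaflop/s-days maps back to about 10 days. The result is only as meaningful as the reference-GPU assumption, so the UI should probably state which GPU is assumed.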