The metric is defined as the end-to-end execution time of the application. Is there a clever way to preload data outside of this definition?
If the application is interpreted to be the Python script, then that would exclude things like scp-ing data from GCS/S3, running a block-cache pre-warmer to load the dataset into the Linux block cache, or timing only the second training run because the first one was cancelled midway through, etc. Maybe it makes sense to provide guidance on the interpretation of "application"? What do you think?
In general, do what the reference implementation does.
Modifications are allowed for scaling. There will be a forthcoming list of pre-approved modifications. These modifications can be used at any system size, if desired.
When taking timing measurements, measure time the same way the reference implementation does. If the behavior isn't clear, or is undesirable, file an issue and further clarification will be provided.
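The guidance above amounts to wrapping a wall-clock timer around the whole application run, the same way the reference implementation does. A minimal sketch of that pattern follows; `run_benchmark` and the stand-in workload are hypothetical names for illustration, not part of any reference implementation:

```python
import time

def run_benchmark(train):
    """Time the full application run, start to finish.

    `train` stands in for the benchmark's training entry point. The
    real reference implementation defines exactly where its timer
    starts and stops; this only shows the end-to-end pattern.
    """
    start = time.perf_counter()        # monotonic clock, suited to timing
    result = train()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Trivial stand-in workload to show usage:
_, seconds = run_benchmark(lambda: sum(range(1000)))
print(f"end-to-end time: {seconds:.6f}s")
```

The key point is that the timer brackets the entire run, so any data staging done inside the application is counted, while anything done before the timer starts is not.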
A cheat-sheet will be provided for each model with the most common cases covered.
As for caching and system tricks: you should flush the caches prior to each run (or do something equivalent, such as restarting the machine). If you have a special case, file an issue and additional guidance can be discussed and given.
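On Linux, flushing the caches before each run can be done by dropping the kernel page cache via `/proc/sys/vm/drop_caches`. A minimal sketch, assuming a Linux host (root privileges are needed for the write; `drop_caches` is a hypothetical helper name):

```python
import subprocess

def drop_caches():
    """Flush dirty pages to disk, then drop cached data so the next
    run starts cold (Linux only; the write requires root)."""
    subprocess.run(["sync"], check=True)   # flush dirty pages first
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")                 # drop page cache, dentries, inodes
    except (PermissionError, FileNotFoundError):
        # Not root, or not Linux: report instead of failing the run.
        print("could not drop caches; try: sudo sysctl vm.drop_caches=3")

drop_caches()
```

Rebooting the machine achieves the same cold-cache state and also clears anything this command does not reach, which is why it is listed as an acceptable alternative.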