MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
Criteo1tb runs out of memory (OOM) during eval for some third-party training algorithms in PyTorch.
We're exploring reducing the criteo1tb eval batch size (bsz) in both the JAX and PyTorch workloads.
The action items (AIs) for this issue are:
- [x] Investigate whether reducing the eval bsz by 4x significantly impacts the run time.
- [x] If it does not, update the eval bsz.
- [x] Clarify in the documentation that submitters do not control the eval bsz.
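For context on the first action item, here is a minimal sketch of the trade-off. It assumes a hypothetical `evaluate_in_batches` helper, which is not the actual eval loop in the algorithmic-efficiency codebase. Dividing the eval bsz by 4 roughly quadruples the number of eval steps, while peak memory per step shrinks. The metric itself does not change as long as per-example losses are summed rather than per-batch means being averaged.

```python
from typing import Callable, Sequence, Tuple


def evaluate_in_batches(
    examples: Sequence[int],
    per_example_loss: Callable[[int], float],
    eval_batch_size: int,
) -> Tuple[float, int]:
    """Return (mean loss, number of eval steps) for a given eval batch size.

    Hypothetical helper for illustration only; not part of the
    algorithmic-efficiency codebase.
    """
    if eval_batch_size <= 0:
        raise ValueError("eval_batch_size must be positive")
    if not examples:
        raise ValueError("examples must be non-empty")
    total_loss = 0.0
    num_steps = 0
    for start in range(0, len(examples), eval_batch_size):
        # Only one batch is materialized at a time, so peak memory
        # scales with eval_batch_size rather than with the dataset size.
        batch = examples[start:start + eval_batch_size]
        total_loss += sum(per_example_loss(x) for x in batch)
        num_steps += 1
    # Summing per-example losses (instead of averaging per-batch means)
    # keeps the result independent of how the data is split, including
    # a smaller final batch.
    return total_loss / len(examples), num_steps


if __name__ == "__main__":
    data = list(range(1000))
    loss_fn = lambda x: float(x % 7)
    print(evaluate_in_batches(data, loss_fn, 256))  # 4 eval steps
    print(evaluate_in_batches(data, loss_fn, 64))   # 16 eval steps, same mean
```

Whether the extra steps meaningfully increase wall-clock time depends on per-step overhead relative to accelerator throughput, which is what the first action item measured.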