mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Decrease criteo1tb eval bsz #641

Closed priyakasimbeg closed 6 months ago

priyakasimbeg commented 7 months ago

Criteo1tb runs out of memory (OOMs) during eval for some third-party training algorithms in PyTorch. We're exploring reducing the criteo1tb eval batch size (bsz) in both JAX and PyTorch.
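As a rough illustration of why a smaller eval batch size should be safe, here is a minimal sketch that accumulates a per-example loss over the eval set one batch at a time. It assumes the eval metric is a mean over examples; the names `sigmoid_bce` and `eval_mean_loss` and the synthetic data are hypothetical and are not taken from the repository's workload code. Under that assumption the batch size only changes how many examples are held in memory at once, not the reported metric.

```python
import math


def sigmoid_bce(logit, label):
    """Numerically stable binary cross-entropy for one logit/label pair."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def eval_mean_loss(logits, labels, eval_batch_size):
    """Mean loss over the eval set, accumulated one batch at a time.

    Hypothetical helper: only `eval_batch_size` examples are processed per
    step, which is what bounds peak memory in a real eval loop.
    """
    total_loss = 0.0
    num_examples = 0
    for start in range(0, len(logits), eval_batch_size):
        batch_logits = logits[start:start + eval_batch_size]
        batch_labels = labels[start:start + eval_batch_size]
        total_loss += sum(
            sigmoid_bce(z, y) for z, y in zip(batch_logits, batch_labels))
        num_examples += len(batch_logits)
    return total_loss / num_examples


# Synthetic stand-in for model outputs and click labels (assumption).
logits = [((i * 37) % 11 - 5) / 2.0 for i in range(1000)]
labels = [float(i % 3 == 0) for i in range(1000)]

large_bsz_loss = eval_mean_loss(logits, labels, eval_batch_size=512)
small_bsz_loss = eval_mean_loss(logits, labels, eval_batch_size=64)

# The two results agree up to floating-point summation order.
assert math.isclose(large_bsz_loss, small_bsz_loss, rel_tol=1e-9)
```

In a real workload the final partial batch and any padding also need to be handled consistently, which this sketch does by counting only the examples actually seen.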

The action items (AIs) for this issue are to: