mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Fix workload targets and max runtimes in docs #685

Closed · runame closed 6 months ago

runame commented 6 months ago

Fixes #660 and #684.

@priyakasimbeg I noticed that we don't have a shared parent class for the Librispeech DeepSpeech workloads and specify the targets, max runtime, and other properties twice -- once for JAX and once for PyTorch. Is there any reason for that? If not, we should probably create a separate workload parent class; otherwise it is easy to introduce a bug by accidentally making these values inconsistent.
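
For concreteness, a minimal sketch of what such a shared parent could look like (class names, property names, and values below are illustrative, not the actual ones in the repo): a framework-agnostic base class holds the shared constants once, and the JAX and PyTorch workloads only add framework-specific pieces.

```python
class BaseDeepspeechLibrispeechWorkload:
  """Framework-agnostic parent holding the shared benchmark constants."""

  @property
  def validation_target_value(self) -> float:
    return 0.1  # placeholder target, for illustration only

  @property
  def max_allowed_runtime_sec(self) -> int:
    return 50_000  # placeholder runtime, for illustration only


class LibriSpeechDeepSpeechJaxWorkload(BaseDeepspeechLibrispeechWorkload):
  """JAX-specific model and input pipeline would live here."""


class LibriSpeechDeepSpeechPytorchWorkload(BaseDeepspeechLibrispeechWorkload):
  """PyTorch-specific model and input pipeline would live here."""
```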

github-actions[bot] commented 6 months ago

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

priyakasimbeg commented 6 months ago

Hi Runa, thanks a ton for working on this.

I noticed that we don't have a shared parent class for the Librispeech DeepSpeech workloads and specify the targets, max runtime, and other properties twice -- once for Jax and once for PyTorch. Is there any reason for that?

Right, they inherit from the Conformer JAX and PyTorch workloads to reuse the input pipeline and other methods. We actually had a bug until a few months ago where the DeepSpeech workloads were inheriting all the properties from the Conformer workloads (including targets). The targets and runtimes were specified in a Librispeech DeepSpeech parent workload that was not being used, so I removed it (https://github.com/mlcommons/algorithmic-efficiency/pull/526).
At the time I took a stab at separating DeepSpeech from Conformer, but it quickly turned into a mess with a large amount of duplicate code. This is definitely a good thing to fix at some point in the future, but I don't think we want to go down that rabbit hole right now.
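
For readers following along, here is a rough sketch of the inheritance pattern described above (names and values are simplified and illustrative): each framework's DeepSpeech workload subclasses the corresponding Conformer workload to reuse its input pipeline, so the DeepSpeech-specific targets have to be re-specified once per framework, which is where the two copies can drift apart.

```python
class LibriSpeechConformerJaxWorkload:
  """Conformer workload; owns the LibriSpeech input pipeline."""

  @property
  def validation_target_value(self) -> float:
    return 0.08  # Conformer target (placeholder value)


class LibriSpeechDeepSpeechJaxWorkload(LibriSpeechConformerJaxWorkload):
  """Reuses the Conformer input pipeline via inheritance."""

  # Without this override, the DeepSpeech workload would silently reuse the
  # Conformer target -- the bug described above.
  @property
  def validation_target_value(self) -> float:
    return 0.12  # DeepSpeech target (placeholder value)


# The same override is repeated in the PyTorch DeepSpeech workload, which is
# where the two copies can fall out of sync.
```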