mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Wrong return type in librispeech model_fn #762

Open Niccolo-Ajroldi opened 2 months ago

Niccolo-Ajroldi commented 2 months ago

In librispeech_conformer, model_fn returns logits_batch as a tuple of tensors rather than a single tensor.

The annotated return type is therefore wrong: https://github.com/mlcommons/algorithmic-efficiency/blob/ddf5efc4e13a9a4e620ad719e9bf42303f064fac/algorithmic_efficiency/workloads/librispeech_conformer/librispeech_pytorch/workload.py#L119

It should be:

  def model_fn(...) -> Tuple[Tuple[spec.Tensor, spec.Tensor], spec.ModelAuxiliaryState]:
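
To illustrate why the annotation matters to callers, here is a minimal sketch rather than the repository's code. `Tensor` and `ModelAuxiliaryState` are stand-in aliases for `spec.Tensor` and `spec.ModelAuxiliaryState`, the toy `model_fn` body is hypothetical, and the second tuple element is assumed to be a padding mask purely for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

# Stand-ins for spec.Tensor and spec.ModelAuxiliaryState (assumptions, not the real types).
Tensor = List[List[float]]
ModelAuxiliaryState = Optional[Dict[str, Any]]


def model_fn(inputs: Tensor) -> Tuple[Tuple[Tensor, Tensor], ModelAuxiliaryState]:
    """Toy model_fn whose first return value is a pair of tensors."""
    # Hypothetical "logits": every input value doubled.
    logits = [[2.0 * x for x in row] for row in inputs]
    # Hypothetical second tensor: an all-zeros mask with the same shape.
    paddings = [[0.0 for _ in row] for row in inputs]
    return (logits, paddings), None


# A caller that trusts the corrected annotation unpacks the pair explicitly
# instead of treating logits_batch as a single tensor.
(logits, paddings), aux_state = model_fn([[1.0, 2.0], [3.0, 4.0]])
print(logits)     # the doubled inputs
print(paddings)   # the zero mask
print(aux_state)  # None
```

With the old annotation, a caller could reasonably pass logits_batch straight into code that expects one tensor; the nested Tuple in the signature makes the required unpacking visible to readers and type checkers.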

As insignificant as it seems, this caused me quite some trouble while debugging an OOM issue. It might be useful for other people too.