mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Don't save final checkpoint when `save_checkpoints=False` #713

Closed runame closed 5 months ago

runame commented 5 months ago

Fixes #705.

I noticed that we already have a save_intermediate_checkpoints flag. Setting it to `False` while keeping `save_checkpoints=True` recreates the previous behaviour of storing only one final checkpoint.
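To make the flag interaction concrete, here is a minimal sketch. It is not the repository's actual code: `checkpoint_steps`, `eval_steps`, and `final_step` are hypothetical names used only for illustration, and it assumes that with both flags enabled a checkpoint is written at every evaluation step plus the final step.

```python
from typing import Iterable, List


def checkpoint_steps(
    save_checkpoints: bool,
    save_intermediate_checkpoints: bool,
    eval_steps: Iterable[int],
    final_step: int,
) -> List[int]:
  """Return the steps at which a checkpoint would be written (illustrative)."""
  # With this PR, save_checkpoints=False means no checkpoint at all,
  # including the final one.
  if not save_checkpoints:
    return []
  # Assumption: intermediate checkpoints coincide with evaluation steps.
  steps = sorted(set(eval_steps)) if save_intermediate_checkpoints else []
  # The final checkpoint is written whenever checkpointing is enabled.
  if final_step not in steps:
    steps.append(final_step)
  return steps


# Previous behaviour: only one final checkpoint.
only_final = checkpoint_steps(True, False, eval_steps=[100, 200], final_step=300)
# Scoring-style run: nothing is written.
nothing = checkpoint_steps(False, True, eval_steps=[100, 200], final_step=300)
```

Under these assumptions, `only_final` is `[300]` and `nothing` is empty, regardless of how `save_intermediate_checkpoints` is set in the second call.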

Not sure where exactly in the docs to mention the lack of checkpointing during scoring? It shouldn't matter to the submitter as it is not timed, besides in edge cases like having an optimizer that does something like this, which would lead to an error in the submission if save_checkpoints=True.

github-actions[bot] commented 5 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

priyakasimbeg commented 5 months ago

> I noticed that we already have a save_intermediate_checkpoints flag.

I forgot about that.

> Not sure where exactly in the docs to mention the lack of checkpointing during scoring?

I'm not sure what the best place is either. I think we can add a question to the FAQs in DOCUMENTATION.md like: "My optimizer is incompatible with the AlgoPerf checkpointing code, will this affect my submission?"

@fsschneider We could also say something in this section about how we will run the code (e.g. "Note that we will disable checkpointing while we score submissions")?