mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Support for optimizers that require an `.eval()` step #758

Open netw0rkf10w opened 3 months ago

netw0rkf10w commented 3 months ago

Description

First of all, if this feature is already supported, please consider this a question.

I'm trying to reproduce some results of existing algorithms such as SGD, AdamW, and the recently proposed ScheduleFree. For the latter in particular, there is an `.eval()` step that needs to be done before the validation phase, and I haven't figured out how to do that properly (their submission code doesn't seem to give any indication of it).
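The reason a mode switch is needed can be read off the update itself. The equations below are a sketch of the basic Schedule-Free SGD recursion as we understand it from the Schedule-Free paper by Defazio et al., in our own notation (step size $\gamma$, interpolation weight $\beta$, averaging weight $c_{t+1}$); the released optimizers add refinements such as warmup-dependent averaging weights. Gradients are taken at the interpolated point $y_t$, while the point intended for validation is the running average $x_t$:

```latex
\begin{aligned}
y_t     &= (1-\beta)\, z_t + \beta\, x_t, \\
z_{t+1} &= z_t - \gamma\, \nabla f(y_t), \\
x_{t+1} &= (1 - c_{t+1})\, x_t + c_{t+1}\, z_{t+1}, \qquad c_{t+1} = \tfrac{1}{t+1}.
\end{aligned}
```

In this reading, if the model's parameters hold $y_t$ during training, something has to move them to $x_t$ before the validation pass, and that is the job of the `.eval()` step.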

Any help would be greatly appreciated! Thank you in advance!

adefazio commented 3 months ago

I use the closure form of the algorithm to avoid needing the `eval()` call. Another user has requested support for this sort of `eval()` mode so that exponential weight averaging can be implemented. The organizers said this is something they will look at adding in the future. It should help with performance.
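A minimal sketch of why a closure removes the need for `eval()`, assuming the basic Schedule-Free SGD recursion (y = (1 - beta) z + beta x, an SGD step on z using the gradient at y, and x as a 1/(t+1) running average of z) and plain Python lists standing in for parameter tensors. The class `ClosureScheduleFreeSGD` is hypothetical and is not the schedulefree package's code. Between steps the shared `params` list holds x; inside `step()` it is temporarily moved to y so the closure computes the gradient there, then x is written back.

```python
from typing import Callable, List


class ClosureScheduleFreeSGD:
    """Hypothetical, minimal closure-style Schedule-Free SGD on a flat list of floats.

    Between calls to step(), `params` holds the averaged point x, so a
    validation pass can read it directly with no eval()/train() switch.
    """

    def __init__(self, params: List[float], lr: float = 0.1, beta: float = 0.9) -> None:
        self.params = params   # shared with the "model"; holds x between steps
        self.z = list(params)  # base SGD sequence z, initialised to x
        self.lr = lr
        self.beta = beta
        self.t = 0

    def step(self, closure: Callable[[], List[float]]) -> None:
        """closure() must return the gradient evaluated at the current `params`."""
        self.t += 1
        x = list(self.params)  # snapshot of the averaged point
        # Temporarily move params to y = (1 - beta) * z + beta * x.
        self.params[:] = [(1 - self.beta) * zi + self.beta * xi for zi, xi in zip(self.z, x)]
        grad = closure()  # gradient taken at y
        # Plain SGD step on z using the gradient at y.
        self.z = [zi - self.lr * gi for zi, gi in zip(self.z, grad)]
        # Running average x_{t+1} = (1 - c) x_t + c z_{t+1}, with c = 1 / (t + 1).
        c = 1.0 / (self.t + 1)
        # Write x back so params hold x again when step() returns.
        self.params[:] = [(1 - c) * xi + c * zi for xi, zi in zip(x, self.z)]


# Minimise f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
w = [0.0]
opt = ClosureScheduleFreeSGD(w, lr=0.1, beta=0.9)
for _ in range(5000):
    opt.step(lambda: [wi - 3.0 for wi in w])
print(w)  # w holds x, which should now be close to 3.0
```

The cost of this design is two extra writes to the parameters per step (to y and back to x), in exchange for never leaving the model in a state where its parameters are not the evaluation point.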

netw0rkf10w commented 3 months ago

Thanks a lot, @adefazio. Nice trick!

As I understand it, you basically swap in the extrapolated point at every step; is that correct? That seems to put your algorithm at a disadvantage, though. Have you observed a considerable slowdown compared to the `.eval()` version?
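Whichever direction the swap goes, it amounts to one elementwise interpolation per parameter. A minimal sketch, assuming the relation y = (1 - beta) z + beta x from the basic Schedule-Free SGD update, that the optimizer keeps z in its state, and plain Python lists in place of tensors: solving for x gives x = y + (1 - 1/beta)(z - y), and going back gives y = x + (1 - beta)(z - x). The helpers `lerp`, `train_to_eval`, and `eval_to_train` are hypothetical names; the schedulefree package may implement the switch differently.

```python
from typing import List


def lerp(a: List[float], b: List[float], w: float) -> List[float]:
    """Elementwise a + w * (b - a), i.e. (1 - w) * a + w * b."""
    return [ai + w * (bi - ai) for ai, bi in zip(a, b)]


def train_to_eval(y: List[float], z: List[float], beta: float) -> List[float]:
    """Recover x from y = (1 - beta) * z + beta * x (requires beta != 0)."""
    return lerp(y, z, 1.0 - 1.0 / beta)


def eval_to_train(x: List[float], z: List[float], beta: float) -> List[float]:
    """Rebuild y = (1 - beta) * z + beta * x from x."""
    return lerp(x, z, 1.0 - beta)


x = [1.0, -2.0]
z = [0.5, 4.0]
y = eval_to_train(x, z, beta=0.9)       # approximately [0.95, -1.4]
x_back = train_to_eval(y, z, beta=0.9)  # approximately [1.0, -2.0]
```

Because both directions are a single pass over the parameters with no extra gradient computation, the per-step cost scales with the parameter count rather than with the forward/backward pass.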

And I'm quite surprised that the proposed feature hasn't been added to support your method. It's just two lines of code (actually, only one is needed). Or maybe I'm missing something?
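For illustration, the harness-side change under discussion could look roughly like the following. This is a minimal sketch under assumptions: `validate_in_eval_mode`, `fake_validation`, and `RecordingOptimizer` are hypothetical names and are not part of the algorithmic-efficiency codebase or its submission API. The only behavior shown is calling an optional `eval()` before validation and `train()` afterwards, using duck typing so optimizers without these methods are unaffected.

```python
from typing import Any, Callable, List


def validate_in_eval_mode(optimizer: Any, run_validation: Callable[[], float]) -> float:
    """Hypothetical harness hook: if the optimizer exposes eval()/train(),
    switch it to eval mode for the validation pass and back afterwards."""
    has_modes = callable(getattr(optimizer, "eval", None)) and callable(
        getattr(optimizer, "train", None)
    )
    if has_modes:
        optimizer.eval()  # e.g. move parameters to the averaged point
    try:
        return run_validation()
    finally:
        if has_modes:
            optimizer.train()  # restore the training point before the next step


class RecordingOptimizer:
    """Stand-in optimizer that only records which mode switches were called."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def eval(self) -> None:
        self.log.append("eval")

    def train(self) -> None:
        self.log.append("train")


log: List[str] = []


def fake_validation() -> float:
    log.append("validate")
    return 0.5


loss = validate_in_eval_mode(RecordingOptimizer(log), fake_validation)
```

The `try`/`finally` keeps the optimizer consistent even if validation raises, so the next training step always starts from the training point.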

Would love to hear your opinion on this, @priyakasimbeg. Thanks.

adefazio commented 3 months ago

The overhead is minimal; parameter copying is very fast. The issue was raised very close to the deadline, and given the low overhead of the copy, it was decided that changing the competition API so close to the deadline was not warranted.