XuezheMax closed this issue 3 years ago.
To be honest, all comparisons are unfair in one way or another, but in general I agree with you. I am going to add a note that the visualizations are not a good way to select an optimizer.
I am happy to merge any PRs with improvements to the visualizations. I also have a few things in mind, like searching over more hyper-parameters, but I have not managed to do it yet.
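As a rough sketch of the "search over more hyper-parameters" idea, the search space could cover more than the learning rate. The use of hyperopt, the Adam example, the search ranges, and all names below are illustrative assumptions rather than the repo's actual code:

```python
import torch
from hyperopt import fmin, hp, tpe


def rosenbrock(xy):
    # 2-D test function with a global minimum at (1, 1).
    return (1 - xy[0]) ** 2 + 100 * (xy[1] - xy[0] ** 2) ** 2


def objective(params):
    # One short optimization run on the test function; the returned value is
    # what the search tries to minimize.
    xy = torch.tensor([-2.0, 2.0], requires_grad=True)
    optimizer = torch.optim.Adam(
        [xy], lr=params["lr"], betas=(params["beta1"], 0.999)
    )
    for _ in range(100):
        optimizer.zero_grad()
        loss = rosenbrock(xy)
        loss.backward()
        optimizer.step()
    return rosenbrock(xy).item()


# Search over the learning rate and Adam's first beta instead of lr alone.
space = {
    "lr": hp.loguniform("lr", -8, 0),
    "beta1": hp.uniform("beta1", 0.8, 0.99),
}

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
print(best)
```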
Thanks a lot for your response.
Here is the PR that adds the warning: https://github.com/jettify/pytorch-optimizer/pull/222. Please create a PR if you want to improve the messaging there.
The message is great. Thanks!
Hi,
Thanks a lot for this great repo. For the comparison in the Visualizations example, I found that you run only 100 updates for each config. I am concerned that 100 is too small and would favor optimizers that converge quickly in the first few updates.
For optimizers whose convergence is relatively slow at the beginning, the search would then select a large learning rate, which could lead to unstable convergence for those optimizers.
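For illustration, here is a minimal sketch of what a longer run on a 2-D test function could look like, so that slow-starting optimizers are not penalized by a short budget. The Rosenbrock function, the starting point, and the 500-step budget are assumptions for the example, not the repo's actual settings:

```python
import torch


def rosenbrock(xy):
    # 2-D test function with a global minimum at (1, 1).
    x, y = xy
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2


def run(optimizer_cls, lr, num_steps=500):
    # Start from a fixed point so different optimizers are comparable.
    xy = torch.tensor([-2.0, 2.0], requires_grad=True)
    optimizer = optimizer_cls([xy], lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = rosenbrock(xy)
        loss.backward()
        optimizer.step()
    return xy.detach()


# Example: run Adam over the longer horizon and print the final point.
print(run(torch.optim.Adam, lr=1e-2))
```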
Moreover, for the hyper-parameter search, the objective is the distance between the last-step point and the minimum. I think the function value at the last-step point may be a better objective.
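A minimal sketch of the two candidate objectives, reusing the `rosenbrock` and `run` helpers from the previous sketch (the minimum at (1, 1) is a property of that test function; everything else is illustrative):

```python
import torch

# Known global minimum of the Rosenbrock test function used above.
minimum = torch.tensor([1.0, 1.0])


def distance_objective(last_point):
    # Current objective: Euclidean distance from the last iterate to the minimum.
    return torch.dist(last_point, minimum).item()


def value_objective(last_point):
    # Proposed objective: function value at the last iterate.
    return rosenbrock(last_point).item()


# Usage inside a hyper-parameter search, for example:
last_point = run(torch.optim.Adam, lr=1e-2)
print(distance_objective(last_point), value_objective(last_point))
```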
Lastly, some optimizers implicitly implement learning-rate decay (such as AdaBound and RAdam), but others do not, and no explicit learning-rate schedule is used in your comparison.
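As a sketch of one way to put such optimizers on a more equal footing, an explicit scheduler could be attached in the update loop. The cosine schedule and the step budget are assumptions, and `rosenbrock` refers to the earlier sketch:

```python
import torch


def run_with_schedule(optimizer_cls, lr, num_steps=500):
    xy = torch.tensor([-2.0, 2.0], requires_grad=True)
    optimizer = optimizer_cls([xy], lr=lr)
    # Explicit decay for optimizers that do not decay their step size internally.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = rosenbrock(xy)  # test function from the earlier sketch
        loss.backward()
        optimizer.step()
        scheduler.step()
    return xy.detach()
```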