lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0

Performance of ResNet50 on ImageNet #27

Open juntang-zhuang opened 2 years ago

juntang-zhuang commented 2 years ago

Hi, thanks for the nice project. I noticed your paper reports 73.69 accuracy on ImageNet with ResNet50, which is much lower than the numbers reported by Keras https://keras.io/api/applications/ (74.9, and 76.0 for v2) and PyTorch (76.15). Does this mean Ranger cannot reach the accuracy officially reported with SGD, or is the gap caused by other settings being different? If the latter, how does Ranger compare to the best SGD result in a fair setting?

lessw2020 commented 2 years ago

Hi @juntang-zhuang, in our paper we only trained for 60 epochs due to the cost of training; people typically train 200-300 epochs on ImageNet when the goal is to maximize final model accuracy. Our purpose was to compare the AdamW and Ranger21 optimizers head to head with all other variables the same - i.e. a straightforward set of transformations and a fixed compute budget (< $1K on training). Thus, the only takeaway from our paper is that Ranger21 outperforms AdamW when all other variables are identical (same transformations, same total training epochs). It says nothing about the Keras, PyTorch, or SGD results, as we would need to replicate the same total number of epochs and the same augmentation pipeline used in those setups while training with Ranger21 to make a direct apples-to-apples comparison. Hope that helps! Less
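
For anyone wanting to reproduce that head-to-head setup, a rough sketch of the idea is below: the model, transforms, and epoch budget stay fixed and only the optimizer is swapped. This assumes the Ranger21 constructor accepts `lr`, `num_epochs`, and `num_batches_per_epoch` (it needs the run length up front for its warmup/warmdown schedule); the hyperparameter values are placeholders, not the ones used in the paper.

```python
import torch
import torchvision
import ranger21  # this repo's package

def build_optimizer(name, model, num_batches_per_epoch, num_epochs=60, lr=1e-3):
    """Swap only the optimizer; every other training setting stays identical."""
    if name == "adamw":
        return torch.optim.AdamW(model.parameters(), lr=lr)
    if name == "ranger21":
        # Assumed constructor arguments: Ranger21 schedules warmup/warmdown
        # from the total number of steps, so it needs these counts up front.
        return ranger21.Ranger21(
            model.parameters(),
            lr=lr,
            num_epochs=num_epochs,
            num_batches_per_epoch=num_batches_per_epoch,
        )
    raise ValueError(f"unknown optimizer: {name}")

# Same ResNet50 (and, elsewhere, the same transforms/dataloader) for both runs.
model = torchvision.models.resnet50(weights=None)
# num_batches_per_epoch would normally be len(train_loader); 1000 is a placeholder.
optimizer = build_optimizer("ranger21", model, num_batches_per_epoch=1000)
```

Running the same script twice, once with `"adamw"` and once with `"ranger21"`, is the fixed-budget comparison described above; matching the Keras/PyTorch reference numbers would additionally require their longer schedules and augmentation pipelines.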