ckyrkou opened 1 week ago
👋 Hello @ckyrkou, thank you for your interest in Ultralytics 🚀! We recommend checking out the Docs for valuable information and examples related to your question. Specifically, you can explore details about training in the Model Training Guide.
Regarding your query, it seems you're asking about internal logic in the trainer code related to optimizer selection based on iteration count. This is an interesting aspect, and to provide you with the most accurate information, an Ultralytics engineer will review this and assist you soon 🙌.
If you believe the behavior is inconsistent or incorrect and suspect it's a 🐛 Bug, please provide a minimum reproducible example. This will greatly help us investigate and address the issue if needed.
Meanwhile, feel free to experiment with overriding the default logic in your own implementation to better suit your experiments. You can also discuss this further with the community or gain insights from others by joining our Discord 🎧, Discourse, or Subreddit.
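For instance, here is a minimal sketch of forcing a specific optimizer through the public training arguments (`optimizer`, `lr0`, and `momentum` are standard train settings; the checkpoint and dataset names below are placeholders for your own):

```python
from ultralytics import YOLO

# Bypass the iteration-based "auto" selection by passing an explicit
# optimizer and learning rate instead of the optimizer="auto" default.
model = YOLO("yolo11n.pt")  # placeholder: any model checkpoint
model.train(
    data="coco8.yaml",  # placeholder: your dataset YAML
    epochs=10,
    optimizer="SGD",    # force SGD regardless of iteration count
    lr0=0.01,           # explicit initial learning rate
    momentum=0.9,
)
```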
Make sure you're using the latest version of the `ultralytics` package. Update it with:

```bash
pip install -U ultralytics
```
Running the latest version ensures the framework includes the most recent updates and fixes.
Validate your setup using one of our verified environments. If the Ultralytics CI badge is green, all Ultralytics CI tests are passing and models are functioning properly across environments.
Thank you again for your curiosity and for bringing this up! An Ultralytics engineer will follow up with a more detailed response soon 🔍✨.
@Y-T-G ah yes, back when I had time to train models. The good old days!
@Y-T-G BTW, not sure if there are other improved optimizers since that study (none that I know of, but I haven't been actively searching either). If there are, it might make sense to repeat the study with the latest optimizers in case there's a better default now.
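A sketch of what repeating that comparison could look like with the public API (the optimizer names are among the documented choices for `train()`; the dataset, epoch count, and metric key are illustrative assumptions, and a real study would fix seeds and train much longer):

```python
from ultralytics import YOLO

# Hypothetical optimizer sweep: train identical models with each optimizer
# and collect the final validation metrics for comparison.
results = {}
for opt in ("SGD", "AdamW", "NAdam", "RAdam", "RMSProp"):
    model = YOLO("yolo11n.pt")  # fresh weights for every run
    metrics = model.train(data="coco8.yaml", epochs=10, optimizer=opt)
    results[opt] = metrics.results_dict  # validation metrics dict

for opt, res in results.items():
    print(opt, res.get("metrics/mAP50-95(B)"))  # assumed metric key
```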
Search before asking
Question
Hi,
I have a question regarding this line in `ultralytics/engine/trainer.py`:

```python
name, lr, momentum = ("SGD", 0.01, 0.9) if iterations > 10000 else ("AdamW", lr_fit, 0.9)
```
Why is the optimizer determined by the number of iterations? Is there any intuition or a paper that supports this? The same logic applies whether fine-tuning or training from scratch, right? So what is the assumption behind it? Just curious, because both optimizers seem to work well, so why the distinction based on iterations?
Also, why is the initial learning rate influenced by the number of classes?

```python
lr_fit = round(0.002 * 5 / (4 + nc), 6)  # lr0 fit equation to 6 decimal places
```
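For reference (my own arithmetic, just to make the formula concrete), with the 80 COCO classes this evaluates to:

```python
nc = 80  # e.g. COCO's 80 classes
lr_fit = round(0.002 * 5 / (4 + nc), 6)
print(lr_fit)  # 0.000119
```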
Any other feedback is appreciated.
Thank you
Additional
No response