ultralytics / ultralytics

Ultralytics YOLO11 🚀
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Decision which layers get weight decay regularization #19064

Closed. Petros626 closed this 1 week ago

Petros626 commented 1 week ago

Question

Hey,

This is a logger output before the training starts: optimizer: Adam(lr=0.01, momentum=0.937) with parameter groups 63 weight(decay=0.0), 73 weight(decay=0.0005), 72 bias(decay=0.0)

How is it decided which parameters receive weight decay? Are only those parameter groups chosen that are useful for training?

Thanks for the clarification!

UltralyticsAssistant commented 1 week ago

👋 Hello @Petros626, thank you for your interest in Ultralytics 🚀! We recommend visiting the Docs for detailed explanations on key topics like optimization and hyperparameters, which might address similar questions. You can also find helpful Python and CLI usage examples there.

If your query is a ❓ Question, providing further details, such as relevant coding context or how you are observing this behavior during training, may help us offer better assistance.

If this turns out to be related to a 🐛 Bug, we kindly request a minimum reproducible example. This will allow us to debug and understand the issue more effectively.

Explore and Upgrade

If you're using an older version of Ultralytics, try upgrading to the latest version to ensure you have the latest fixes and improvements:

pip install -U ultralytics

The optimizer details you've shared might be influenced by configurations from the latest release. Full requirements are available for review, and we recommend using Python>=3.8 in combination with PyTorch>=1.8.

Visualize and Experiment

YOLO can run in various cloud-based environments with pre-installed dependencies. For advanced workflows, explore options like Docker, or use the AWS 🌎 and GCP guides for enterprise-scale projects.

Status Check

Ultralytics CI

This badge indicates the current build status of all Ultralytics CI tests. CI continuously ensures the correct operation of YOLO Modes and Tasks across diverse setups on macOS, Windows, and Ubuntu.

💡 Please note, this is an automated response. An Ultralytics engineer will follow up with you soon to provide additional assistance.

glenn-jocher commented 1 week ago

@Petros626 the Ultralytics YOLO optimizer applies weight decay (L2 regularization) only to the weights of convolutional and linear layers. Biases and the weights of normalization layers (BatchNorm) are excluded so that regularization does not distort the scale and shift they learn. In your log, that corresponds to the 73 weights with decay=0.0005, the 63 normalization weights with decay=0.0, and the 72 biases with decay=0.0. For implementation details, see the build_optimizer source code, which groups parameters into weight decay and no-decay categories.
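
For illustration, here is a minimal PyTorch sketch of the kind of grouping described above. It is not the actual build_optimizer code: the helper name `group_parameters` and the toy model are hypothetical, and passing the logged momentum value as Adam's first beta is an assumption made for this example.

```python
import torch
from torch import nn


def group_parameters(model: nn.Module):
    """Split parameters into (decayed weights, norm weights, biases).

    Hypothetical helper for illustration only.
    """
    # Collect every class in torch.nn whose name contains "Norm"
    # (BatchNorm2d, LayerNorm, GroupNorm, ...).
    norm_types = tuple(
        v for k, v in vars(nn).items() if "Norm" in k and isinstance(v, type)
    )
    decay, no_decay_norm, biases = [], [], []
    for module in model.modules():
        # recurse=False so each parameter is visited once, on its owning module
        for name, param in module.named_parameters(recurse=False):
            if "bias" in name:
                biases.append(param)          # biases: no decay
            elif isinstance(module, norm_types):
                no_decay_norm.append(param)   # normalization weights: no decay
            else:
                decay.append(param)           # conv/linear weights: decay
    return decay, no_decay_norm, biases


# Toy model standing in for a real detection network.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
decay, no_decay_norm, biases = group_parameters(model)

# One optimizer, three parameter groups with their own weight_decay.
# Assumption: the logged "momentum" is used as Adam's beta1.
optimizer = torch.optim.Adam(
    [
        {"params": decay, "weight_decay": 5e-4},
        {"params": no_decay_norm, "weight_decay": 0.0},
        {"params": biases, "weight_decay": 0.0},
    ],
    lr=0.01,
    betas=(0.937, 0.999),
)

# Same order as the logger line: no-decay weights, decayed weights, biases
print(len(no_decay_norm), len(decay), len(biases))  # prints "1 2 2"
```

On this toy model the counts are 1 normalization weight, 2 decayed weights (conv and linear), and 2 biases (BatchNorm and linear), the same three categories as in the logger line from the question.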

Petros626 commented 1 week ago

Now it's clear, thank you.

glenn-jocher commented 6 days ago

@Petros626 you're welcome! If you have any more questions as you dive deeper into your project, feel free to ask. Happy training with Ultralytics YOLO! 🚀