Closed: KwangryeolPark closed this issue 10 months ago
👋 Hello @KwangryeolPark, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
I hope you fix the mixed precision problem.
@KwangryeolPark hello! Thanks for bringing this to our attention. NaNs during training can indeed sometimes be related to precision issues when using mixed precision training (AMP). However, there could be other factors at play, such as learning rate, weight initialization, or data preprocessing.
Regarding the use of NVIDIA Apex, YOLOv5 uses PyTorch's native AMP implementation, which is generally recommended for its ease of use and integration. If you're experiencing NaNs with AMP, you might want to try the following (see the sketch below for how native AMP is wired up):
- Lowering the initial learning rate (lr0) and warmup settings.
- Verifying your dataset, labels, and preprocessing don't produce NaN/Inf values.
- Temporarily training in full FP32 (without AMP) for a few epochs to confirm mixed precision is actually the cause.
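For reference, this is roughly how PyTorch's native AMP is wired into a training step. It's a minimal standalone sketch with placeholder model, optimizer, and data, not YOLOv5's actual train.py loop, and it assumes a CUDA device:
import torch
model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()       # dynamic loss scaling guards against FP16 underflow
x, y = torch.randn(8, 10).cuda(), torch.randn(8, 1).cuda()
optimizer.zero_grad()
with torch.cuda.amp.autocast():            # forward pass runs in mixed precision
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()              # backward on the scaled loss
scaler.step(optimizer)                     # unscales grads; skips the step if inf/NaN is found
scaler.update()                            # adjusts the scale factor for the next iteration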
If you're willing to submit a PR, we'd be happy to review any improvements or fixes you propose. Just make sure to thoroughly test your changes to ensure they're beneficial across various scenarios.
Remember to check out our documentation for more details on troubleshooting and best practices: https://docs.ultralytics.com/yolov5/
Thanks for your contribution to the YOLOv5 community! 🚀
@glenn-jocher Thank you for the answer.
To set the learning rate, I checked the Training Arguments and found the lr0 argument. However, when I add --lr0 0.001, the script shows: train.py: error: unrecognized arguments: --lr0 1e-3.
Apologies for the confusion, @KwangryeolPark. The correct argument for setting the initial learning rate in the YOLOv5 training script is --lr. So, if you want to set the initial learning rate to 0.001, you should use the following command:
python train.py --data coco.yaml --epochs 300 --weights '' --cfg yolov5m.yaml --batch-size 40 --optimizer CAME --device 0 --lr 0.001
Make sure to adjust the learning rate according to your specific needs and keep an eye on the training process to ensure stability. If you have any further questions or issues, don't hesitate to reach out. Happy training! 🚀
@glenn-jocher Thank you for the guidance. However, the --lr 0.001 argument also fails: train.py: error: unrecognized arguments: --lr 0.001
I apologize for the oversight, @KwangryeolPark. In YOLOv5, the learning rate is set in the hyperparameter configuration file rather than as a command-line argument. You can adjust the learning rate by editing the hyp.scratch.yaml file or any other hyperparameter file you are using.
For example, to set the initial learning rate to 0.001, you would modify the lr0 value in your hyperparameter file like so:
lr0: 0.001 # initial learning rate
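If you'd rather do this programmatically, here's a minimal sketch that copies a default hyperparameter file and overrides lr0. The file paths are assumptions; use whichever hyp file ships with your YOLOv5 checkout:
import yaml
from pathlib import Path
src = Path("data/hyps/hyp.scratch-low.yaml")   # assumed default hyp file; older releases use hyp.scratch.yaml
dst = Path("hyp.custom.yaml")                  # hypothetical output name
hyp = yaml.safe_load(src.read_text())
hyp["lr0"] = 0.001                             # initial learning rate
dst.write_text(yaml.safe_dump(hyp, sort_keys=False))
print(f"wrote {dst} with lr0={hyp['lr0']}")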
Then, you can reference this hyperparameter file during training using the --hyp argument:
python train.py --data coco.yaml --epochs 300 --weights '' --cfg yolov5m.yaml --batch-size 40 --optimizer CAME --device 0 --hyp your_hyperparameter_file.yaml
Replace your_hyperparameter_file.yaml with the path to your edited hyperparameter file. This should correctly set the initial learning rate for your training session. If you encounter any further issues, please let us know. Good luck with your training! 🌟
Thank you
You're welcome, @KwangryeolPark! If you have any more questions or need further assistance in the future, feel free to reach out. Best of luck with your YOLOv5 training! Happy detecting! 🚀👀
Search before asking
YOLOv5 Component
Training
Bug
Like other issues, I also see NaN while training yolov5m on the COCO dataset, following the script in coco.yaml and the README.md.
I tried to figure out the reason for the NaN and found a hint in an issue that is indirectly about AMP (Automatic Mixed Precision).
It makes sense that lower precision has a higher chance of producing NaN during casting because of underflow.
Therefore, I think many of the NaN problems come from AMP, so it looks better to use NVIDIA Apex, which shifts (scales) the value distribution to prevent a mismatch with FP16's representable range.
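As a toy illustration of that underflow point (not YOLOv5 code): small values flush to zero when cast to FP16, a later 0/0 then yields NaN, and scaling before the cast, as PyTorch's GradScaler does, keeps them representable:
import torch
g = torch.tensor([1e-4, 1e-7, 1e-8])     # e.g. small gradient values
print(torch.finfo(torch.float16).tiny)   # 6.1035e-05, smallest normal FP16 value
print(g.half())                          # the 1e-8 entry flushes to 0 in FP16
print(g.half() / g.half())               # 0/0 from the underflowed entry gives NaN
scaled = (g * 1024).half()               # scale up before the cast (loss scaling)
print(scaled.float() / 1024)             # unscale in FP32, as GradScaler does: the small values survive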
Environment
YOLOv5m, torch 1.12.1+cu116, Python 3.8.12, dataset: COCO, optimizer: CAME, epochs: 300, batch size: 40
Minimal Reproducible Example
I use the CAME optimizer with betas=(momentum, 0.999, 0.999).
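For completeness, a hypothetical sketch of that optimizer setup, assuming the third-party came-pytorch package; the import path and argument names are assumptions and may differ from the implementation actually patched into train.py:
import torch
from came_pytorch import CAME               # assumed package/import; verify against your CAME source
model = torch.nn.Linear(10, 1)               # placeholder model
momentum = 0.937                             # YOLOv5's default hyp momentum
optimizer = CAME(model.parameters(),
                 lr=0.001,
                 betas=(momentum, 0.999, 0.999),  # 3-tuple as reported above
                 weight_decay=0.0005)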
Additional
No response
Are you willing to submit a PR?