ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com

Loss computation sometimes causes NaN values #13416

Open tobymuller233 opened 1 hour ago

tobymuller233 commented 1 hour ago

Search before asking

YOLOv5 Component

Training

Bug

While fine-tuning my model after pruning (training for several epochs), I found that the loss value becomes NaN from time to time. By setting breakpoints and inspecting the values, I traced this to metrics.py: if a predicted bounding box has a width or height of 0, the loss turns into NaN, since in the CIoU computation h2 and h1 are used as divisors here.
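A minimal sketch of the failure mode described above, assuming the standard CIoU aspect-ratio penalty where box heights appear as divisors. This is not the actual `utils/metrics.py` code; the function name `aspect_ratio_term`, the variable names `w1/h1/w2/h2`, and the `eps` clamp are illustrative only.

```python
# Minimal sketch (assumed, not the actual utils/metrics.py code) of how a
# zero-height prediction can poison the CIoU aspect-ratio term.
import math

import torch


def aspect_ratio_term(w1, h1, w2, h2):
    # v from the CIoU penalty: (4 / pi^2) * (atan(w2/h2) - atan(w1/h1))^2
    return (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)) ** 2


w1, h1 = torch.tensor(3.0), torch.tensor(2.0)  # target box

# Degenerate prediction with zero width AND height: 0/0 = NaN in the forward pass.
print(aspect_ratio_term(w1, h1, torch.tensor(0.0), torch.tensor(0.0)))  # nan

# Zero height only: the forward pass stays finite (atan(inf) = pi/2), but the
# backward pass multiplies 0 by -inf, so the gradient w.r.t. h2 becomes NaN.
w2 = torch.tensor(3.0, requires_grad=True)
h2 = torch.tensor(0.0, requires_grad=True)
aspect_ratio_term(w1, h1, w2, h2).backward()
print(h2.grad)  # nan

# One possible mitigation (an assumption, not necessarily how upstream fixes it):
# clamp the heights away from zero with a small eps before dividing.
eps = 1e-7
print(aspect_ratio_term(w1, h1, torch.tensor(3.0),
                        torch.tensor(0.0).clamp(min=eps)))  # finite
```

Once a NaN enters the gradients, it propagates through the optimizer step and the whole loss stays NaN afterwards, which matches the intermittent behaviour seen during fine-tuning of pruned models that can emit degenerate boxes.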

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

UltralyticsAssistant commented 1 hour ago

👋 Hello @tobymuller233, thank you for your interest in YOLOv5 🚀! It seems you're encountering NaN values during training, and there might be a potential bug in the metrics.py file. To assist, we'll need a bit more information.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us understand and debug the issue. This would include steps to replicate the bug, relevant sections of your code, and any specific error messages.

Additionally, it would be helpful to know more about your environment setup, such as the version of Python, PyTorch, and any other dependencies you are using.

If you have any further insights, like dataset characteristics or specific conditions that might trigger this issue, do share those as well.

Please note that this is an automated response, and an Ultralytics engineer will review your issue and provide further assistance soon. Thank you for your patience and help in improving YOLOv5! 🚀✨