Closed agentmorris closed 6 months ago
@agentmorris hello! Thank you for the detailed report and for following up on the previous issue. It's quite intriguing that you're observing negative-width bounding boxes exclusively on M1 hardware with MPS. This could be related to differences in the MPS backend or a specific library version incompatibility.
To help us diagnose and address this issue, could you please:
Your cooperation is much appreciated! We'll look into this as soon as we have more information. Meanwhile, for further guidance, please refer to our documentation at https://docs.ultralytics.com/yolov5/.
Thank you for your contribution to improving YOLOv5!
I have a more self-contained repro now, from a brand new AWS mac2.metal VM...
# Install miniforge
brew install wget
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh --no-check-certificate
chmod a+x Miniforge3-MacOSX-arm64.sh
./Miniforge3-MacOSX-arm64.sh
source ~/.zshrc
# Get the model weights, dataset file, and test image
mkdir ~/images
wget https://github.com/agentmorris/MegaDetector/releases/download/v5.0/md_v5a.0.0.pt -O ~/images/md_v5a.0.0.pt --no-check-certificate
wget http://dmorris.net/misc/tmp/m1-yolo-issue/n7_2019-03-19_07-25-00.JPG -O ~/images/n7_2019-03-19_07-25-00.JPG
wget http://dmorris.net/misc/tmp/m1-yolo-issue/dataset.yaml -O ~/images/dataset.yaml
# Check out both YOLOv5 versions ("new" and "old") to separate folders
git clone https://github.com/ultralytics/yolov5 yolov5-new
git clone https://github.com/ultralytics/yolov5 yolov5-old
cd yolov5-old
git checkout c23a441c9df7ca9b1f275e8c8719c949269160d1
# Create Python environments
mamba create -n yolov5-new python=3.11 pip -y
cd ~/yolov5-new && mamba activate yolov5-new
pip install -r requirements.txt
mamba create -n yolov5-old python=3.8 pip -y
cd ~/yolov5-old && mamba activate yolov5-old
pip install -r requirements.txt
# The old YOLOv5 requirements.txt file specified numpy>=1.18.5, which mamba
# satisfies with 1.24.4 as of 2024.01.21. This results in "AttributeError: module 'numpy'
# has no attribute 'int'. ". So we roll back numpy to 1.21.4, which is still compatible with
# the requirements.txt file.
pip uninstall -y numpy && pip install numpy==1.21.4
# Test
cd ~/yolov5-new && mamba activate yolov5-new
python val.py --task test --data "/Users/ec2-user/images/dataset.yaml" --weights "/Users/ec2-user/images/md_v5a.0.0.pt" --batch-size 1 --imgsz 1280 --conf-thres 0.001 --device "mps" --save-json --project "/Users/ec2-user/yolo-results/yolo-new" --name "yolo_results" --exist-ok --save-txt --save-conf
cat ~/yolo-results/yolo-new/yolo_results/md_v5a.0.0_predictions.json
# [{"image_id": "n7_2019-03-19_07-25-00", "category_id": 0, "bbox": [1252.152, 994.286, -745.002, 257.866], "score": 0.96902}]
cd ~/yolov5-old && mamba activate yolov5-old
python val.py --task test --data "/Users/ec2-user/images/dataset.yaml" --weights "/Users/ec2-user/images/md_v5a.0.0.pt" --batch-size 1 --imgsz 1280 --conf-thres 0.001 --device "mps" --save-json --project "/Users/ec2-user/yolo-results/yolo-old" --name "yolo_results" --exist-ok --save-txt --save-conf
cat ~/yolo-results/yolo-old/yolo_results/md_v5a.0.0_predictions.json
# [{"image_id": "n7_2019-03-19_07-25-00", "category_id": 0, "bbox": [135.414, 994.286, 371.736, 257.866], "score": 0.96902}]
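A side note on the two outputs above: converting both boxes from [x, y, w, h] to corner form shows that the two runs agree on the right edge (x + w = 507.150 in both) and differ only in the left x value. The left x reported by the new environment, 1252.152, also equals y + h for the same box. This is only an observation from the reported numbers, not a confirmed diagnosis, but it may hint that one coordinate is being read from or written to the wrong column. A minimal sketch of the arithmetic, assuming the COCO-style [x_top_left, y_top_left, width, height] layout (`xywh_to_xyxy` here is a local helper written for this check, not the YOLOv5 function):

```python
# Boxes copied verbatim from the two predictions.json files above.
# Assumed layout (COCO-style): [x_top_left, y_top_left, width, height].
new_box = [1252.152, 994.286, -745.002, 257.866]  # current YOLOv5, --device mps
old_box = [135.414, 994.286, 371.736, 257.866]    # older commit, --device mps


def xywh_to_xyxy(box):
    """Convert [x, y, w, h] to [x1, y1, x2, y2], rounded to 3 decimals."""
    x, y, w, h = box
    return [round(v, 3) for v in (x, y, x + w, y + h)]


new_xyxy = xywh_to_xyxy(new_box)
old_xyxy = xywh_to_xyxy(old_box)

print("new [x1, y1, x2, y2]:", new_xyxy)
print("old [x1, y1, x2, y2]:", old_xyxy)

# Both runs agree on the right edge; only x1 differs.
print("same x2 in both runs:", abs(new_xyxy[2] - old_xyxy[2]) < 1e-6)
# In the new run, x1 carries the same value as y2.
print("new x1 equals new y2:", abs(new_xyxy[0] - new_xyxy[3]) < 1e-6)
```

If the same pattern holds for other affected images, that would narrow the search to whatever step produces the left x coordinate.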
@agentmorris, fantastic work on creating a self-contained reproducible example! This will greatly assist in debugging the issue. The negative-width bounding box in the new environment versus the correct output in the old environment suggests a regression or incompatibility introduced in the newer software stack.
Given the detailed steps you've provided, we will:
Your thorough testing and reporting are invaluable to the YOLOv5 community and the Ultralytics team. We'll update you as soon as we have more insights or require further information.
Thank you for your dedication to improving YOLOv5!
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!
Hopefully the GHA bot isn't going to automatically close this issue? This seems like a fairly severe issue. The negative boxes are just the manifestation that's easy to detect; the underlying issue is better described as "large discrepancies between M1 results and other results".
If I can ignore the GHA bot, you can ignore this comment. :)
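To put a number on "large discrepancies", the two predictions files can be compared directly. The following is a rough sketch, not part of YOLOv5: `max_bbox_discrepancy` is a hypothetical helper, and pairing detections by score rank within each image is a simplifying assumption that only makes sense when both runs return the same detections in the same score order (as in the single-box repro above, where both scores are 0.96902).

```python
from collections import defaultdict


def max_bbox_discrepancy(preds_a, preds_b):
    """Largest absolute bbox coordinate difference, per image_id.

    Detections are paired by score rank within each image (an assumption;
    see the note above). Images present in only one run are skipped.
    """
    def by_image(preds):
        grouped = defaultdict(list)
        for p in preds:
            grouped[p["image_id"]].append(p)
        for dets in grouped.values():
            dets.sort(key=lambda p: p["score"], reverse=True)
        return grouped

    a, b = by_image(preds_a), by_image(preds_b)
    result = {}
    for image_id in a.keys() & b.keys():
        diffs = [
            abs(va - vb)
            for det_a, det_b in zip(a[image_id], b[image_id])
            for va, vb in zip(det_a["bbox"], det_b["bbox"])
        ]
        result[image_id] = max(diffs) if diffs else 0.0
    return result


# Records copied from the repro in this thread.
new_run = [{"image_id": "n7_2019-03-19_07-25-00", "category_id": 0,
            "bbox": [1252.152, 994.286, -745.002, 257.866], "score": 0.96902}]
old_run = [{"image_id": "n7_2019-03-19_07-25-00", "category_id": 0,
            "bbox": [135.414, 994.286, 371.736, 257.866], "score": 0.96902}]

discrepancy = max_bbox_discrepancy(new_run, old_run)
for image_id, diff in discrepancy.items():
    print(f"{image_id}: max coordinate difference = {diff:.3f} px")
```

For real files, each list would come from `json.load` on the corresponding predictions .json; any image with a difference well above rounding noise is worth a closer look, whether or not the width is negative.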
@agentmorris, rest assured, we'll ensure this issue remains open and actively investigated given its significance. The discrepancies you've highlighted, especially with the MPS backend on M1 hardware, are indeed critical to address for ensuring consistent and reliable model performance across different platforms.
Your findings and the effort you've put into documenting this issue are greatly appreciated. We'll prioritize looking into this and keep you updated on our progress. Please feel free to add any further observations or data you may gather as we work towards a resolution.
Thank you for your patience and for contributing to the robustness of YOLOv5!
@glenn-jocher Were you able to assess the scope of this issue before closing? The negative-width bounding boxes were just the symptom that let us find this issue; the fact that results are incorrect on M1 HW at all seems like a possibly-big deal, unless there's something specific about this repro that limits the scope. Any ideas?
@agentmorris, absolutely, your concern is valid and recognized. I've reviewed the scope, and indeed, the issue extends beyond just negative-width bounding boxes; highlighting discrepancies in results on M1 hardware is critical. We're diving deeper to understand the root cause and its implications. Once we have more clarity on the specific conditions or factors contributing to this issue, we'll update. Your insight has been invaluable in unveiling this; rest assured, we're on it!
Thanks. The github-actions bot tricked me again. :)
@agentmorris, haha, those bots can be quite sneaky! If there's anything more you need help with or any more insights you gather, feel free to share. We're all ears and here to support. Happy coding!
Search before asking
YOLOv5 Component
Validation
Bug
(At Glenn's suggestion, transferring from issue 12645, which I originally filed as a question.)
I am running a trained YOLOv5x6 model using val.py with the --save-json option. For a few images, the resulting .json file includes one or more boxes with negative width values (not negative x or y values, which seem normal and are discussed in other issues, but negative width values). I have only observed this behavior when running on M1 hardware (with --device mps). The issue occurs in the current YOLOv5 Python environment on M1 HW, but does not occur with at least one older YOLOv5 environment on M1 HW, and AFAIK does not occur with any YOLOv5 environment on CUDA/x86 HW.
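One way to find affected images is to scan the predictions file for boxes whose width or height is not positive. A minimal sketch, assuming each record has the form `{"image_id", "category_id", "bbox": [x, y, w, h], "score"}`; `find_degenerate_boxes` is a hypothetical helper name, not part of YOLOv5, and the sample records are the ones reported from the M1 runs in this thread.

```python
import json


def find_degenerate_boxes(predictions):
    """Return the records whose bbox has a non-positive width or height.

    Assumes bbox is [x, y, w, h]; negative x or y values are allowed,
    since those are not what this issue is about.
    """
    bad = []
    for pred in predictions:
        _, _, w, h = pred["bbox"]
        if w <= 0 or h <= 0:
            bad.append(pred)
    return bad


# Sample content: one record from the current environment (negative width)
# and one from the older environment (positive width).
sample_json = """
[
  {"image_id": "n7_2019-03-19_07-25-00", "category_id": 0,
   "bbox": [1252.152, 994.286, -745.002, 257.866], "score": 0.96902},
  {"image_id": "n7_2019-03-19_07-25-00", "category_id": 0,
   "bbox": [135.414, 994.286, 371.736, 257.866], "score": 0.96902}
]
"""

predictions = json.loads(sample_json)
bad_boxes = find_degenerate_boxes(predictions)
for pred in bad_boxes:
    print(pred["image_id"], pred["bbox"])
```

To scan a real run, replace `json.loads(sample_json)` with `json.load` on the opened predictions file written by val.py.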
Environment
Hardware/OS environment
Python environment
Both environments were created via Miniforge.
Minimal Reproducible Example
I am able to share images that reproduce this behavior now, and I also have some new data that might get us closer to a root cause:
The command I am running on the M1 VM is:
python val.py --task test --data "/Users/ec2-user/images/dataset.yaml" --weights "/Users/ec2-user/md_v5a.0.0.pt" --batch-size 1 --imgsz 1280 --conf-thres 0.001 --device "mps" --save-json --project "/Users/ec2-user/yolo-results/yolo-new-no-aug" --name "yolo_results" --exist-ok --save-txt --save-conf
Additional
No response
Are you willing to submit a PR?