
Exported CoreML Model with Different Results #13366

Open hitharr opened 1 month ago

hitharr commented 1 month ago


YOLOv8 Component

Predict, Export

Bug

After exporting the model to CoreML with the command below, the prediction results differ from those of the original PyTorch model.

yolo export model=multi_detection_model.pt format=coreml nms=True half=False int8=False imgsz=640

For example, in the results below the bag is not detected by the CoreML model. In both cases predict is called with iou=0.7 and conf=0.5.

Results from the PyTorch model: [image]

Results from the CoreML model: [image]

Or, in this example, there are extra detections with the CoreML model: two boxes are detected for "bottom" where the PyTorch model returns one.

Results from the PyTorch model: [image]

Results from the CoreML model: [image]
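
A minimal repro sketch of the comparison above, using the ultralytics Python API (the test image is a placeholder, and the .mlpackage path assumes the default export output name):

from ultralytics import YOLO

# Export the PyTorch model to CoreML, then run both on the same image
pt_model = YOLO("multi_detection_model.pt")
pt_model.export(format="coreml", nms=True, half=False, int8=False, imgsz=640)
coreml_model = YOLO("multi_detection_model.mlpackage")  # assumed export path

for name, model in [("pytorch", pt_model), ("coreml", coreml_model)]:
    boxes = model.predict("test.jpg", iou=0.7, conf=0.5, imgsz=640)[0].boxes
    print(name, len(boxes), boxes.xyxy)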

Environment

Ultralytics YOLOv8.2.28 πŸš€ Python-3.9.19 torch-2.3.0 CPU (Apple M1 Pro)
Setup complete βœ… (10 CPUs, 32.0 GB RAM, 299.4/460.4 GB disk)

OS                macOS-14.0-arm64-arm-64bit
Environment       Darwin
Python            3.9.19
Install           pip
RAM               32.00 GB
CPU               Apple M1 Pro
CUDA              None

matplotlib βœ… 3.9.0>=3.3.0
opencv-python βœ… 4.10.0.82>=4.6.0
pillow βœ… 10.3.0>=7.1.2
pyyaml βœ… 6.0.1>=5.3.1
requests βœ… 2.32.3>=2.23.0
scipy βœ… 1.13.1>=1.4.1
torch βœ… 2.3.0>=1.8.0
torchvision βœ… 0.18.0>=0.9.0
tqdm βœ… 4.66.4>=4.64.0
psutil βœ… 5.9.8
py-cpuinfo βœ… 9.0.0
pandas βœ… 2.2.2>=1.1.4
seaborn βœ… 0.13.2>=0.11.0
ultralytics-thop βœ… 0.2.7>=0.2.5

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 month ago

πŸ‘‹ Hello @hitharr, thank you for your interest in Ultralytics YOLOv8 πŸš€! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of Ultralytics' up-to-date verified environments, with all dependencies including CUDA/CUDNN, Python, and PyTorch preinstalled.

Status

Ultralytics CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit; a green CI badge indicates all tests are currently passing.

glenn-jocher commented 1 month ago

Hello,

Thank you for providing a detailed report on the discrepancies you're experiencing with the CoreML exported model. Differences in detection results between the original PyTorch model and the CoreML model can sometimes occur due to the specifics of how each framework handles operations internally, especially with quantization and optimization settings.

Here are a few steps you can try to align the results more closely:

  1. Check NMS Settings: Ensure that the Non-Maximum Suppression settings are consistent between the two models. Differences in NMS can lead to variations in detected objects.

  2. Re-export with Different Settings: Try exporting the model without any quantization (half=False, int8=False) and without enabling NMS in the export command to see if that affects the output consistency.

  3. Model Validation: Validate the model directly after export using the same images and compare the outputs (a short sketch follows this list). This can help identify whether the issue arises during the export process or later during prediction.

  4. Update Libraries: Ensure all related libraries are up to date, as sometimes discrepancies are due to bugs that have been fixed in newer versions.
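
Following up on item 3, a hedged sketch of that validation: run both models on one image immediately after export and diff the boxes numerically (paths and the image are placeholders):

from ultralytics import YOLO

IMG = "test.jpg"  # placeholder test image
pt = YOLO("multi_detection_model.pt").predict(IMG, conf=0.5, iou=0.7)[0].boxes
ml = YOLO("multi_detection_model.mlpackage").predict(IMG, conf=0.5, iou=0.7)[0].boxes

print("pytorch boxes:", len(pt), "coreml boxes:", len(ml))
if len(pt) == len(ml):
    # Same count: measure coordinate drift, assuming detections come back
    # in the same order from both backends
    print("max coord diff:", (pt.xyxy - ml.xyxy).abs().max().item())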

If the issue persists, please provide a few more examples of the discrepancies, and we can look deeper into the specific causes. Your detailed feedback helps improve YOLOv8, and we appreciate your contribution to enhancing the model's reliability and performance.

hitharr commented 1 month ago
  1. I tried with agnostic_nms set to both True and False and saw the same results.
  2. When I export without NMS (yolo export model=multi_detection_model.pt format=coreml nms=False half=False int8=False imgsz=640) and import the model into Xcode to test images, I no longer have the option to preview, and the metadata looks different.

Example with nms=True: [images]

Example with nms=False: [images]

  3. Models were exported and tested immediately with the same image and iou/conf thresholds.
  4. Models were exported with the newest libraries as of yesterday, with the same result.

A couple more examples:

Example 1: There is an extra detection around the bracelet in the image.

PyTorch results: [image]

CoreML results: [images]

Example 2: There is an extra detection on her fingers in the CoreML results.

PyTorch results: [image]

CoreML results: [images]

glenn-jocher commented 1 month ago

Hello,

Thank you for the detailed follow-up and for testing the different configurations. It seems like the discrepancies persist regardless of the NMS settings and the export parameters.

Given that the issue remains consistent across different settings and the model metadata changes when NMS is disabled, it might be related to how CoreML handles the post-processing steps internally. Here are a few suggestions:

  1. Post-Processing in Code: Since disabling NMS in the export changes the model behavior in Xcode, consider implementing NMS manually in your application code after the model inference. This might provide more control over the final output.

  2. Model Inspection: Use tools like Netron to inspect the exported CoreML model and verify that all layers and operations are as expected (a coremltools-based sketch follows this list). This can sometimes reveal discrepancies in the conversion process.

  3. Test with Standard Images: If possible, test the model with standard benchmark images to see if the issue is image-specific or a general characteristic of the model.

  4. Feedback Loop: Please continue to share your findings. Your detailed examples are very helpful for diagnosing the issue. If the problem persists, we might need to look into a more specific adjustment in the model export process.
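
As a complement to Netron in item 2, a minimal coremltools sketch that prints the exported model's declared inputs and outputs (the path is a placeholder); with nms=True the export should expose NMS-style outputs, so comparing the two exports' interfaces can be revealing:

import coremltools as ct

model = ct.models.MLModel("multi_detection_model.mlpackage")
spec = model.get_spec()

# List declared inputs and outputs; compare these between the nms=True and
# nms=False exports to see how the model interface changes.
for inp in spec.description.input:
    print("input: ", inp.name, inp.type.WhichOneof("Type"))
for out in spec.description.output:
    print("output:", out.name, out.type.WhichOneof("Type"))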

We appreciate your efforts in refining the model's performance and are here to assist you through this process. πŸ› οΈ

hitharr commented 4 weeks ago

Do you have any resources on implementing NMS manually? Would that be something we could easily validate, with a way to preview results in Xcode or another tool, to confirm the results look as expected?

Another note: I even tried comparing the outputs of the pretrained PyTorch YOLO model (yolov8s) and the converted CoreML one, and the results did not align. Examples: [images]

glenn-jocher commented 4 weeks ago

@hitharr hello!

Thank you for your patience and for providing additional details. Implementing Non-Maximum Suppression (NMS) manually is indeed a viable approach to ensure consistency between the PyTorch and CoreML models. Here’s a step-by-step guide to help you implement NMS manually and validate the results:

Implementing NMS Manually

You can implement NMS in Python and then apply it to the raw outputs of the CoreML model. Here’s a simple example using PyTorch:

import torch

def box_iou(box1, box2):
    # box1: (1, 4) and box2: (N, 4) in xyxy format; broadcasts to (N,) IoUs
    inter = (torch.min(box1[..., 2:], box2[..., 2:]) - torch.max(box1[..., :2], box2[..., :2])).clamp(0).prod(-1)
    union = (box1[..., 2:] - box1[..., :2]).prod(-1) + (box2[..., 2:] - box2[..., :2]).prod(-1) - inter
    return inter / union

def non_max_suppression(prediction, conf_thres=0.5, iou_thres=0.5):
    # prediction: (N, 6) tensor of [x1, y1, x2, y2, conf, class_id] rows.
    # Note: this version is class-agnostic and returns box coordinates only.
    # Filter out low-confidence detections
    prediction = prediction[prediction[..., 4] > conf_thres]

    # Sort by confidence, highest first
    indices = torch.argsort(prediction[..., 4], descending=True)
    boxes = prediction[indices, :4]

    keep_boxes = []
    while boxes.size(0):
        box = boxes[0]
        keep_boxes.append(box)
        if boxes.size(0) == 1:
            break
        # Drop remaining boxes that overlap the kept box above the threshold
        ious = box_iou(box.unsqueeze(0), boxes[1:])
        boxes = boxes[1:][ious < iou_thres]

    return torch.stack(keep_boxes) if keep_boxes else torch.empty(0, 4)

# Example usage
# Assuming `predictions` is the raw output from the CoreML model:
# predictions = torch.tensor([[x1, y1, x2, y2, conf, class_id], ...])
# nms_predictions = non_max_suppression(predictions)
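
In practice, rather than hand-rolling the loop above, torchvision's built-in operator does the same thing and is well tested; a quick sketch:

import torch
from torchvision.ops import nms

boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])  # xyxy
scores = torch.tensor([0.9, 0.8])
keep = nms(boxes, scores, iou_threshold=0.5)  # indices of the boxes kept
print(keep)  # tensor([0]): the overlapping lower-score box is suppressed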

Validating Results in Xcode

To validate the results in Xcode, you can follow these steps:

  1. Export Raw Outputs: Ensure your CoreML model outputs raw bounding box predictions without applying NMS (a Python sketch of calling the raw model directly follows this list).
  2. Post-Processing in Swift: Implement the NMS logic in Swift. Here’s a simplified example:
import CoreGraphics

// Greedy class-agnostic NMS: visit boxes in descending score order, keep a
// box, and suppress any remaining box that overlaps it above the threshold.
func nonMaxSuppression(boxes: [CGRect], scores: [Float], iouThreshold: Float) -> [CGRect] {
    let order = scores.indices.sorted { scores[$0] > scores[$1] }  // highest score first
    var keep = [Bool](repeating: true, count: boxes.count)
    for (i, a) in order.enumerated() {
        if !keep[a] { continue }
        for b in order[(i + 1)...] where keep[b] {
            if iou(box1: boxes[a], box2: boxes[b]) > iouThreshold {
                keep[b] = false
            }
        }
    }
    return order.filter { keep[$0] }.map { boxes[$0] }
}

func iou(box1: CGRect, box2: CGRect) -> Float {
    let inter = box1.intersection(box2)
    guard !inter.isNull else { return 0 }
    let interArea = inter.width * inter.height
    let unionArea = box1.width * box1.height + box2.width * box2.height - interArea
    return Float(interArea / unionArea)
}
  3. Preview Results: Use Xcode’s preview feature to visualize the results and ensure they align with your expectations.
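
For step 1, here is a hedged sketch of calling the raw (nms=False) export directly from Python with coremltools, which makes it easy to see exactly what Xcode will receive; the input name "image" and the 640Γ—640 size are assumptions, so check them against the model's spec:

import coremltools as ct
from PIL import Image

# Load the raw export (nms=False); the path is a placeholder.
model = ct.models.MLModel("multi_detection_model.mlpackage")
img = Image.open("test.jpg").resize((640, 640))  # match imgsz=640

# The input name varies by export; inspect model.get_spec() if unsure.
outputs = model.predict({"image": img})
for name, value in outputs.items():
    print(name, getattr(value, "shape", type(value)))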

Comparing Outputs

It’s great that you’re comparing the outputs of the pretrained PyTorch YOLO model and the converted CoreML one. This comparison can help identify where discrepancies are occurring. If the results still do not align, it may be worth checking how each runtime preprocesses the image and whether any precision differences (e.g. FP16) were introduced during conversion.

Feel free to reach out if you have any more questions or need further assistance. We’re here to help! 😊