ultralytics / yolo-ios-app

Ultralytics YOLO iOS App source code for running YOLOv8 in your own iOS apps 🌟
https://ultralytics.com/yolo
GNU Affero General Public License v3.0

Exporting Custom Trained Model #45

Open NeuralNoble opened 1 month ago

NeuralNoble commented 1 month ago

Hello,

First, thank you for this amazing repo! I have a question regarding the usage of custom trained models with CoreML INT8 export.

In the documentation, you mentioned:

Export CoreML INT8 models using the ultralytics Python package (with pip install ultralytics), or download them from our GitHub release assets. You should have 5 YOLOv8 models in total. Place these in the YOLO/Models directory as seen in the Xcode screenshot below.

from ultralytics import YOLO

# Loop through all YOLOv8 model sizes
for size in ("n", "s", "m", "l", "x"):
    # Load a YOLOv8 PyTorch model
    model = YOLO(f"yolov8{size}.pt")

    # Export the PyTorch model to CoreML INT8 format with NMS layers
    model.export(format="coreml", int8=True, nms=True, imgsz=[640, 384])

I have trained a custom YOLO model and would like to use it within the iOS app. Should I export my custom model using the same process mentioned above? Specifically, should I follow the same format and parameters (format="coreml", int8=True, nms=True, imgsz=[640, 384]) when exporting my custom model?

Additionally, if I want to use my custom model, do I need to train all 5 sizes (n, s, m, l, x) and export all of them, or can I just use a single custom trained model? If I need to train and export all 5 sizes, should I avoid using the provided code and handle the export process separately for each size?

Any guidance or additional steps required for custom trained models would be greatly appreciated.

Thank you!

glenn-jocher commented 1 month ago

@NeuralNoble thanks for asking!

The FastSAM_sInput class you've shown is indeed focused only on the image input, which is typical for many Core ML vision models. However, FastSAM's prompts (like bounding boxes or points) are typically used in post-processing, after the main model inference.

Here's a suggested approach to handle this:

  1. Run the FastSAM Core ML model on the input image.
  2. Implement the post-processing and prompting logic in Swift.

For the post-processing step, you'll need to implement the following:

  1. Decode the model output (probably segmentation masks).
  2. Apply the prompts (bounding boxes or points) to filter or select the appropriate segments.

Here's a rough outline of how you might structure this in your Swift code:

import CoreGraphics
import CoreML
import CoreVideo

class FastSAMProcessor {
    let model: FastSAM_s

    init() throws {
        self.model = try FastSAM_s(configuration: MLModelConfiguration())
    }

    func process(image: CVPixelBuffer, prompt: FastSAMPrompt) throws -> [Mask] {
        // 1. Run the model (assumes the generated input class exposes an `image` parameter)
        let input = FastSAM_sInput(image: image)
        let output = try model.prediction(input: input)

        // 2. Decode the output
        let masks = decodeMasks(from: output)

        // 3. Apply the prompt
        let filteredMasks = applyPrompt(prompt, to: masks)

        return filteredMasks
    }

    private func decodeMasks(from output: FastSAM_sOutput) -> [Mask] {
        // Implement mask decoding logic here, e.g. parse the output MLMultiArray into Mask values
        return []
    }

    private func applyPrompt(_ prompt: FastSAMPrompt, to masks: [Mask]) -> [Mask] {
        // Implement prompt application logic here, e.g. filter or rank masks by the prompt
        return masks
    }
}

enum FastSAMPrompt {
    case boundingBox(CGRect)
    case point(CGPoint)
    // Add other prompt types as needed
}

struct Mask {
    // Define your mask structure here, e.g. bounding box, confidence, pixel data
}

This approach allows you to use the Core ML model as-is, without modifying its inputs, and then apply the FastSAM-specific logic in Swift.
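For reference, usage would look something like this (pixelBuffer here is just a placeholder for whatever CVPixelBuffer you're feeding the model):

let processor = try FastSAMProcessor()
let prompt = FastSAMPrompt.boundingBox(CGRect(x: 50, y: 80, width: 200, height: 150))
let masks = try processor.process(image: pixelBuffer, prompt: prompt)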

For the prompt application logic, you'll need to implement the algorithms described in the FastSAM paper or repository. This might involve operations like computing the IoU between each candidate mask (or its bounding box) and a box prompt, checking whether a point prompt falls inside a mask, and filtering or ranking masks by confidence.

The exact implementation will depend on the specific output format of your Core ML model and the details of how FastSAM uses these prompts.
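For the bounding-box and point cases, the core of that selection logic is essentially geometric. Here's a minimal sketch, assuming you've already decoded each candidate mask's bounding box into a CGRect in image coordinates (the helper names below are just illustrative, not part of the app):

import CoreGraphics

// Intersection-over-union of two rectangles.
func iou(_ a: CGRect, _ b: CGRect) -> CGFloat {
    let inter = a.intersection(b)
    guard !inter.isNull, !inter.isEmpty else { return 0 }
    let interArea = inter.width * inter.height
    let unionArea = a.width * a.height + b.width * b.height - interArea
    return unionArea > 0 ? interArea / unionArea : 0
}

// Box prompt: pick the index of the candidate mask whose bounding box
// overlaps the prompt box the most.
func bestMaskIndex(for promptBox: CGRect, among maskBoxes: [CGRect]) -> Int? {
    maskBoxes.indices.max { iou(maskBoxes[$0], promptBox) < iou(maskBoxes[$1], promptBox) }
}

// Point prompt: keep the masks whose bounding box contains the point.
func maskIndices(containing point: CGPoint, among maskBoxes: [CGRect]) -> [Int] {
    maskBoxes.indices.filter { maskBoxes[$0].contains(point) }
}

You could call these from applyPrompt(_:to:) above once decodeMasks(from:) produces per-mask bounding boxes; comparing against the full pixel masks instead of their boxes is a refinement you can add later.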