
Object Detection MLModel for iOS with output configuration of confidence scores & coordinates for the bounding box. #535

Closed ajurav1 closed 4 years ago

ajurav1 commented 4 years ago

I have exported the mlmodel from export.py. The exported model has a type of Neural Network, and its output configuration is:

[<VNCoreMLFeatureValueObservation: 0x281d80240> 4702BA0E-857D-4CE6-88C1-4E47186E751F requestRevision=1 confidence=1.000000 "2308" - "MultiArray : Float32 1 x 3 x 20 x 20 x 85 array" (1.000000),
<VNCoreMLFeatureValueObservation: 0x281d802d0> AE0A0580-7DA2-4991-98BB-CD26EE257C7A requestRevision=1 confidence=1.000000 "2327" - "MultiArray : Float32 1 x 3 x 40 x 40 x 85 array" (1.000000),
<VNCoreMLFeatureValueObservation: 0x281d80330> 0253FD2B-10B0-4047-A001-624D1864D27C requestRevision=1 confidence=1.000000 "2346" - "MultiArray : Float32 1 x 3 x 80 x 80 x 85 array" (1.000000)]

I was expecting an output of type VNRecognizedObjectObservation from YOLOv5 instead of VNCoreMLFeatureValueObservation.

So, my question is: what information does this VNCoreMLFeatureValueObservation MultiArray hold (is it something like a UIImage or a CGRect, or something different?), and how can I convert this multidimensional array into a useful set of data from which I can actually read confidence scores and bounding box coordinates?

github-actions[bot] commented 4 years ago

Hello @ajurav1, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

dlawrences commented 4 years ago

Hi @ajurav1

Those tensors store network predictions that have not been decoded. There are a few recommendations I made in #343 on decoding them, including NumPy sample code that does this for the ONNX model.

There's also guidance here: https://docs.ultralytics.com/yolov5/tutorials/model_export

Take a look in there and let me know if you require further support.
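
For reference, here is a minimal Swift sketch of that decoding step, assuming the stock 640x640 export whose three heads have shape 1 x 3 x H x W x 85 (4 box terms + objectness + 80 class scores, all pre-sigmoid). The Detection type and function names are illustrative, not part of any Ultralytics or Apple API:

```swift
import CoreGraphics
import CoreML
import Foundation

/// One decoded detection, in model-input pixel coordinates.
struct Detection {
    let rect: CGRect      // x, y, width, height in input pixels
    let confidence: Float // objectness * best class score
    let classIndex: Int
}

func sigmoid(_ x: Float) -> Float { 1 / (1 + exp(-x)) }

/// Decodes one raw YOLOv5 head of shape [1, 3, gridH, gridW, 5 + numClasses].
/// `anchors` are the three (w, h) pairs for this scale; `stride` is the
/// downsampling factor: 8 for the 80x80 grid, 16 for 40x40, 32 for 20x20.
func decode(_ array: MLMultiArray,
            anchors: [(Float, Float)],
            stride: Float,
            confidenceThreshold: Float = 0.25) -> [Detection] {
    let gridH = array.shape[2].intValue
    let gridW = array.shape[3].intValue
    let numClasses = array.shape[4].intValue - 5
    var detections: [Detection] = []

    for a in 0..<3 {
        for y in 0..<gridH {
            for x in 0..<gridW {
                // Channel layout per anchor/cell: tx, ty, tw, th, objectness, classes...
                func value(_ c: Int) -> Float {
                    array[[0, a, y, x, c] as [NSNumber]].floatValue
                }
                let objectness = sigmoid(value(4))
                if objectness < confidenceThreshold { continue }

                // Pick the best class and combine it with objectness.
                var bestClass = 0
                var bestScore: Float = 0
                for c in 0..<numClasses {
                    let score = sigmoid(value(5 + c)) * objectness
                    if score > bestScore { bestScore = score; bestClass = c }
                }
                if bestScore < confidenceThreshold { continue }

                // YOLOv5 box decoding, mirroring Detect.forward in models/yolo.py
                let bx = (sigmoid(value(0)) * 2 - 0.5 + Float(x)) * stride
                let by = (sigmoid(value(1)) * 2 - 0.5 + Float(y)) * stride
                let tw = sigmoid(value(2)) * 2
                let th = sigmoid(value(3)) * 2
                let bw = tw * tw * anchors[a].0
                let bh = th * th * anchors[a].1

                detections.append(Detection(
                    rect: CGRect(x: CGFloat(bx - bw / 2), y: CGFloat(by - bh / 2),
                                 width: CGFloat(bw), height: CGFloat(bh)),
                    confidence: bestScore,
                    classIndex: bestClass))
            }
        }
    }
    return detections
}
```

With the default COCO anchors from models/yolov5s.yaml, the three heads would be decoded as decode(array80, anchors: [(10, 13), (16, 30), (33, 23)], stride: 8), decode(array40, anchors: [(30, 61), (62, 45), (59, 119)], stride: 16) and decode(array20, anchors: [(116, 90), (156, 198), (373, 326)], stride: 32); custom models may use different anchors.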

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ShreshthSaxena commented 4 years ago

Hi @dlawrences, I'm in the same boat, trying to figure out how to convert these raw predictions to bbox coordinates. Is there a Swift implementation available to interpret these outputs, or some way I can add the post-processing to a CoreML pipeline and get the final outputs in Swift?

dlawrences commented 4 years ago

@ShreshthSaxena, there is a lot of useful information on this blog that you will be able to use: https://machinethink.net/blog/

I recommend acquiring the book as well.
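
For the Vision wiring specifically, a minimal sketch looks like the following; yolov5s is a placeholder for whatever class Xcode generates from your exported .mlmodel, and decode(_:anchors:stride:) refers to the sketch earlier in this thread:

```swift
import CoreGraphics
import CoreML
import Vision

/// Runs the exported model through Vision and hands back the raw head tensors.
func detect(in cgImage: CGImage) throws {
    let coreMLModel = try yolov5s(configuration: MLModelConfiguration()).model
    let vnModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: vnModel) { request, _ in
        guard let observations = request.results as? [VNCoreMLFeatureValueObservation]
        else { return }
        // One observation per detection head; each wraps a raw MLMultiArray.
        let heads = observations.compactMap { $0.featureValue.multiArrayValue }
        // Decode each head with decode(_:anchors:stride:), concatenate the
        // results, then apply NMS to get the final boxes.
        print(heads.map { $0.shape })
    }
    // Match the scaling/letterboxing the model expects at its input.
    request.imageCropAndScaleOption = .scaleFill

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}
```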

maidmehic commented 3 years ago

Hi @ajurav1 and @ShreshthSaxena, have you managed to convert the VNCoreMLFeatureValueObservation MultiArrays into some useful info on the Swift side?

kir486680 commented 3 years ago

@dlawrences In the article, the author says that the "Ultralytics YOLOv5 model has a Core ML version but it requires changes before you can use it with Vision." Did you manage to make it work with Vision?

dlawrences commented 3 years ago

@kir486680 the CoreML model, as exported from this repository, outputs the raw feature maps: the predictions are not decoded, nor is any NMS applied.

I have been able to create the required steps locally, yes, and it works directly with Vision.
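
For the NMS part, a greedy per-class pass over the decoded boxes can be as simple as the sketch below, reusing the Detection type from the decode sketch earlier in this thread (0.45 is the repo's default IoU threshold; tune it for your model):

```swift
import CoreGraphics

/// Intersection-over-union of two boxes.
func iou(_ a: CGRect, _ b: CGRect) -> Float {
    let inter = a.intersection(b)
    if inter.isNull { return 0 }
    let interArea = inter.width * inter.height
    let unionArea = a.width * a.height + b.width * b.height - interArea
    return unionArea > 0 ? Float(interArea / unionArea) : 0
}

/// Greedy per-class non-maximum suppression over decoded detections.
func nonMaxSuppression(_ detections: [Detection],
                       iouThreshold: Float = 0.45) -> [Detection] {
    var kept: [Detection] = []
    // Group by class so boxes of different classes never suppress each other.
    for (_, group) in Dictionary(grouping: detections, by: { $0.classIndex }) {
        var candidates = group.sorted { $0.confidence > $1.confidence }
        while let best = candidates.first {
            kept.append(best)
            candidates.removeFirst()
            candidates.removeAll { iou(best.rect, $0.rect) > iouThreshold }
        }
    }
    return kept
}
```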

kir486680 commented 3 years ago

@dlawrences could you please share at least some clues? I implemented NMS, but I do not know what to do with the output from the model. I've seen some implementations here. Do you have something similar?

wmcnally commented 3 years ago

@kir486680 @dlawrences any update on this?

Workshed commented 1 year ago

If you come across this issue, there's a script here for creating a CoreML model that outputs the expected values: https://github.com/Workshed/yolov5-coreml-export

glenn-jocher commented 1 year ago

@Workshed thanks for sharing your script for creating a CoreML model that outputs the expected values with YOLOv5. This will be helpful for others who are looking to work with YOLOv5 in CoreML. Keep up the great work!