microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Mobile] iOS yolov8n-pose model throws 'onnxruntime Code=2 "Invalid Feed Input Name:input"' #19776

Open pbanavara opened 8 months ago

pbanavara commented 8 months ago

Describe the issue

I'm trying out the yolov8 pose models for iOS using ONNX. I exported the model to ONNX using the following code.

from ultralytics import YOLO

# Load a model
for version in ['n']:
    model = YOLO(f'yolov8{version}-pose.pt')  # load an official model
    # Export the model
    model.export(format='onnx', nms=True)

I am trying to run this model on an image in iOS. Here is the crux of the code for running ONNX on the image. I resized the image to 640x640.

func convertImageToTensor() -> NSMutableData{

    let image = UIImage(named: "IMG_0688")?.resized(to: CGSize(width: 640.0, height: 640.0))
    let data = image?.pngData()
    let count = data?.count
    let rawData = NSMutableData(data: data!)
    print(image?.size)
    do {
        guard let modelPath = Bundle.main.path(forResource: "yolov8n-pose", ofType: "onnx") else {
            fatalError("Model file not found")
        }

        let ortEnv = try ORTEnv(loggingLevel: ORTLoggingLevel.error)
        let ortSession = try ORTSession(env: ortEnv, modelPath: modelPath, sessionOptions: nil)
        let input = try ORTValue(tensorData:rawData, elementType:ORTTensorElementDataType.float, shape:[])
        //try ortSession.run(withInputs: ["input": tensorValue], outputs: ["output": opTensorValue], runOptions:  ORTRunOptions())
        let outputs = try ortSession.run(
                        withInputs: ["input": input],
                        outputNames: ["output"],
                        runOptions: nil)
        guard let output = outputs["output"] else {
            fatalError("Failed to get model output from inference.")
        }

        return try output.tensorData()
    } catch {
        print(error)
        fatalError("Error in running the ONNX model")
    }
}

I get the following error at the ortSession.run line

Error Domain=onnxruntime Code=2 "Invalid Feed Input Name:input" UserInfo={NSLocalizedDescription=Invalid Feed Input Name:input}

I have no idea what this means. Can anyone please help? I tried renaming the string "input" to "image" etc., but I get the same error.

Environment: Xcode 15.2, MacBook Pro with M3 Max chip, iPhone 15 simulator, minimum iOS version 17.2

To reproduce

Clone the repo, set your developer account info, and run.

Urgency

No response

Platform

iOS

OS Version

17.2

ONNX Runtime Installation

Released Package

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

onnxruntime-mobile-objc/onnxruntime-mobile-c

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

Objective-C/Swift

Architecture

ARM64

Execution Provider

CoreML

Execution Provider Library Version

No response

skottmckay commented 8 months ago

Open the model in Netron and see what the input name/s are. Every model is different, and inputs are matched based on name, not order (i.e. you must know the exact name).
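
If it's easier, the names can also be checked at runtime. A minimal sketch, assuming the ORTSession class in onnxruntime-objc exposes the inputNames()/outputNames() accessors (recent versions should):

import Foundation
import onnxruntime_objc  // module name may differ depending on how the pod is integrated

// Sketch: print the model's actual input/output names so the feed dictionary
// keys can be matched exactly. Netron shows the same information.
func printModelIONames(modelPath: String) throws {
    let env = try ORTEnv(loggingLevel: .warning)
    let session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: nil)
    print("inputs:", try session.inputNames())
    print("outputs:", try session.outputNames())
}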

skottmckay commented 8 months ago

Most likely you'll need to use the onnxruntime-c and onnxruntime-objc packages, as those contain a full build of ONNX Runtime (supporting the most recent ONNX opsets and all operators).

You may also want to consider adding pre/post processing to the model. There's an end-to-end tutorial for yolov8-pose here: https://onnxruntime.ai/docs/tutorials/mobile/pose-detection.html.

Simply resizing the image isn't sufficient to provide input to the ONNX model. You need to convert from the original image format to RGB, the channels need to come first, the data needs to be converted to float and normalized, and a batch dimension needs to be added so the input is 4D. A 640x640 image would become a 4D float input with shape {1, 3, 640, 640}, 3 being the channels in RGB order (not BGR, which is the default ordering some image conversions produce).
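
As a rough illustration only (not taken from the tutorial), the preprocessing for the plain model could look something like the sketch below; makeNCHWTensorData is a made-up helper name and the plain /255 normalization is an assumption that should be checked against what the exported model actually expects.

import UIKit

// Sketch: draw a UIImage into an RGBA8888 bitmap of the target size, then
// repack it as a {1, 3, 640, 640} float32 NCHW buffer normalized to [0, 1].
func makeNCHWTensorData(from image: UIImage, width: Int = 640, height: Int = 640) -> NSMutableData? {
    guard let cgImage = image.cgImage else { return nil }

    let bytesPerPixel = 4
    let bytesPerRow = bytesPerPixel * width
    var rgba = [UInt8](repeating: 0, count: height * bytesPerRow)

    // Draw into a bitmap with a known RGBA layout so the channel order is predictable.
    let drawn = rgba.withUnsafeMutableBytes { buffer -> Bool in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: bytesPerRow,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue) else {
            return false
        }
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
        return true
    }
    guard drawn else { return nil }

    // Repack HWC uint8 into CHW float32 (RGB order), scaled to [0, 1].
    let planeSize = width * height
    var chw = [Float](repeating: 0, count: 3 * planeSize)
    for y in 0..<height {
        for x in 0..<width {
            let src = y * bytesPerRow + x * bytesPerPixel
            let dst = y * width + x
            chw[dst] = Float(rgba[src]) / 255.0                      // R plane
            chw[planeSize + dst] = Float(rgba[src + 1]) / 255.0      // G plane
            chw[2 * planeSize + dst] = Float(rgba[src + 2]) / 255.0  // B plane
        }
    }
    return chw.withUnsafeBufferPointer { buffer in
        NSMutableData(bytes: buffer.baseAddress, length: buffer.count * MemoryLayout<Float>.stride)
    }
}

The batch dimension is then expressed in the shape passed to ORTValue rather than in the data itself, e.g. shape: [1, 3, 640, 640] with elementType: ORTTensorElementDataType.float.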

pbanavara commented 8 months ago

@skottmckay Thank you. The name should be "images" as per the visualization in Netron.
Now I am working on changing the dimensions of the image.

I did see the tutorial. In fact, I just copied the Android steps to iOS :). Maybe I missed something. Perhaps the rawImageBytes is already transformed to a 4D input here:

val shape = longArrayOf(rawImageBytes.size.toLong())

pbanavara commented 8 months ago

Converting an image to the required rank is turning out to be more complicated than I expected.

The iOS data structure that supports this kind of ranked layout, somewhat like numpy, is MLMultiArray, and converting the image data to an MLMultiArray involves using pointers, which I am not familiar with.

Even if I somehow managed to convert the image to an MLMultiArray, the ONNX API expects an NSMutableData. I have absolutely no idea how to convert the MLMultiArray to NSMutableData, and Stack Overflow and the general web are of no help.

So I tried the ONNX model that already has pre- and post-processing included, and used the input and output names as per the Netron visualization.

However, I get this message at the model-loading line:

        let ortSession = try ORTSession(env: ortEnv, modelPath: modelPath, sessionOptions: nil)
Load model from /Users/pbanavara/Library/Developer/CoreSimulator/Devices/955A6321-4431-448B-9405-3B46B1EC4440/data/Containers/Bundle/Application/D9B01188-77C5-4DD4-8346-F015E1DF83C4/onnx.app/yolov8n-pose-pre.onnx failed: Fatal error: com.microsoft.extensions:DecodeImage(-1) is not a registered function/op

EDIT: I installed the onnxruntime-extensions-c pod and built the pre/post-processing ONNX file as per this link.

Still same error.

Greatly appreciate any help in resolving this.

edgchen1 commented 8 months ago

To use onnxruntime-extensions-c, you'll need to register its custom ops with the session options object. Here's an example: https://github.com/microsoft/onnxruntime-extensions/blob/62bbcb59a22fdf45b40d45d3245224684c6a8cba/test/ios/OrtExtensionsUsage/OrtClient/OrtSwiftClient.swift#L16-L24
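
For reference, a sketch of what that registration looks like in Swift, based on the linked example (the exact Swift spelling of the ORTSessionOptions method should be verified against your onnxruntime-objc version):

import Foundation
// RegisterCustomOps comes from the onnxruntime-extensions-c pod; the linked
// example exposes it to Swift via a bridging header.

// Sketch: register the extensions' custom ops (DecodeImage, etc.) on the
// session options before the session is created.
func makeSessionWithExtensions(env: ORTEnv, modelPath: String) throws -> ORTSession {
    let sessionOptions = try ORTSessionOptions()
    try sessionOptions.registerCustomOps(functionPointer: RegisterCustomOps)
    return try ORTSession(env: env, modelPath: modelPath, sessionOptions: sessionOptions)
}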

As for going from MLMultiArray to NSMutableData, you can get a void* from MLMultiArray and pass it to one of the NSMutableData initializers. E.g.:

From MLMultiArray: https://developer.apple.com/documentation/coreml/mlmultiarray/3929555-getmutablebyteswithhandler?language=objc

To NSMutableData: https://developer.apple.com/documentation/foundation/nsdata/1547231-datawithbytes?language=objc
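
Concretely, a minimal sketch of that conversion, assuming the MLMultiArray holds contiguous Float32 data (the helper name is made up):

import CoreML
import Foundation

// Sketch: copy a Float32 MLMultiArray into an NSMutableData that can be passed
// to ORTValue(tensorData:elementType:shape:). Uses the older dataPointer
// accessor; newer SDKs also offer withUnsafeMutableBytes, per the linked docs.
func mutableData(from multiArray: MLMultiArray) -> NSMutableData {
    let byteCount = multiArray.count * MemoryLayout<Float32>.stride
    return NSMutableData(bytes: multiArray.dataPointer, length: byteCount)
}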

pbanavara commented 8 months ago

@edgchen1 Thank you. Registering the custom ops works. Appreciate the help.

pbanavara commented 8 months ago

Another question, if someone can help out or give some pointers: I have used the pre- and post-processing pose model as per this link.

I fed the raw image to this model as per the above Swift code. The raw image size is Optional(4284.0) Optional(5712.0).

The output tensor data as an array gives this:

[0, 53, 17, 69, 81, 117, 93, 69, 224, 188, 38, 69, 76, 188, 87, 69, 118, 114, 90, 63, 0, 0, 0, 0, 163, 142, 188, 68, 167, 106, 45, 69, 18, 103, 50, 63, 176, 93, 202, 68, 105, 65, 38, 69, 196, 0, 9, 63, 30, 106, 179, 68, 137, 126, 41, 69, 108, 43, 7, 63, 162, 97, 231, 68, 90, 176, 20, 69, 8, 101, 121, 62, 41, 41, 168, 68, 42, 235, 26, 69, 248, 202, 195, 61, 106, 69, 16, 69, 117, 230, 4, 69, 64, 66, 64, 63, 164, 62, 171, 68, 26, 119, 24, 69, 82, 242, 82, 63, 96, 103, 79, 69, 140, 161, 16, 69, 4, 173, 65, 63, 201, 24, 185, 68, 17, 166, 40, 69, 88, 134, 105, 63, 47, 8, 78, 69, 100, 6, 67, 69, 224, 85, 76, 63, 69, 163, 182, 68, 223, 123, 66, 69, 244, 2, 101, 63, 141, 115, 31, 69, 139, 233, 49, 69, 8, 151, 126, 63, 218, 239, 234, 68, 122, 138, 61, 69, 158, 24, 127, 63, 253, 94, 44, 69, 70, 94, 78, 69, 220, 84, 126, 63, 19, 156, 176, 68, 57, 253, 117, 69, 231, 41, 127, 63, 79, 33, 62, 69, 48, 148, 99, 69, 223, 0, 124, 63, 101, 110, 192, 68, 164, 212, 145, 69, 132, 44, 125, 63]

Based on this line in yolov8_pose_e2e.py

 (box, score, _, keypoints) = np.split(result, (4, 5, 6))

The first 4 entries are the bounding box, entry 5 is the score, entry 6 is the class, and the entries from 7 onward are the keypoints.

The shape of the output tensor is (1, 57) => 51 entries for the 17 keypoints (x, y, confidence each), 4 entries for the bounding box, one for the score, and one for the class, which adds up.

However, the length of the data array above is 228, and 228 - 6 = 222. I don't know how to interpret this length of 222; it should be 51 for the 17 keypoints. Can someone please explain?
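
One possible reading (an assumption, not something confirmed here): tensorData() returns the raw bytes of the tensor, so the 228 numbers above would be bytes rather than element values, and 228 / 4 = 57 Float32 values, which matches the (1, 57) shape. A sketch of reinterpreting the buffer under that assumption:

import Foundation

// Sketch: reinterpret the raw output bytes as Float32 values.
// 228 bytes / 4 bytes per float = 57 values, matching the (1, 57) output shape.
func floatValues(from tensorData: NSMutableData) -> [Float32] {
    let count = tensorData.length / MemoryLayout<Float32>.stride
    var values = [Float32](repeating: 0, count: count)
    values.withUnsafeMutableBytes { buffer in
        tensorData.getBytes(buffer.baseAddress!, length: buffer.count)
    }
    return values
}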

Next, I'm not sure whether these are scaled values, because if I just plot these points on the image they are way off, e.g. the first 4 values for the bbox.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.