hollance / YOLO-CoreML-MPSNNGraph

Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
MIT License
929 stars 251 forks

MLMultiArray datatype Float32 issue #66

Open Techgps1 opened 3 years ago

Techgps1 commented 3 years ago

I've run into an issue when the model's output type is MLMultiArray (Float32, multidimensional array of floats).

Here is my script for converting the .h5 model to an .mlmodel using coremltools 4.0:

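from tensorflow import keras
import coremltools as ct

# Load the trained Keras model from the .h5 file.
keras_model = keras.models.load_model('v1.h5')

# Declare the input as a 416x416 RGB image with per-channel bias and a pixel scale.
image_input = ct.ImageType(shape=(1, 416, 416, 3),
                           bias=[-1, -1, -1], scale=1/255)

# Convert to Core ML and save the result.
model = ct.convert(keras_model, inputs=[image_input])

model.save("v1.mlmodel")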

I ran the above script with python3, and the generated model's output is an MLMultiArray with datatype Float32 (multidimensional array of floats): https://prnt.sc/vg1dkl. My working model's output is an MLMultiArray described as a 171-element vector of doubles: https://prnt.sc/vg1g7j

The app crashes while computing bbx, bby, bbw and bbh. I've written a custom function for generating the boxes; here is the code:

  private func process(output out: MLMultiArray, name: String) throws -> [Prediction] {
    var predictions = [Prediction]()
    let grid = out.shape[out.shape.count-1].intValue
    let gridSize = YOLO.inputSize / Float(grid)
    let classesCount = labels.count
    let pointer = UnsafeMutablePointer<Double>(OpaquePointer(out.dataPointer))
    if out.strides.count < 3 {
      throw YOLOError.strideOutOfBounds
    }
    let channelStride = out.strides[out.strides.count-3].intValue
    let yStride = out.strides[out.strides.count-2].intValue
    let xStride = out.strides[out.strides.count-1].intValue
    func offset(ch: Int, x: Int, y: Int) -> Int {
      return ch * channelStride + y * yStride + x * xStride
    }
    for x in 0 ..< grid {
      for y in 0 ..< grid {
        for box_i in 0 ..< YOLO.boxesPerCell {
          let boxOffset = box_i * (classesCount + 5)
          let bbx = Float(pointer[offset(ch: boxOffset, x: x, y: y)])
          let bby = Float(pointer[offset(ch: boxOffset + 1, x: x, y: y)])
          let bbw = Float(pointer[offset(ch: boxOffset + 2, x: x, y: y)])
          let bbh = Float(pointer[offset(ch: boxOffset + 3, x: x, y: y)])
          let confidence = sigmoid(Float(pointer[offset(ch: boxOffset + 4, x: x, y: y)]))
          if confidence < confidenceThreshold {
            continue
          }
          let x_pos = (sigmoid(bbx) + Float(x)) * gridSize
          let y_pos = (sigmoid(bby) + Float(y)) * gridSize
          let width = exp(bbw) * self.anchors[name]![2 * box_i]
          let height = exp(bbh) * self.anchors[name]![2 * box_i + 1]
          for c in 0 ..< 52 {
            classes[c] = Float(pointer[offset(ch: boxOffset + 5 + c, x: x, y: y)])
          }
          softmax(&classes)
          let (detectedClass, bestClassScore) = argmax(classes)
          let confidenceInClass = bestClassScore * confidence
          if confidenceInClass < confidenceThreshold {
            continue
          }
          predictions.append(Prediction(classIndex: detectedClass,
                                  score: confidenceInClass,
                                  rect: CGRect(x: CGFloat(x_pos - width / 2),
                                               y: CGFloat(y_pos - height / 2),
                                               width: CGFloat(width),
                                               height: CGFloat(height))))
        }
      }
    }
    return predictions
  }

If the model's output MLMultiArray datatype is Double (171), the above code works well, but it fails when the datatype is Float32.

Any assistance would be appreciated

hollance commented 3 years ago

You should probably change this line,

let pointer = UnsafeMutablePointer<Double>(OpaquePointer(out.dataPointer))

to:

let pointer = UnsafeMutablePointer<Float>(OpaquePointer(out.dataPointer))
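
For context on why the element type matters: the raw buffer behind a Float32 array holds 4-byte values, so viewing it through an 8-byte Double pointer halves the number of valid elements and scrambles every value. Below is a minimal Python sketch of that effect using `struct` on a made-up four-value buffer (the values are illustrative, not taken from the model):

```python
import struct

# Four Float32 values, stored little-endian at 4 bytes each (illustrative data).
values = [0.5, -1.25, 2.0, 3.75]
buffer = struct.pack("<4f", *values)

# Correct view: one 4-byte float per element.
as_float32 = struct.unpack("<4f", buffer)

# Wrong view: each 8-byte double swallows two neighbouring floats,
# so only half as many elements fit and none of them matches the input.
as_double = struct.unpack("<2d", buffer)

print(len(buffer), "bytes")
print("as Float32:", as_float32)
print("as Double: ", as_double)
```

Indexing such a buffer with offsets computed for the full element count would read past its end when each element is treated as 8 bytes wide.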

Techgps1 commented 3 years ago

Hello @hollance, thank you for the reply. I've changed the pointer type to Float instead of Double, but it still crashes at the lines below.

          for c in 0 ..< 52 {
            classes[c] = Float(pointer[offset(ch: boxOffset + 5 + c, x: x, y: y)])
          }

I've noticed one thing: my working model's shape and strides have 5 elements, while the current model's have 4. Does that have an effect?

Here is the screenshot: https://prnt.sc/vhcu8e

hollance commented 3 years ago

Naturally, if your model arranges the data differently, you'll also have to change how the code reads that data. ;-)

Techgps1 commented 3 years ago

OK, thanks for understanding. But is there any way, at conversion time from .h5 to .mlmodel, to get the MLMultiArray datatype Double (171) instead of Float32? I have a couple of old models that currently work with the existing YOLO output-reading code, so I don't want to write multiple implementations.

Also, I've discussed this with the ML developer, and the model arranges its data the same way as the old model. So I think something is wrong in my convert script, but I'm not sure about that.

hollance commented 3 years ago

The 171 is because there are 3 boxes predicted per grid cell and 52 classes that you're predicting: 3 × (52 + 5) = 171. So boxesPerCell has to be 3; did you change that accordingly?
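
The channel count can be checked with a few lines of arithmetic. This is a small sketch using the numbers from this thread (3 boxes per cell, 52 classes). The constant and function names are made up for illustration, and `box_offset` mirrors the `boxOffset` computation in the Swift code above:

```python
BOXES_PER_CELL = 3                 # from the thread
NUM_CLASSES = 52                   # from the thread
VALUES_PER_BOX = NUM_CLASSES + 5   # x, y, w, h, confidence, then one score per class

def box_offset(box_index: int) -> int:
    """First channel belonging to the given box within a grid cell."""
    return box_index * VALUES_PER_BOX

channels = BOXES_PER_CELL * VALUES_PER_BOX
offsets = [box_offset(i) for i in range(BOXES_PER_CELL)]

print(channels)
print(offsets)
```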

Techgps1 commented 3 years ago

Yes, boxesPerCell is 3. But the Float32 output is a Float32 1 × 13 × 13 × 171 array, while the Double output is a Double 1 × 1 × 171 × 26 × 26 array. That is the difference, and I'm having a hard time calculating bbx, bby, bbw and bbh.
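
The two shapes put the 171 channels in different positions, so the same `offset(ch:x:y:)` formula cannot serve both. The Swift function above takes `grid` from the last dimension, which is 171 for the new shape rather than 13. The sketch below assumes the new output is laid out as (batch, y, x, channel), the old one as (..., channel, y, x), and that both are contiguous in row-major order. None of this is confirmed in the thread, and in Swift the real strides should come from `out.strides`. The helper names are hypothetical.

```python
def row_major_strides(shape):
    """Element strides of a contiguous row-major array with this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def make_offset(shape, channels_last):
    """Return (grid size, offset function) for the assumed layout."""
    strides = row_major_strides(shape)
    if channels_last:            # (..., y, x, channel)
        y_stride, x_stride, ch_stride = strides[-3:]
        grid = shape[-2]
    else:                        # (..., channel, y, x)
        ch_stride, y_stride, x_stride = strides[-3:]
        grid = shape[-1]

    def offset(ch, x, y):
        return ch * ch_stride + y * y_stride + x * x_stride

    return grid, offset

new_grid, new_offset = make_offset((1, 13, 13, 171), channels_last=True)
old_grid, old_offset = make_offset((1, 1, 171, 26, 26), channels_last=False)

# The last channel of the last cell should land on the final element of each buffer.
print(new_grid, new_offset(170, 12, 12))
print(old_grid, old_offset(170, 25, 25))
```

If the layout assumption holds, only the choice of which stride belongs to which axis changes. The box-decoding loop itself can stay the same.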

hollance commented 3 years ago

Is this a YOLOv3 model, @Techgps1?

Techgps1 commented 3 years ago

It's a custom YOLOv3-tiny model.

hollance commented 3 years ago

YOLOv3 is slightly different from YOLOv2 in that you now have multiple output grids instead of just one. See here for the differences: https://github.com/Ma-Dan/YOLOv3-CoreML
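
With more than one output grid, each grid needs its own cell size and its own anchors when decoding. The sketch below reuses the formulas from the `process` function earlier in this thread, with the 13 and 26 grid sizes seen in the two output shapes and the 416 input size from the convert script. The anchor values are placeholders, not the model's real anchors.

```python
import math

INPUT_SIZE = 416.0  # from the convert script's 416 x 416 input

# Hypothetical (width, height) anchors per grid, in input pixels.
ANCHORS = {13: (100.0, 120.0), 26: (30.0, 40.0)}

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(raw, cell_x, cell_y, grid, anchor):
    """Turn raw (tx, ty, tw, th) into (x, y, width, height) in input pixels."""
    cell = INPUT_SIZE / grid                 # pixels covered by one grid cell
    tx, ty, tw, th = raw
    cx = (sigmoid(tx) + cell_x) * cell       # box centre x
    cy = (sigmoid(ty) + cell_y) * cell       # box centre y
    w = math.exp(tw) * anchor[0]
    h = math.exp(th) * anchor[1]
    return (cx - w / 2, cy - h / 2, w, h)

# Decode an all-zero prediction at the middle cell of each grid.
boxes = {
    grid: decode_box((0.0, 0.0, 0.0, 0.0), grid // 2, grid // 2, grid, ANCHORS[grid])
    for grid in (13, 26)
}
for grid, box in boxes.items():
    print(grid, box)
```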