hollance / YOLO-CoreML-MPSNNGraph

Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
MIT License

Using the front camera for detection causes boxes to be mirrored #25

Closed: vvkv closed this issue 6 years ago

vvkv commented 6 years ago

Thank you for the great work here. I am constantly experimenting with variations of your work for my own purposes, and for one of my applications I am trying to use object detection through the front camera (on my iPad) so that I can see what is being detected while I move in front of the screen. I have made the following change in VideoCapture.swift to capture the video stream through the front camera:

Original code chunk:

    guard let captureDevice = AVCaptureDevice.default(for: AVMediaType.video) else {
      print("Error: no video devices available")
      return false
    }

My changed chunk:

    guard let captureDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: AVMediaType.video, position: .front) else {
      print("Error: no video devices available")
      return false
    }

All works well except that the bounding boxes are mirrored: if I am holding an apple in my left hand and a banana in my right hand, the boxes are swapped between the two objects. I understand that this might have something to do with the iPhone's front camera mirroring the video display, but I am not sure how to tackle the issue. Any guidance would be very useful. Thank you

hollance commented 6 years ago

I'm not sure where this goes wrong or why, but an easy fix is to flip the x coordinate of the bounding boxes too: box.x = screen width - box.x.
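In view coordinates, that flip looks something like the sketch below; the mirrored(_:inViewOfWidth:) helper and its viewWidth parameter are illustrative, not from the repo. One subtlety: for a rect you subtract maxX rather than x, so the mirrored box stays inside the view.

    import CoreGraphics

    // Minimal sketch: mirror a bounding box horizontally across a view of the
    // given width. Subtracting box.maxX (not box.origin.x) keeps the flipped
    // rect inside the view.
    func mirrored(_ box: CGRect, inViewOfWidth viewWidth: CGFloat) -> CGRect {
      var flipped = box
      flipped.origin.x = viewWidth - box.maxX
      return flipped
    }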

vvkv commented 6 years ago

That sounds like a neat solution. Could you advise on where this code modification should be made? I went through most of the .swift files but wasn't able to find a "box.x" variable. Thank you

hollance commented 6 years ago

On this line: https://github.com/hollance/YOLO-CoreML-MPSNNGraph/blob/master/TinyYOLO-CoreML/TinyYOLO-CoreML/YOLO.swift#L94

change it to:

    let x = blockSize - ((Float(cx) + sigmoid(tx)) * blockSize)

This happens because the image from the front camera is mirrored. There are other solutions to this, but mirroring the x coordinate of the bounding box is probably easiest (and fastest).
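For reference, one of those other solutions would be to make the frames handed to the model agree with the on-screen preview at the source, by configuring the capture connection instead of touching the decoder. This is only a sketch, assuming VideoCapture.swift exposes its AVCaptureVideoDataOutput (the videoOutput parameter is a stand-in):

    import AVFoundation

    // Sketch of an alternative fix: set the video connection's mirroring so the
    // frames sent to the model match what the preview layer shows. `videoOutput`
    // stands in for whatever AVCaptureVideoDataOutput the project configures.
    func setMirroring(on videoOutput: AVCaptureVideoDataOutput, mirrored: Bool) {
      guard let connection = videoOutput.connection(with: .video),
            connection.isVideoMirroringSupported else { return }
      connection.automaticallyAdjustsVideoMirroring = false
      connection.isVideoMirrored = mirrored  // e.g. true to match a mirrored front-camera preview
    }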

mkisantal commented 4 years ago

I think you need to subtract from the width (416), not from the block size, in order to mirror. This line did it for me:

    let x = 416 - ((Float(cx) + sigmoid(tx)) * blockSize)
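Putting the two comments together, the corrected decode step comes out as sketched below. This is not the repo's exact code: sigmoid, blockSize, cx, and tx follow the names used around YOLO.swift line 94, but the surrounding loop is condensed and the constants written out.

    import Foundation

    let inputWidth: Float = 416  // Tiny YOLO's input resolution
    let blockSize: Float = 32    // 416 pixels / 13 grid cells

    func sigmoid(_ x: Float) -> Float {
      return 1 / (1 + exp(-x))
    }

    // Decode the x center of the box predicted at grid cell `cx`, then mirror it
    // across the full 416-pixel frame (not across a single block) so it lines up
    // with the mirrored front-camera image.
    func mirroredCenterX(cx: Int, tx: Float) -> Float {
      let x = (Float(cx) + sigmoid(tx)) * blockSize
      return inputWidth - x
    }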