Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
Using the front camera for detection causing boxes to be mirrored #25

vvkv commented 6 years ago

Thank you for the great work here. I keep experimenting with variations of your work for my own purposes, and for one of my applications I am trying to run the object detection through the front camera (on my iPad) so that I can see what is being detected while moving in front of my screen. I made the following change in VideoCapture.swift to capture the video stream through the front camera:

Original code chunk:

    guard let captureDevice = AVCaptureDevice.default(for: AVMediaType.video) else {
        print("Error: no video devices available")
        return false
    }

My changed chunk:

    guard let captureDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: AVMediaType.video, position: .front) else {
        print("Error: no video devices available")
        return false
    }

Everything works well except that the bounding boxes are mirrored: if I hold an apple in my left hand and a banana in my right hand, the boxes are swapped around these objects. I understand that this might have something to do with the front camera mirroring the video display, but I am not sure how to tackle this issue. Any guidance would be very useful. Thank you.

hollance commented 6 years ago

I'm not sure where this goes wrong or why but an easy fix is to flip the coordinates of the bounding boxes too: box.x = screen width - box.x.
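As a minimal sketch of that flip (the `Box` type and the `mirroredHorizontally` function are hypothetical names for illustration, not part of this repo): if `x` is the left edge of the box rather than its center, the box's own width has to be subtracted as well, otherwise the flipped box ends up shifted by one box width.

```swift
// Hypothetical box type; `x` and `y` are the top-left corner, in pixels.
struct Box {
    var x: Float
    var y: Float
    var width: Float
    var height: Float
}

// Reflects a box across the vertical midline of a container that is
// `containerWidth` pixels wide. Only the x origin changes.
func mirroredHorizontally(_ box: Box, containerWidth: Float) -> Box {
    var flipped = box
    // The old right edge becomes the new left edge after mirroring.
    flipped.x = containerWidth - box.x - box.width
    return flipped
}
```

If `x` is the center of the box instead, `containerWidth - x` on its own is enough.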

vvkv commented 6 years ago

That sounds like a neat solution. It would be great if you could advise on where this code modification should be made. I went through most of the .swift files but wasn't able to find a "box.x" variable. Thank you.

hollance commented 6 years ago

On this line, https://github.com/hollance/YOLO-CoreML-MPSNNGraph/blob/master/TinyYOLO-CoreML/TinyYOLO-CoreML/YOLO.swift#L94

change it to:

let x = blockSize - ((Float(cx) + sigmoid(tx)) * blockSize)

This happens because the image from the front camera is mirrored. There are other solutions to this, but mirroring the x coordinate of the bounding box is probably easiest (and fastest).

mkisantal commented 4 years ago

I think you need to subtract from the input width (416), not from the block size, in order to mirror the box. This line did it for me:

let x = 416 - ((Float(cx) + sigmoid(tx)) * blockSize)
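To see why the full width is needed, here is a small standalone sketch. It assumes a 416-pixel-wide input split into 13 columns of 32 pixels each, and the helper names `centerX` and `mirroredCenterX` are made up for illustration; in YOLO.swift the computation is the single line shown above.

```swift
import Foundation

// Assumed geometry: 416-pixel-wide input, 13 grid columns of 32 pixels.
let inputWidth: Float = 416
let blockSize: Float = 32

// Logistic sigmoid, squashing the raw offset into the range 0...1.
func sigmoid(_ x: Float) -> Float {
    return 1 / (1 + exp(-x))
}

// Center x (in input pixels) of a box predicted in grid column `cx`
// with raw horizontal offset `tx`.
func centerX(cx: Int, tx: Float) -> Float {
    return (Float(cx) + sigmoid(tx)) * blockSize
}

// Mirrored center x: reflect across the vertical midline of the whole input.
func mirroredCenterX(cx: Int, tx: Float) -> Float {
    return inputWidth - centerX(cx: cx, tx: tx)
}
```

For `cx = 0` and `tx = 0` the center sits at x = 16, and mirroring moves it to 400. Subtracting from `blockSize` instead would give 32 - 16 = 16, which leaves that box where it was.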