hollance / YOLO-CoreML-MPSNNGraph

Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
MIT License

scaleFit instead of scaleFill #58

Open pannaf opened 5 years ago

pannaf commented 5 years ago

Thank you for the great codebase! Thanks to the many tips in your ebook, I was able to get a PyTorch-trained version of YOLOv3 (with spatial pyramid pooling) running on an iPad Pro at 26 fps. Many, many thanks for that!

I don't have an iOS/Swift development background, and I wonder if you could point me in the right direction for modifying part of the code. My original model was trained with scaleFit, where the long side is resized to 416 pixels and the short side is padded. I saw the recommendation in your book to change the image scaling to match training. To that end, I found this relevant block in ViewController.swift:

```swift
// NOTE: If you choose another crop/scale option, then you must also
// change how the BoundingBox objects get scaled when they are drawn.
// Currently they assume the full input image is used.
request.imageCropAndScaleOption = .scaleFill
```

I want to change this to .scaleFit, but I don't understand how to change the BoundingBox scaling when the boxes are drawn, as the comment you included suggests. Could you point me in the right direction?

hollance commented 5 years ago

I think Vision pads the bottom / right side of the image when using scaleFit. So you'll need to figure out by how much the image gets padded. For example, a 1280x720 input is resized to 416x234, and 416 - 234 = 182 rows of padding are added at the bottom to make it 416x416.
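A minimal sketch of that padding arithmetic, assuming the long side is scaled to the model's input size and the result is rounded to whole pixels. The `FitGeometry` struct and `fitGeometry` function are hypothetical names for illustration; they are not part of Vision or the example app.

```swift
import Foundation

/// Size of the image after an aspect-preserving resize into a square
/// model input, plus the padding needed on each axis.
/// (Hypothetical helper type, not part of Vision.)
struct FitGeometry {
    let scaledWidth: Int
    let scaledHeight: Int
    let padX: Int   // padding columns needed to reach inputSize
    let padY: Int   // padding rows needed to reach inputSize
}

func fitGeometry(imageWidth: Int, imageHeight: Int, inputSize: Int) -> FitGeometry {
    // Scale so that the long side becomes exactly inputSize.
    let scale = Double(inputSize) / Double(max(imageWidth, imageHeight))
    let w = Int((Double(imageWidth) * scale).rounded())
    let h = Int((Double(imageHeight) * scale).rounded())
    return FitGeometry(scaledWidth: w, scaledHeight: h,
                       padX: inputSize - w, padY: inputSize - h)
}

let geometry = fitGeometry(imageWidth: 1280, imageHeight: 720, inputSize: 416)
print(geometry.scaledWidth, geometry.scaledHeight, geometry.padX, geometry.padY)
// prints "416 234 0 182"
```

Whether the padding actually ends up at the bottom / right or is split around the image is a separate question; this only computes how much padding there is in total.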

The bounding box predictions will be in that 416x416 coordinate space, and only use the 416x234 region at the top. It's probably easiest if you stretch the bounding box to fit the entire 416x416 region and then use the same code as in the example app to convert back to screen coordinates.

In other words, multiply the y-coordinate and the height of the bounding box by 416/234.
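The stretch described above might look roughly like this. It is a sketch, not code from the example app: `stretchToFullInput` and its parameters are hypothetical names, and it assumes the image content starts at the top-left corner of the 416x416 input (padding at the bottom / right), which is a guess that should be verified on a test image. The `contentOrigin` parameter is there in case the content turns out to be offset, for instance if the padding is centered.

```swift
import Foundation

/// Rescales a box predicted in the padded inputSize x inputSize space so that
/// the image content region fills the whole input. The result can then be fed
/// to code that assumes the full input image is used.
/// Assumption: the content region starts at `contentOrigin` (default: top-left).
func stretchToFullInput(_ box: CGRect,
                        scaledWidth: CGFloat,
                        scaledHeight: CGFloat,
                        inputSize: CGFloat,
                        contentOrigin: CGPoint = .zero) -> CGRect {
    // Shift so the content region starts at (0, 0), then scale each axis
    // by inputSize / scaledSize (e.g. 416/234 for the y axis).
    let x = (box.origin.x - contentOrigin.x) * inputSize / scaledWidth
    let y = (box.origin.y - contentOrigin.y) * inputSize / scaledHeight
    let w = box.size.width * inputSize / scaledWidth
    let h = box.size.height * inputSize / scaledHeight
    return CGRect(x: x, y: y, width: w, height: h)
}

// A box in the middle of the 416x234 content region of a 1280x720 image.
let predicted = CGRect(x: 100, y: 117, width: 50, height: 58.5)
let stretched = stretchToFullInput(predicted,
                                   scaledWidth: 416,
                                   scaledHeight: 234,
                                   inputSize: 416)
print(stretched)
```

Here the x-coordinate and width are unchanged because the width already spans all 416 pixels, while y = 117 (the vertical middle of the 234-pixel content) maps to 208, the vertical middle of the full input.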