hollance / Forge

A neural network toolkit for Metal
MIT License
1.27k stars 172 forks source link

implant Yolo to ARKit #30

Closed icefox57 closed 6 years ago

icefox57 commented 6 years ago

i tried use yolo in ARKit. i have implanted your code.

call predict in seesion delegate

func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let seekingCM = CMTimeMakeWithSeconds(frame.timestamp, 1000000);
        let timestamp = seekingCM
        let deltaTime = timestamp - lastTimestamp
        if fps == -1 || deltaTime >= CMTimeMake(1, Int32(fps)) {
            lastTimestamp = timestamp

            if let texture = convertToMTLTexture(pixelBuffer:frame.capturedImage){
                predict(texture: texture)
            }

        }
    }

and convert texture with CVPixelBuffer instead SampleBuffer

func convertToMTLTexture(pixelBuffer: CVPixelBuffer?) -> MTLTexture? {
        if let textureCache = textureCache,
            let pixelBuffer = pixelBuffer{

            let width = CVPixelBufferGetWidth(pixelBuffer)
            let height = CVPixelBufferGetHeight(pixelBuffer)

            var texture: CVMetalTexture?
            CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                                      pixelBuffer, nil, .bgra8Unorm, width, height, 0, &texture)
            if let texture = texture {
                return CVMetalTextureGetTexture(texture)
            }
        }
        return nil
    }

because arkit run camera with full screen ,and output 1280X720 so i changed height to 16/9

private func show(predictions: [YOLO.Prediction]) {
        DEBUGLOG(message: predictions.count)

        for i in 0..<boundingBoxes.count {
            if i < predictions.count {
                let prediction = predictions[i]

                // The predicted bounding box is in the coordinate space of the input
                // image, which is a square image of 416x416 pixels. We want to show it
                // on the video preview, which is as wide as the screen and has a 4:3
                // aspect ratio. The video preview also may be letterboxed at the top
                // and bottom.
                let width = view.bounds.width
                let height = width * 16 / 9
                let scaleX = width / CGFloat(YOLO.inputWidth)
                let scaleY = height / CGFloat(YOLO.inputHeight)
//                let top = (view.bounds.height - height) / 2

                // Translate and scale the rectangle to our own coordinate system.
                var rect = prediction.rect
                rect.origin.x *= scaleX
                rect.origin.y *= scaleY
//                rect.origin.y += top
                rect.size.width *= scaleX
                rect.size.height *= scaleY

                // Show the bounding box.
                let label = String(format: "%@ %.1f", labels[prediction.classIndex], prediction.score * 100)
                let color = colors[prediction.classIndex]
                boundingBoxes[i].show(frame: rect, label: label, color: color)

            } else {
                boundingBoxes[i].hide()
            }
        }
    }

it can run . But not effect right . same bottle ,it can recog in your demo . but can't in my.

where am i missing ? please help me

icefox57 commented 6 years ago

i find out why . ARKit output pixelbuffer with kCVPixelFormatType_420YpCbCr8BiPlanarFullRange and we need 32BGRA.

hollance commented 6 years ago

Ah yes, easiest way to handle this is to add a compute kernel that converts YUV (2 textures) to RGBA (1 texture).

icefox57 commented 6 years ago

@hollance thank u ,and how can i do that ? any tips may help me a lot .