livekit / client-sdk-android

LiveKit SDK for Android
https://docs.livekit.io
Apache License 2.0

Is there any way to convert a videoCapturer frame to a bitmap? I'm trying to use AI detection on the frame #341

Open sceddd opened 7 months ago

sceddd commented 7 months ago

I'm working with WebRTC and trying to stream the output of a detection model to other users in the room. The key step is converting the frame to a bitmap.

serhiynovos commented 7 months ago

Hi @sceddd. Yes, it's possible. You can create your own VideoSink, and in onFrame you will receive a VideoFrame.

Here is some code you can use to convert it to a bitmap:

override fun onFrame(videoFrame: VideoFrame) {
    val bitmap = videoFrame.buffer.toI420().toBitmap()
}

fun VideoFrame.I420Buffer.toBitmap(): Bitmap? {
    return kotlin.runCatching {
        val i420Buffer = this

        // Concatenate the Y, U and V planes into a single byte array.
        val dataY = i420Buffer.dataY
        val dataU = i420Buffer.dataU
        val dataV = i420Buffer.dataV
        val data =
            ByteArray(dataY.limit() + dataU.limit() + dataV.limit())
        dataY.get(data, 0, dataY.limit())
        dataU.get(data, dataY.limit(), dataU.limit())
        dataV.get(data, dataY.limit() + dataU.limit(), dataV.limit())

        // Wrap the bytes in a YuvImage and compress it to a JPEG.
        val yuvImage =
            YuvImage(
                data,
                ImageFormat.NV21,
                i420Buffer.width,
                i420Buffer.height,
                null
            )

        val outputStream = ByteArrayOutputStream()
        yuvImage.compressToJpeg(
            Rect(0, 0, i420Buffer.width, i420Buffer.height),
            100,
            outputStream
        )
        val jpegData = outputStream.toByteArray()
        val bitmap = BitmapFactory.decodeByteArray(jpegData, 0, jpegData.size)

        i420Buffer.release()

        bitmap
    }.getOrNull()
}

Please note that when you add a sink to a VideoTrack, the video resolution may change, even for a local video track. To avoid this, you can use a VideoProcessor and intercept the video frames at the resolution the capturer is configured for, as sketched below.
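
For reference, here is a minimal sketch of that approach. It assumes an org.webrtc.VideoSource named videoSource (the name is illustrative) and reuses the toBitmap() extension above:

videoSource.setVideoProcessor(object : VideoProcessor {
    private var sink: VideoSink? = null

    override fun onCapturerStarted(success: Boolean) {}

    override fun onCapturerStopped() {}

    override fun onFrameCaptured(frame: VideoFrame) {
        // Frames arrive here at the capturer's configured resolution,
        // before any sink-driven adaptation.
        val bitmap = frame.buffer.toI420()?.toBitmap()
        // ... run detection on the bitmap ...
        sink?.onFrame(frame) // forward the frame downstream
    }

    override fun setSink(sink: VideoSink?) {
        this.sink = sink
    }
})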

nguyenthekhoig7 commented 7 months ago

My solution

Hi, I am doing a similar task. I converted the VideoFrame to a bitmap using YuvFrame, which I guess is a bit simpler than @serhiynovos's answer. I know it succeeded because I can see the detection bounding box drawn onto the bitmap. Here is how I convert inputVideoFrame to inputFrameBitmap:

val yuvFrame = YuvFrame(
    inputVideoFrame,
    YuvFrame.PROCESSING_NONE,
    inputVideoFrame.timestampNs
)
val inputFrameBitmap = yuvFrame.bitmap

By the way

I am stuck at the second phase: converting the Bitmap back to a VideoFrame and putting it onto the stream. I would deeply appreciate it if anyone could point out where I went wrong and give me some suggestions. Here is my full code for processing a frame:

localVideoSource.setVideoProcessor(object : VideoProcessor {
    override fun onCapturerStarted(success: Boolean) {
    }

    override fun onCapturerStopped() {
    }

    override fun onFrameCaptured(inputVideoFrame: VideoFrame) {
        // Do the processing with inputVideoFrame here.
        Log.d("000000000000", "Starting convert VideoFrame -> bitmap")
        val frameTs = inputVideoFrame.timestampNs
        val yuvFrame = YuvFrame(
            inputVideoFrame,
            YuvFrame.PROCESSING_NONE,
            inputVideoFrame.timestampNs
        )
        Log.d("000000000000", "  converted to bitmap")
        val inputFrameBitmap = yuvFrame.bitmap
        val bitmap = inputFrameBitmap
        val image = TensorImage.fromBitmap(bitmap)

        // Run model inference and get the results.
        val outputs = model.process(image)
        Log.d("000000000000", "  detected by model, got outputs, start drawing...")
        val locations = outputs.locationsAsTensorBuffer.floatArray
        val scores = outputs.scoresAsTensorBuffer.floatArray
        val mutable = bitmap.copy(Bitmap.Config.ARGB_8888, true)
        val canvas = Canvas(mutable)
        val focalLength = cameraManager
            ?.getCameraCharacteristics("1")
            ?.get(CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS)
            ?.get(0)
            ?: 0f
        // val focalLength = 0.25f // a sample number; should come from the phone's system

        val h = mutable.height
        val w = mutable.width

        var x: Int
        paint.textSize = h / 15f
        paint.strokeWidth = h / 85f
        scores.forEachIndexed { index, score ->
            x = index * 4
            // Annotate the detection.
            if (score > 0.5f) {
                paint.color = colors[index]
                paint.style = Paint.Style.STROKE
                canvas.drawRect(
                    RectF(
                        locations[x + 1] * w,
                        locations[x] * h,
                        locations[x + 3] * w,
                        locations[x + 2] * h
                    ),
                    paint
                )
                paint.style = Paint.Style.FILL
                val objectHeight = locations[x + 2] * h - locations[x] * h
                val distance = distanceMeasurement(
                    focalLength,
                    objectHeight,
                    h.toFloat(),
                    peopleHeight,
                    getSensorHeight()
                )
                val formattedDistance = String.format("%.2f", distance)
                canvas.drawText("d:$formattedDistance", locations[x + 1] * w, locations[x] * h, paint)
            }
        }
        Log.d("000000000000", "  drawing done, mutable: $mutable")

        // #### (something might be wrong in the part below) ###############
        // Convert the bitmap back to a VideoFrame.

        val yuvConverter = YuvConverter()
        val textures = IntArray(1)
        val buffer = TextureBufferImpl(
            width,
            height,
            VideoFrame.TextureBuffer.Type.RGB,
            textures[0],
            Matrix(),
            surfaceTextureHelper?.handler,
            yuvConverter,
            null
        )
        surfaceTextureHelper?.handler?.post {
            GLES20.glTexParameteri(
                GLES20.GL_TEXTURE_2D,
                GLES20.GL_TEXTURE_MIN_FILTER,
                GLES20.GL_NEAREST
            )
            GLES20.glTexParameteri(
                GLES20.GL_TEXTURE_2D,
                GLES20.GL_TEXTURE_MAG_FILTER,
                GLES20.GL_NEAREST
            )
            GLUtils.texImage2D(GLES20.GL_TEXTURE_2D, 0, mutable, 0)
            val frameTime = System.nanoTime() - start
            val i420Buf: I420Buffer = yuvConverter.convert(buffer)
            val videoFrame = VideoFrame(i420Buf, 0, frameTime)
            localVideoSource.capturerObserver.onFrameCaptured(videoFrame)
        }
        setSink(surfaceView)
    }

    override fun setSink(sink: VideoSink?) {
        Log.d(TAG, "setSink: ")
    }
})
davidliu commented 7 months ago

For turning a bitmap back into a VideoFrame, you can reference BitmapFrameCapturer:

https://github.com/livekit/client-sdk-android/blob/main/livekit-android-sdk/src/main/java/io/livekit/android/room/track/video/BitmapFrameCapturer.kt

nguyenthekhoig7 commented 7 months ago

Thank you @davidliu, I have gone through BitmapFrameCapturer and it seems to set up an independent VideoCapturer, then push bitmaps to the surface. In my case, I tried pushing using the same surface and ran into the double-producer problem; the exact error is SurfaceTexture-1-16170-0 connect: already connected (cur=4 req=1).

Since I have a VideoProcessor converting each VideoFrame to a bitmap, is there any way I can convert each bitmap back to a VideoFrame inside the VideoProcessor (so that I can send the output back to the VideoSink as a VideoFrame)?

Additional question: if I want to set up a VideoCapturer using BitmapFrameCapturer, how can I set it up to open the camera, convert the frames to bitmaps (to run detection and draw onto them using a canvas), then convert them back to frames and push them to the surface?

davidliu commented 7 months ago

You could just provide your own surface to create the frames from; I don't think you need to use the provided surface for this.

If you want to manually feed frames into BitmapFrameCapturer, you'd have to create a CameraCapturer (there are some utils at CameraCapturerUtils.createCameraCapturer), do your processing, and then feed the result into the BitmapFrameCapturer. The alternative would be to handle the camera capture yourself.
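
To illustrate the first suggestion, here is a rough, untested sketch of driving frames from your own surface, mirroring what BitmapFrameCapturer does internally. It assumes an EglBase named eglBase and the CapturerObserver received in VideoCapturer.initialize are in scope; pushBitmap is an illustrative helper, not an SDK API:

val surfaceTextureHelper =
    SurfaceTextureHelper.create("BitmapCaptureThread", eglBase.eglBaseContext)
val surface = Surface(surfaceTextureHelper.surfaceTexture)

// Anything drawn into the surface comes back here as a VideoFrame.
surfaceTextureHelper.startListening { frame ->
    capturerObserver.onFrameCaptured(frame)
}

fun pushBitmap(bitmap: Bitmap) {
    surfaceTextureHelper.setTextureSize(bitmap.width, bitmap.height)
    val canvas = surface.lockCanvas(null)
    canvas.drawBitmap(bitmap, 0f, 0f, null)
    surface.unlockCanvasAndPost(canvas)
}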

sceddd commented 6 months ago

Hello @davidliu, can you give me an example of how to use it? After researching through those files I still can't get it.

wzJun1 commented 6 months ago

You should use VideoSink.onFrame()
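
For example (a minimal sketch; videoTrack is an org.webrtc.VideoTrack and the name is illustrative):

videoTrack.addSink(object : VideoSink {
    override fun onFrame(frame: VideoFrame) {
        // The frame is only valid for the duration of this callback.
        // toI420() returns a new buffer, which the toBitmap() extension
        // above releases when it is done.
        val bitmap = frame.buffer.toI420()?.toBitmap()
        // ... run the detection model on the bitmap ...
    }
})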

safeer-ahmed commented 5 months ago

@serhiynovos VideoFrame.I420Buffer.toBitmap() is returning the bitmap in grayscale. Is there a workaround to get a colored bitmap from the buffer?
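
A likely cause: the snippet above concatenates the planar I420 data but labels the buffer ImageFormat.NV21, which expects the Y plane followed by interleaved V/U bytes, and it also ignores each plane's row stride. A possible fix is to build a real NV21 array first (untested sketch):

// Untested sketch: convert planar I420 to the interleaved NV21 layout
// that YuvImage expects, honoring the row stride of each plane.
fun VideoFrame.I420Buffer.toNv21(): ByteArray {
    val chromaWidth = (width + 1) / 2
    val chromaHeight = (height + 1) / 2
    val nv21 = ByteArray(width * height + 2 * chromaWidth * chromaHeight)

    // Copy the Y plane row by row, skipping any stride padding.
    for (row in 0 until height) {
        for (col in 0 until width) {
            nv21[row * width + col] = dataY.get(row * strideY + col)
        }
    }
    // NV21 stores chroma as interleaved V,U pairs after the Y plane.
    var offset = width * height
    for (row in 0 until chromaHeight) {
        for (col in 0 until chromaWidth) {
            nv21[offset++] = dataV.get(row * strideV + col)
            nv21[offset++] = dataU.get(row * strideU + col)
        }
    }
    return nv21
}

Passing the result to YuvImage(nv21, ImageFormat.NV21, width, height, null) in place of the concatenated array should then produce correct colors.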