ismaelsousa / vision-camera-ocr

VisionCamera Frame Processor Plugin to detect text in real time using MLKit Text Detector (OCR)
MIT License

[QUESTION] Recognized text is at wrong position #20

Open j0na1han opened 1 month ago

j0na1han commented 1 month ago

Hi,

I am trying to mark the text that your library recognizes.

The code:

` const frameProcessor = useSkiaFrameProcessor((frame) => {
    'worklet'

    frame.render()
    const data = scanOCR(frame)

    const paint = Skia.Paint()
    paint.setColor(Skia.Color('red'))

    if (Object.keys(data).length != 0) {
        for (const block of data.result.blocks) {
            const rect = Skia.XYWHRect(block.frame.x, block.frame.y, block.frame.width, block.frame.height)
            frame.drawRect(rect, paint)
        }
    }

}) `

I have attached a video showing the behavior. Do you have any idea why this happens?

https://github.com/user-attachments/assets/bcef128d-dc1d-4411-919e-e4f90cd0b778

react-native-vision-camera: 4.5.3
@ismaelmoreiraa/vision-camera-ocr: 3.0.2-1

ismaelsousa commented 1 month ago

Hello, this is probably related to the rotation of the frame. What is the ratio of the frame, and what is the ratio of the camera view?

j0na1han commented 1 month ago

Hi, the pixel ratio of the camera view is 3.

How can I get the ratio of the frame?

Here is another video, this time without my special case: https://github.com/user-attachments/assets/634aa738-3bae-4d68-b0e3-38d468b2142b

ismaelsousa commented 1 month ago

You can get the ratio from the frame inside the frame processor.

But yeah, I need to take a look at that. We need to convert the bounding boxes to the same ratio and scale as the camera view.
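
For readers following along, here is a minimal sketch of what such a conversion could look like. It is not code from this library: `Rect`, `Size`, `aspectRatio`, and `frameRectToView` are hypothetical names, and the sketch assumes that the frame is already oriented like the view and that the preview fills the view in a centered "cover" fashion. Inside a frame processor, the frame size would come from `frame.width` and `frame.height`.

```typescript
// Hypothetical shapes for illustration; not types exported by any library.
interface Rect { x: number; y: number; width: number; height: number }
interface Size { width: number; height: number }

// Aspect ratio (width divided by height) of a frame or a view.
function aspectRatio(size: Size): number {
  return size.width / size.height
}

// Map a rect from frame pixels to view points.
// Assumptions: the frame is already upright relative to the view, and the
// preview is scaled uniformly to fill the view ("cover") and centered.
function frameRectToView(rect: Rect, frame: Size, view: Size): Rect {
  // "Cover" uses the larger of the two scale factors so no gaps remain.
  const scale = Math.max(view.width / frame.width, view.height / frame.height)
  // The scaled frame may overflow the view; the overflow is split evenly.
  const offsetX = (view.width - frame.width * scale) / 2
  const offsetY = (view.height - frame.height * scale) / 2
  return {
    x: rect.x * scale + offsetX,
    y: rect.y * scale + offsetY,
    width: rect.width * scale,
    height: rect.height * scale,
  }
}
```

This kind of mapping only matters for an overlay rendered as a separate view on top of the camera. If the frame and the view have different aspect ratios, the offsets become negative on one axis, which is why boxes can appear shifted when only a scale factor is applied.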

ismaelsousa commented 1 month ago

Wait, hmm, odd, are you using Skia to draw the bounding boxes?

I have never tried this. You can check the coordinates to see what happens when you start moving the text toward the middle of the screen.
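
One way to check the coordinates is to record the largest values seen across many frames and compare them with the frame size. The sketch below is only an illustration: `Rect`, `Extent`, and `trackExtent` are hypothetical names, and the rects are assumed to have the `x`, `y`, `width`, `height` shape used by `block.frame` in the snippet above.

```typescript
// Hypothetical shapes for illustration; not types exported by any library.
interface Rect { x: number; y: number; width: number; height: number }
interface Extent { maxX: number; maxY: number }

// Return the largest right and bottom edges seen so far.
// Feed it the block rects of each frame and carry the result forward.
function trackExtent(extent: Extent, rects: Rect[]): Extent {
  let { maxX, maxY } = extent
  for (const r of rects) {
    maxX = Math.max(maxX, r.x + r.width)
    maxY = Math.max(maxY, r.y + r.height)
  }
  return { maxX, maxY }
}
```

If the recorded maximum stays well below the frame's width and height while text is moved to every corner, the reported boxes are probably in a different coordinate space (scaled or rotated) than the one being drawn into.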

j0na1han commented 1 month ago

Yes, I am. Do you mean it is a problem with react-native-vision-camera?

The top-right corner is 0,0. I would expect the bottom-left corner to be 1920,1080, but somehow the maximum values never go above 1000,1000.

coordinates

ismaelsousa commented 1 month ago

Hmm, I guess in ML Kit (0,0) is at the top left.
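
If the mismatch does come from rotation, the boxes would need to be converted between the upright image (origin at the top left) and the unrotated camera buffer before drawing. The sketch below shows one possible conversion and has not been verified against this plugin. `Rect`, `Rotation`, and `uprightRectToBuffer` are hypothetical names, and it assumes the upright image is the buffer rotated clockwise by `rotation` degrees; which rotation applies on a given device is something to check empirically.

```typescript
// Hypothetical shapes for illustration; not types exported by any library.
interface Rect { x: number; y: number; width: number; height: number }
type Rotation = 0 | 90 | 180 | 270

// Map a rect from upright-image coordinates back to buffer coordinates.
// Assumption: the upright image is the buffer rotated clockwise by `rotation`.
// bufferWidth/bufferHeight are the dimensions of the unrotated buffer.
function uprightRectToBuffer(
  rect: Rect,
  bufferWidth: number,
  bufferHeight: number,
  rotation: Rotation,
): Rect {
  const { x, y, width, height } = rect
  if (rotation === 0) {
    return { x, y, width, height }
  }
  if (rotation === 90) {
    // Upright x runs along the buffer's y axis, reversed.
    return { x: y, y: bufferHeight - (x + width), width: height, height: width }
  }
  if (rotation === 180) {
    // Both axes are reversed; the box keeps its size.
    return { x: bufferWidth - (x + width), y: bufferHeight - (y + height), width, height }
  }
  // rotation === 270: upright y runs along the buffer's x axis, reversed.
  return { x: bufferWidth - (y + height), y: x, width: height, height: width }
}
```

With a 1920x1080 buffer and a rotation of 90, a box at the top left of the upright image lands at the bottom left of the buffer, so a wrong rotation assumption puts the origin in an unexpected corner.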