jasonjmcghee / rem

An open source approach to locally record and enable searching everything you view on your Mac.
https://rem.ing
MIT License
2.18k stars, 60 forks

Store OCR Locations #94

Closed: cparish312 closed this 1 month ago

cparish312 commented 1 month ago

Addressing #87

Previously, all OCR results for a given frame were combined and stored as a single row in the allText virtual table. Now, the framesText table also stores one row per OCR result, with columns (frame_id, text, x, y, w, h).
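The sketch below shows the two layouts side by side. It is a minimal illustration, not rem's actual schema code. allText is a full-text virtual table in rem, but here a plain table stands in for it so the sketch runs on any SQLite build. The sample OCR results and the space-joined combined text are hypothetical.

```python
import sqlite3

# Hypothetical OCR output for one frame: (text, x, y, w, h), with box
# coordinates normalized to 0-1 as described in this PR.
ocr_results = [
    ("Hello", 0.10, 0.80, 0.20, 0.05),
    ("world", 0.10, 0.74, 0.20, 0.05),
]

conn = sqlite3.connect(":memory:")
# Plain-table stand-in for the allText virtual table: one combined row per frame.
conn.execute("CREATE TABLE allText (frame_id INTEGER, text TEXT)")
# One row per OCR result, keeping its bounding box.
conn.execute(
    "CREATE TABLE framesText ("
    " frame_id INTEGER, text TEXT, x REAL, y REAL, w REAL, h REAL)"
)

def store_frame(conn, frame_id, results):
    """Write one frame's OCR results to both tables."""
    combined = " ".join(text for text, *_ in results)  # assumed joining rule
    conn.execute("INSERT INTO allText VALUES (?, ?)", (frame_id, combined))
    conn.executemany(
        "INSERT INTO framesText VALUES (?, ?, ?, ?, ?, ?)",
        [(frame_id, text, x, y, w, h) for text, x, y, w, h in results],
    )

store_frame(conn, 1, ocr_results)

# A phrase split across two OCR results matches the combined row...
combined_hits = conn.execute(
    "SELECT frame_id FROM allText WHERE text LIKE ?", ("%Hello world%",)
).fetchall()
# ...but no single framesText row contains it.
row_hits = conn.execute(
    "SELECT frame_id FROM framesText WHERE text LIKE ?", ("%Hello world%",)
).fetchall()
print(combined_hits, row_hits)  # [(1,)] []
```

The last two queries show in miniature why search still relies on allText: a phrase that spans two OCR results is found in the combined row but in no individual framesText row.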

The main reason for this addition is that individual texts can now be grouped into "paragraphs," which should improve performance on more complex downstream tasks (e.g., RAG).
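To make "grouping texts into paragraphs" concrete, here is one possible greedy heuristic over the stored boxes. This is a hypothetical sketch and not something this PR implements. It assumes a top-left origin (y grows downward), and the thresholds `gap_ratio` and `x_tolerance` are arbitrary illustrative values. A box joins an existing paragraph when it starts just below that paragraph's last box and is roughly left-aligned with it.

```python
from dataclasses import dataclass

@dataclass
class OcrBox:
    text: str
    x: float  # left edge, normalized 0-1
    y: float  # top edge, normalized 0-1 (assumes a top-left origin)
    w: float
    h: float

def group_into_paragraphs(boxes, gap_ratio=0.5, x_tolerance=0.02):
    """Greedily merge vertically adjacent, left-aligned boxes into paragraphs."""
    paragraphs = []
    # Visit boxes top to bottom, then left to right.
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        for para in paragraphs:
            prev = para[-1]
            gap = box.y - (prev.y + prev.h)  # vertical space below prev
            if abs(gap) <= gap_ratio * prev.h and abs(box.x - prev.x) <= x_tolerance:
                para.append(box)
                break
        else:
            paragraphs.append([box])  # no paragraph fits: start a new one
    return [" ".join(b.text for b in para) for para in paragraphs]

boxes = [
    OcrBox("The quick brown", 0.10, 0.10, 0.30, 0.03),
    OcrBox("fox jumps over", 0.10, 0.14, 0.28, 0.03),
    OcrBox("Settings", 0.70, 0.10, 0.10, 0.03),
    OcrBox("A new paragraph", 0.10, 0.30, 0.30, 0.03),
]
print(group_into_paragraphs(boxes))
# ['The quick brown fox jumps over', 'Settings', 'A new paragraph']
```

Because the raw rows stay in framesText, a heuristic like this can run at query time and be tuned or replaced without re-running OCR.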

Considerations:

1. The OCR results are now stored in both the allText and framesText tables. Replicating the search functionality with framesText alone would be less effective, because OCR bounding boxes can be unpredictable: a search may miss text that is split across multiple framesText rows. One solution could be to group the OCR results into "paragraphs" before inserting them into framesText, but I would like to avoid this, since that grouping would be very unpredictable/experimental and I'd rather keep the raw results. Any suggestions?
2. The x, y, w, h columns currently hold the raw bounding-box values, which are normalized to the range 0-1 (fractions of the frame's width and height). They could be converted to pixel coordinates by multiplying x and w by the image width and y and h by the image height. That might be a bit more intuitive for users, but I'm fairly indifferent.
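The normalized-to-pixel conversion from the second consideration is sketched below. One caveat is an assumption about how rem runs OCR: if the boxes come from Apple's Vision framework, whose normalized coordinates have a lower-left origin, y also needs flipping to match a top-left image origin. The `flip_y` flag covers that case. The function name is hypothetical.

```python
def to_pixels(x, y, w, h, image_width, image_height, flip_y=False):
    """Convert a normalized (0-1) bounding box to pixel coordinates.

    x and w scale by the image width; y and h scale by the image height.
    If flip_y is True, y is treated as measured from the bottom edge
    (lower-left origin) and converted to a top-left origin.
    """
    px = x * image_width
    pw = w * image_width
    ph = h * image_height
    if flip_y:
        # Top edge in a top-left system = 1 - (bottom edge + height).
        py = (1.0 - y - h) * image_height
    else:
        py = y * image_height
    return px, py, pw, ph

print(to_pixels(0.25, 0.5, 0.5, 0.25, 1920, 1080))
# (480.0, 540.0, 960.0, 270.0)
print(to_pixels(0.25, 0.5, 0.5, 0.25, 1920, 1080, flip_y=True))
# (480.0, 270.0, 960.0, 270.0)
```

Storing the normalized values and converting on read keeps the rows independent of display resolution. The conversion only needs the frame's width and height.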

cparish312 commented 1 month ago

Sorry if I wasn't clear! My main reason for adding this is to make the OCR results easier for downstream workstreams to use. I think it is a necessary next step for #17.

When we talked, I mentioned using this to replace the timeline's on-the-fly OCR. But the more I use the timeline, the more amazed I am at how quick it is and how rarely I have to wait before copying text. Ideally, the timeline would use the framesText table, but since the current version is solid and the extra compute is negligible, I'm not prioritizing it. I'm also unsure whether switching to framesText would simplify things (I believe the Views could be simplified) or make them more complex (since ImageAnalysis handles the overlay so well).
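As a rough idea of what "the timeline would use the framesText table" could mean, the sketch below looks up the stored OCR text under a clicked point instead of re-running analysis. This is a hypothetical illustration, not how the timeline works today. The function name and sample rows are made up, and the click point must be normalized in the same coordinate convention as the stored boxes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE framesText ("
    " frame_id INTEGER, text TEXT, x REAL, y REAL, w REAL, h REAL)"
)
# Hypothetical stored OCR results for frame 7 (normalized boxes).
conn.executemany(
    "INSERT INTO framesText VALUES (?, ?, ?, ?, ?, ?)",
    [
        (7, "Copy me", 0.1, 0.1, 0.3, 0.05),
        (7, "Other", 0.6, 0.1, 0.2, 0.05),
    ],
)

def text_at_point(conn, frame_id, px, py):
    """Return the stored OCR texts whose box contains the normalized point."""
    rows = conn.execute(
        "SELECT text FROM framesText WHERE frame_id = ? "
        "AND ? BETWEEN x AND x + w AND ? BETWEEN y AND y + h",
        (frame_id, px, py),
    ).fetchall()
    return [text for (text,) in rows]

print(text_at_point(conn, 7, 0.2, 0.12))  # ['Copy me']
print(text_at_point(conn, 7, 0.5, 0.5))   # []
```

This covers only lookup. Selection, highlighting, and the rest of the overlay would still have to be rebuilt, which is the trade-off against ImageAnalysis mentioned above.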

jasonjmcghee commented 1 month ago

Got it! Makes sense.

jasonjmcghee commented 1 month ago

@cparish312 A thought for after the merge: you could backfill this by running image analysis on all existing images. To trigger it, you could add a new menu option that only appears when some frames are missing OCR rects.
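The check that decides whether such a menu option should appear could be a single anti-join. The sketch below assumes a minimal, hypothetical `frames` table with an integer `id`; rem's real frames schema has more columns. The helper name is also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema for illustration only.
conn.execute("CREATE TABLE frames (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE framesText ("
    " frame_id INTEGER, text TEXT, x REAL, y REAL, w REAL, h REAL)"
)
conn.executemany("INSERT INTO frames (id) VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO framesText VALUES (2, 'hello', 0.1, 0.1, 0.2, 0.05)")

def frames_missing_ocr_rects(conn):
    """IDs of frames with no framesText rows, i.e. backfill candidates."""
    rows = conn.execute(
        "SELECT f.id FROM frames AS f "
        "WHERE NOT EXISTS ("
        "  SELECT 1 FROM framesText AS t WHERE t.frame_id = f.id) "
        "ORDER BY f.id"
    ).fetchall()
    return [frame_id for (frame_id,) in rows]

missing = frames_missing_ocr_rects(conn)
# Show the backfill menu item only when there is something to backfill.
show_backfill_menu_item = len(missing) > 0
print(missing, show_backfill_menu_item)  # [1, 3] True
```

One caveat: a frame whose OCR legitimately found no text also has zero framesText rows, so this check would keep flagging it. A real version might record separately which frames have already been processed.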