jasonjmcghee / rem

An open source approach to locally record and enable searching everything you view on your Mac.
https://rem.ing
MIT License

feat: Only OCR active app window #40

Closed · seletz closed 5 months ago

seletz commented 5 months ago

This PR implements feature #38. The feature is only active if the new setting is enabled:

[screenshot of the new setting]

Implementation Details

It turns out to be surprisingly difficult to get the bounds of the active window. After much trial and error and googling, I finally adapted code from Stack Overflow. I added WindowHelper to encapsulate that code.

performOCR now takes an NSImage instead of a CGImage. processScreenshot now passes a cropped NSImage based on the bounds of the active window. Cropping is done in ImageUtils.
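
For illustration, a cropping helper along these lines might look like the following (a hedged sketch, not the actual ImageUtils code; cropImage(_:to:) is a hypothetical name):

    import AppKit

    // Hypothetical sketch of an NSImage cropping helper in the spirit
    // of the ImageUtils code described above (names are illustrative).
    func cropImage(_ image: NSImage, to bounds: CGRect) -> NSImage {
        guard let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil),
              let cropped = cgImage.cropping(to: bounds) else {
            return image  // fall back to the uncropped image if cropping fails
        }
        return NSImage(cgImage: cropped, size: bounds.size)
    }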

seletz commented 5 months ago

BTW -- I also added changes to the project.pbxproj file -- I hope this is OK. I have additional changes locally here, mostly related to code signing.

seletz commented 5 months ago

@jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.

seletz commented 5 months ago

@jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the WindowHelper this is not the case. Perhaps I am missing something?

I "lifted" the OCR generation up one function, and removed the now unused frame argument.

For the filter work you suggested in some other thread, I think we'd need to refactor this a bit so that adding/removing steps in the processing pipeline is more straightforward. It works pretty well now, however.
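
One possible shape for such a pipeline (a hypothetical sketch, not code from this PR):

    import CoreGraphics

    // Hypothetical sketch: a pluggable frame-processing pipeline where
    // steps can be added or removed independently.
    protocol FrameProcessingStep {
        func process(_ image: CGImage) -> CGImage
    }

    struct CropToActiveWindow: FrameProcessingStep {
        let frame: CGRect
        func process(_ image: CGImage) -> CGImage {
            image.cropping(to: frame) ?? image
        }
    }

    // Steps compose left to right; adding a filter is just appending a step.
    func runPipeline(_ image: CGImage, steps: [FrameProcessingStep]) -> CGImage {
        steps.reduce(image) { $1.process($0) }
    }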

jasonjmcghee commented 5 months ago

> @jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.

fwiw - seems to work (tried it real quick)

    // Find the frontmost app's window in the shareable content
    let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
    let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
    let frame = window?.frame

    logger.debug("Active Frame: \(frame?.debugDescription ?? "<undefined>")")
    logger.debug("Active Application: \(activeApplicationName ?? "<undefined>")")
[screenshot of the debug log output]
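
For context, shareableContent above comes from ScreenCaptureKit; a self-contained version of the lookup might look like this (a sketch assuming screen-recording permission is already granted; frontmostWindowFrame is a hypothetical name):

    import AppKit
    import ScreenCaptureKit

    // Sketch: find the frontmost application's window via ScreenCaptureKit
    // and return its frame (in points, screen coordinates).
    func frontmostWindowFrame() async throws -> CGRect? {
        let shareableContent = try await SCShareableContent.excludingDesktopWindows(
            false, onScreenWindowsOnly: true
        )
        let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
        return shareableContent.windows
            .first { $0.owningApplication?.processID == activePid }?
            .frame
    }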
jasonjmcghee commented 5 months ago

> @jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the WindowHelper this is not the case. Perhaps I am missing something?

😅 the one thing I didn't try - lemme see, maybe it's a pixel-scaling issue?

jasonjmcghee commented 5 months ago

hm... seems to work for me somehow:

    ...
    let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
    await processScreenshot(frameId: frameId, image: image, frame: window?.frame)
    ...

later

    performOCR(..., frame: CGRect?) {
        ...
        // Crop if a frame was provided
        let img = (frame != nil ? image.cropping(to: frame!) : nil) ?? image
        self.logger.debug("Active Frame: \(frame.debugDescription)")

        let configuration = ImageAnalyzer.Configuration([.text])
        let nsImage = NSImage(cgImage: img, size: NSSize(width: img.width, height: img.height))
        ...
[screenshot of the resulting debug output]
jasonjmcghee commented 5 months ago

to be clear, it's likely better to pass the frame as CGRect? and only crop if it's present - then we'd only even look up the window if the setting is enabled.

(updated above)
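
Concretely, the call site could read something like this, plugging into the variables from the snippet above (a sketch; onlyOCRFrontmostWindow is a placeholder for the new setting):

    // Sketch: only look up the window when the setting is enabled,
    // and pass nil otherwise so no cropping happens downstream.
    var frame: CGRect? = nil
    if onlyOCRFrontmostWindow {  // placeholder name for the new setting
        let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
        frame = shareableContent.windows
            .first { $0.owningApplication?.processID == activePid }?
            .frame
    }
    await processScreenshot(frameId: frameId, image: image, frame: frame)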

jasonjmcghee commented 5 months ago

> but using this to crop the window renders the OCR output incomplete

Maybe I'm not understanding...

Also, I hope it goes without saying: I really appreciate you joining as a contributor. It's awesome.

I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also, this uses the ScreenCaptureKit API, which we'll need for filtering anyway.

seletz commented 5 months ago

> Also, I hope it goes without saying: I really appreciate you joining as a contributor. It's awesome.
>
> I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also, this uses the ScreenCaptureKit API, which we'll need for filtering anyway.

Hah -- no worries. I've been doing software development for a very long time. I'm also old enough. I really would also like a simple solution; the WindowHelper code seems awkward and surprisingly complex. I'm a Swift and UIKit noob -- I used to do a little Objective-C back when iOS was new and hot. So all in all, I contribute (a) because I like the idea of the rem app, and (b) because I take it as an opportunity to learn something new.

seletz commented 5 months ago

> Maybe I'm not understanding...

Well. When I use the frame of the window found via the ScreenCaptureKit framework, the OCR result does not contain what I expect. I tested this by having a terminal window open and active with some text in it; I was unable to find that text using the Search function. When I use the frame from WindowHelper instead, the text is found.

I did the same test with an open Xcode window -- it does find some text, but not all. I believe the cropped image is not wide enough somehow -- but I need more testing. I'll test this again later tomorrow.

jasonjmcghee commented 5 months ago

Happy to merge as-is if we remove ImageUtils, which I think is not being used. We can always iterate toward a more ScreenCaptureKit-native approach!

jasonjmcghee commented 5 months ago

If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!

seletz commented 5 months ago

> If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!

Sure! Sorry for not reacting, RL keeps interfering :)

jasonjmcghee commented 5 months ago

@seletz all good! Is ImageUtils being used? (Is it needed?)

seletz commented 5 months ago

FWIW, updated this branch to the latest mainline. I believe that the cropping is wrong:

[screenshot illustrating the wrong cropping]

So what I said above is wrong, i.e. your window-finding method is not the culprit; the cropping is. I believe the frame of the window found might use different units or a different coordinate system. I'll need to read the Apple docs to get to the bottom of the issue.

We might also save the cropped image to disk in some readable format so we can confirm this visually.

jasonjmcghee commented 5 months ago

ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.

https://github.com/jasonjmcghee/rem/blob/3431d2a7acddf6eeeec19a429001fceafd94853f/rem/ImageHelper.swift#L34
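
For reference, such a debug save is typically just a few lines of AppKit; a sketch (not the actual ImageHelper code):

    import AppKit

    // Sketch: write a CGImage to disk as a PNG for visual inspection.
    func saveForDebugging(_ image: CGImage, to url: URL) throws {
        let rep = NSBitmapImageRep(cgImage: image)
        guard let data = rep.representation(using: .png, properties: [:]) else { return }
        try data.write(to: url)
    }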

seletz commented 5 months ago

> ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.
>
> https://github.com/jasonjmcghee/rem/blob/3431d2a7acddf6eeeec19a429001fceafd94853f/rem/ImageHelper.swift#L34

Oh damn, I removed the wrong file -- I'm sorry. Let me add it again.

jasonjmcghee commented 5 months ago

NSScreen has scale-factor info fwiw, if that's the issue!

seletz commented 5 months ago

So the cropped images look all wrong, as verified by your PNG save function. I'm now looking at the docs and the frame info.

seletz commented 5 months ago

Ah. CGImage measures in pixels, window frames in points. We need to scale.
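
In code, the fix is roughly a multiplication by the screen's backing scale factor (a sketch; it assumes the frame and the capture share the same origin and screen):

    import AppKit

    // Sketch: convert a window frame in points to pixel coordinates
    // matching a CGImage capture of the same screen.
    func pixelRect(for pointRect: CGRect, on screen: NSScreen) -> CGRect {
        let scale = screen.backingScaleFactor  // e.g. 2.0 on Retina displays
        return CGRect(x: pointRect.origin.x * scale,
                      y: pointRect.origin.y * scale,
                      width: pointRect.width * scale,
                      height: pointRect.height * scale)
    }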

seletz commented 5 months ago

@jasonjmcghee ok this should work now. Could you please have a look?