seletz closed this pull request 5 months ago
BTW -- I also added changes to the `project.pbxproj` file -- I hope this is OK. I have additional changes locally here, mostly related to code signing.
@jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.
@jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the `WindowHelper` this is not the case. Perhaps I am missing something?
I "lifted" the OCR generation up one function and removed the now unused `frame` argument.
For the filter work you suggested in some other thread, I think we'd need to refactor this a bit so that adding / removing steps in the processing pipeline is more straightforward. It works pretty well now, however.
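For what it's worth, a composable pipeline could look something like the sketch below; every name in it is hypothetical, not code from this PR:

```swift
import CoreGraphics

// Hypothetical: each step transforms the captured image; steps can be
// added or removed just by editing the array passed to runPipeline.
protocol ProcessingStep {
    func process(_ image: CGImage) -> CGImage
}

struct CropToFrame: ProcessingStep {
    let frame: CGRect
    func process(_ image: CGImage) -> CGImage {
        image.cropping(to: frame) ?? image
    }
}

func runPipeline(_ image: CGImage, steps: [ProcessingStep]) -> CGImage {
    steps.reduce(image) { $1.process($0) }
}
```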
> @jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.
fwiw - seems to work (tried it real quick)
```swift
let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
let frame = window?.frame
logger.debug("Active Frame: \(frame?.debugDescription ?? "<undefined>")")
logger.debug("Active Application: \(activeApplicationName ?? "<undefined>")")
```
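(For context, `shareableContent` here comes from ScreenCaptureKit; a self-contained sketch of how it could be fetched follows -- the helper name `frontmostWindowFrame` is mine, not from the PR:)

```swift
import AppKit
import ScreenCaptureKit

// Sketch: resolve the frontmost application's window frame via ScreenCaptureKit.
func frontmostWindowFrame() async throws -> CGRect? {
    let content = try await SCShareableContent.excludingDesktopWindows(
        false, onScreenWindowsOnly: true)
    let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
    return content.windows.first { $0.owningApplication?.processID == activePid }?.frame
}
```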
> @jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the `WindowHelper` this is not the case. Perhaps I am missing something?
😅 the one thing i didn't try - lemme see, maybe it's a pixel scaling issue?
hm... seems to work for me somehow:
```swift
...
let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
await processScreenshot(frameId: frameId, image: image, frame: window?.frame)
...
```

later

```swift
func performOCR(..., frame: CGRect?) {
    ...
    // Crop if a frame was provided
    let img = (frame != nil ? image.cropping(to: frame!) : nil) ?? image
    self.logger.debug("Active Frame: \(frame.debugDescription)")
    let configuration = ImageAnalyzer.Configuration([.text])
    let nsImage = NSImage(cgImage: img, size: NSSize(width: img.width, height: img.height))
    ...
}
```
to be clear, likely better to pass the frame as `CGRect?` and only crop if it's present - and then we could even only instantiate the window if the setting is enabled.
(updated above)
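As a side note, the optional crop can avoid the force-unwrap with `flatMap` (same behavior as above, assuming `image` is a `CGImage` and `frame` a `CGRect?`):

```swift
// Crop only when a frame is present; fall back to the full image.
let img = frame.flatMap { image.cropping(to: $0) } ?? image
```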
> but using this to crop the window renders the OCR output incomplete
Maybe I'm not understanding...
Also, I hope it goes without saying I really appreciate you joining as a contributor. It's awesome.
I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the ScreenCaptureKit API, which we'll need to use for filtering anyway.
> Also, I hope it goes without saying I really appreciate you joining as a contributor. It's awesome.
> I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the ScreenCaptureKit API, which we'll need to use for filtering anyway.
Hah -- no worries. I've been doing SW development for a very long time. I'm also old enough. I really would also like to have a simple solution. The `WindowHelper` code seems awkward and surprisingly complex. I'm a Swift and UIKit noob; I used to do a little Objective-C when iOS was new and hot. So all in all I contribute (a) because I like the idea of the rem app, and (b) I take it as an opportunity to learn something new.
> Maybe I'm not understanding...
Well. When I use the frame of the window found via the ScreenCaptureKit framework, I found that the OCR result does not contain what I expect. I tested this by having a terminal window open and active with some text in it. I was unable to find the text using the Search function.
When I use the frame from `WindowHelper` instead, the text is found.
I did the same test with an open Xcode window -- it does find some text, but not all. I believe the cropped image is not wide enough somehow -- but I need more testing. I'll test this again later tomorrow.
Happy to merge as-is if we remove `ImageUtils`, which I think is not being used. We can always iterate to find a more ScreenCaptureKit-based approach!
If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!
> If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!
Sure! Sorry for not reacting, RL keeps interfering :)
@seletz all good! Is ImageUtils being used? (Is it needed?)
FWIW, I updated this branch to the latest mainline. I believe that the cropping is wrong.
So what I said above is wrong, i.e. your window-finding method is not the culprit; the cropping is. I believe the frame of the found window might use different units or a different coordinate system. I'll need to read the Apple docs to get to the issue.
We might also save the cropped image to disk in some readable format so we can confirm this visually.
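Such a debug helper can be quite small -- a sketch using AppKit's `NSBitmapImageRep` (the function name here is made up):

```swift
import AppKit

// Write a CGImage to disk as PNG for visual inspection.
func savePNG(_ image: CGImage, to url: URL) throws {
    let rep = NSBitmapImageRep(cgImage: image)
    guard let data = rep.representation(using: .png, properties: [:]) else {
        throw CocoaError(.fileWriteUnknown)
    }
    try data.write(to: url)
}
```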
`ImageHelper` (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.
> `ImageHelper` (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.
Oh damn, I removed the wrong file -- I'm sorry. Let me add it again.
NSScreen has scale factor info fwiw. If that's the issue!
So the cropped images look all wrong, verified by your PNG save function. I'm now looking at the docs and the frame info.
Ah. `CGImage` measures in pixels, window frames in points. We need to scale.
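A sketch of the fix, assuming the window lives on a known `NSScreen` (the coordinate-system difference mentioned above may additionally require flipping the y origin; this only handles the unit scaling):

```swift
import AppKit

// Convert a window frame in points to pixels for CGImage.cropping(to:).
// backingScaleFactor is e.g. 2.0 on Retina displays.
func pixelRect(for frame: CGRect, on screen: NSScreen) -> CGRect {
    let scale = screen.backingScaleFactor
    return CGRect(x: frame.origin.x * scale,
                  y: frame.origin.y * scale,
                  width: frame.size.width * scale,
                  height: frame.size.height * scale)
}
```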
@jasonjmcghee ok this should work now. Could you please have a look?
This PR implements feature #38. This feature is only active if the new setting is enabled.
**Implementation Details**
It turns out to be surprisingly difficult to get the bounds of the active window. After much trying and googling around, I finally adopted code from SO. I added `WindowHelper` to hide that code. The signature of `performOCR` now takes an `NSImage` instead of a `CGImage`. `processScreenshot` now passes a cropped `NSImage` based on the bounds of the active window. Cropping is done in `ImageUtils`.
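A rough sketch of that flow (the function names come from this PR, but the bodies below are illustrative reconstructions, not the merged code):

```swift
import AppKit
import VisionKit

// Illustrative reconstruction of the flow described above.
func processScreenshot(frameId: Int64, image: CGImage, frame: CGRect?) async {
    // Crop to the active window's bounds when a frame is available
    // (assumes the frame is already in pixel coordinates).
    let cgImage = frame.flatMap { image.cropping(to: $0) } ?? image
    let nsImage = NSImage(cgImage: cgImage,
                          size: NSSize(width: cgImage.width, height: cgImage.height))
    await performOCR(frameId: frameId, image: nsImage)
}

func performOCR(frameId: Int64, image: NSImage) async {
    let configuration = ImageAnalyzer.Configuration([.text])
    let analyzer = ImageAnalyzer()
    do {
        let analysis = try await analyzer.analyze(image, orientation: .up,
                                                  configuration: configuration)
        print(analysis.transcript)  // the recognized text
    } catch {
        print("OCR failed: \(error)")
    }
}
```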