Closed · roblg closed this issue 5 months ago
Thank you! That's incredibly useful information.
I'm clearly doing this 100% wrong! I will look into this as well.
Right now rem is only using it to get the "application name". It's not doing any filtering / only capturing from a specific thing.
To speak to the discussion around ScreenCaptureKit:
I ended up using:

```swift
guard let image = CGDisplayCreateImage(display.displayID, rect: display.frame) else { return }
```

instead of doing a capture stream.
So, is this the right move, instead of their more recent APIs?
Well, I felt this was much, much simpler, and actually captures everything including the status bar, desktop, etc. without having to do a bunch of fiddling.
Now what it doesn't provide is a way to hide specific windows, like a private one.
That's a great feature to add at some point - a user setting: "don't record private browser windows" or something, but for now, this felt like a good approach.
> Well, I felt this was much, much simpler, and actually captures everything including the status bar, desktop, etc. without having to do a bunch of fiddling.
Fair point.
I haven't gotten to digging into the OCR piece yet. Does it differentiate between different sections of layout on the screen? (Amazon Textract can do this, and give you bounding boxes). If not, I'm sort of torn on whether it wouldn't make more sense to only record the focused window (or at least figure out how to annotate text extracted from the focused window), so that it's more clear what activity the user is performing, rather than pulling text out of background windows, etc.
Also the ability to avoid capturing certain windows seems really important -- things like 1Password, Messages, and even my terminal in some cases I think would be desirable to exclude. But on the flip side, providing that functionality might spike adoption and result in a lot more people doing drive-by feature requests, rather than plinking in the code :)
Ha. It's like you're looking into my soul. This grew much faster than I expected and it's still very unstable. Going for feature complete before basic code organization, decoupling, tests etc seems ripe for trouble.
That being said, having an option in settings that isolates the new behavior feels like a good way to counteract that a bit.
It's using Apple's latest OCR stuff, ImageAnalysis. It definitely does some structured recognition: like it puts line numbers in one column, text in another, settings in another, etc.
Definitely requires experimentation.
I would love to encourage you to communicate about your findings and consider helping out! Window blacklisting is a great feature if we can maintain core functionality!
Thought about this more: you could likely do "per-window" OCR as desired. Like, get the window's frame and only do OCR within that frame, for each desired window.
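To make that idea concrete, here's a minimal sketch of cropping a full-display capture down to one window's frame before running recognition. It assumes Vision's `VNRecognizeTextRequest` for the OCR step (rem's actual OCR path may differ) and that `windowFrame` has already been converted into the image's pixel coordinate space:

```swift
import CoreGraphics
import Vision

/// Crop a full-display capture to a single window's frame and OCR only that region.
/// `windowFrame` is assumed to already be in the image's pixel coordinate space
/// (top-left origin, Retina scale applied).
func recognizeText(in fullCapture: CGImage, windowFrame: CGRect) -> [String] {
    // Restrict OCR to the one window's pixels.
    guard let cropped = fullCapture.cropping(to: windowFrame) else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cropped, options: [:])
    try? handler.perform([request])

    // Each observation carries a bounding box too, if layout info is wanted.
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```

One caveat: window frames come back in points with a different origin convention than `CGImage` pixels, so the coordinate conversion is where most of the fiddling would live, and this simple crop still sees whatever overlaps the window in the flat screenshot.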
I wonder about per-window OCR -- I'm curious how that would interact w/ overlapping windows, particularly in the simple implementation where there's just a CGImage generated over a rectangle. Also curious about how SCKit handles "excluded" windows. If there's an excluded window on top of a not-excluded window, is the not-excluded window still completely rendered? Definitely seems like experimentation is in order.
I think if you properly use SCContentStream, you can record each window individually, even if occluded. But not positive.
And in the case of filtering an occluding window, it'll be like it's not there at all
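I believe the per-window route would look roughly like this (a sketch only, untested; assumes macOS 14's one-shot `SCScreenshotManager` API and an illustrative lookup-by-title helper):

```swift
import ScreenCaptureKit

/// Sketch: capture one window's contents independently of whatever overlaps it.
func captureWindow(titled title: String) async throws -> CGImage? {
    let content = try await SCShareableContent.excludingDesktopWindows(
        true, onScreenWindowsOnly: true)
    guard let window = content.windows.first(where: { $0.title == title }) else {
        return nil
    }

    // desktopIndependentWindow renders the window alone, ignoring occlusion --
    // the "like it's not there at all" behavior described above.
    let filter = SCContentFilter(desktopIndependentWindow: window)

    let config = SCStreamConfiguration()
    config.width = Int(window.frame.width)
    config.height = Int(window.frame.height)

    return try await SCScreenshotManager.captureImage(
        contentFilter: filter, configuration: config)
}
```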
I just added a PR which probably fixes this. I'm using NSWorkspace to get the frontmost application, and use that to determine the application name.
This also opens up further possibilities, e.g. screenshotting only the active windows of the frontmost application, or determining which display the frontmost application's active window is on, etc.
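The NSWorkspace route is essentially one call; here's a sketch (helper name is illustrative) that also pulls the frontmost app's on-screen window bounds via the CGWindowList API, which is what the "screenshot only its windows" idea would need:

```swift
import AppKit
import CoreGraphics

/// Sketch: the frontmost application's name via NSWorkspace, plus the bounds
/// of its on-screen windows via CGWindowListCopyWindowInfo.
func frontmostAppWindows() -> (name: String, frames: [CGRect]) {
    guard let app = NSWorkspace.shared.frontmostApplication else {
        return ("Unknown", [])
    }

    let info = CGWindowListCopyWindowInfo(
        [.optionOnScreenOnly, .excludeDesktopElements],
        kCGNullWindowID) as? [[String: Any]] ?? []

    let frames: [CGRect] = info.compactMap { entry in
        // Keep only windows owned by the frontmost app's process.
        guard (entry[kCGWindowOwnerPID as String] as? pid_t) == app.processIdentifier,
              let bounds = entry[kCGWindowBounds as String] as? NSDictionary,
              let rect = CGRect(dictionaryRepresentation: bounds as CFDictionary)
        else { return nil }
        return rect
    }

    return (app.localizedName ?? "Unknown", frames)
}
```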
FWIW I added issue #38 to add support for OCRing only the frontmost application's active window(s). I think PR #37 does fix the application name issue. I also think that using that route we'll be able to support multi-display setups.
I have a very wide display at work, and I think capturing the whole screen might cause some performance issues. Maybe we can optionally even limit screenshotting, although multi-window applications are a thing and I don't know how this would work. Also, when producing an MP4 this would cause issues, as windows would generally differ in size. We could compose an artificial image using the frame of the window and the display, of course ...
@seletz thanks for that change! Definitely is the right implementation today IMO.
I'm thinking that, should we go the route of allowing certain windows / applications to be filtered (the SCFilter / SCContentStream stuff), and someone filters out an application, then it shouldn't show up even if it's the frontmost application.
Which means we may not be able to use `NSWorkspace.shared.frontmostApplication?.localizedName` once we do that, as it would reveal the user had been using one of the filtered applications.
@jasonjmcghee Uh ok, you're right. Need to look into the screen capture framework and do some rtfm I guess 🤩
Not super pressing at this moment, as no one has built window / application filtering yet. Just trying to be transparent in my thought process here and set expectations for the future!
I'm going to mark this as "fixed" for now thanks to @seletz's work. I think filtering is a feature outside the scope of this issue.
I've been poking at the screenshot capture with the goal of being able to filter to only the foreground application, and I noticed that what's being captured, even on `main`, seems not to match my expectations of what should be captured, based on the assumed intent of the code. In particular, https://github.com/jasonjmcghee/rem/blob/d34aa69b1e094f757efcf6c5d93d0014a96240da/rem/remApp.swift#L265
looks like it's attempting to find the active window, where "active" is probably supposed to mean "the one on top"(?). But there are actually a lot of windows that have `isActive = true` when I'm running.
Adding the following debugging code to that method:
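Roughly this kind of loop (a reconstruction sketch; the exact snippet may have differed, assuming ScreenCaptureKit's `SCShareableContent` and the `SCWindow.isActive` flag referenced above):

```swift
import ScreenCaptureKit

/// Debug sketch: list every shareable window currently reporting isActive.
func dumpActiveWindows() async throws {
    let content = try await SCShareableContent.excludingDesktopWindows(
        false, onScreenWindowsOnly: true)
    for window in content.windows where window.isActive {
        print(window.owningApplication?.applicationName ?? "<no app>",
              "-", window.title ?? "<untitled>")
    }
}
```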
results in basically every open application on my machine, plus some Desktop-y and menubar-y looking things:
I spent some time looking at the various ScreenCaptureKit APIs, including SCContentFilter and SCScreenshotManager, but haven't figured out how one would go about getting the actual highlighted/selected window. Just wanted to flag this in case someone driving by knows the answer. :)
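One avenue worth experimenting with is cross-referencing NSWorkspace's frontmost process ID against SCShareableContent's window list, so that "active" becomes "belongs to the frontmost app" rather than relying on `isActive`. A sketch, not verified against SCKit's quirks:

```swift
import AppKit
import ScreenCaptureKit

/// Sketch: find an SCWindow belonging to the frontmost application
/// by matching process IDs.
func frontmostSCWindow() async throws -> SCWindow? {
    guard let pid = NSWorkspace.shared.frontmostApplication?.processIdentifier else {
        return nil
    }
    let content = try await SCShareableContent.excludingDesktopWindows(
        true, onScreenWindowsOnly: true)
    // Assumes the first match is the relevant one; whether the windows array
    // is ordered front-to-back is unverified, so check before relying on it.
    return content.windows.first {
        $0.owningApplication?.processID == pid && $0.isOnScreen
    }
}
```

If the ordering assumption doesn't hold, `CGWindowListCopyWindowInfo` with `.optionOnScreenOnly` returns windows front-to-back and could supply the z-order instead.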