jasonjmcghee / rem

An open source approach to locally record and enable searching everything you view on your Mac.
https://rem.ing
MIT License
2.2k stars 63 forks source link

window retrieval for application identification is using `isActive` which can be multiple windows, the incorrect field to use #15

Closed roblg closed 5 months ago

roblg commented 6 months ago

I've been poking at the screenshot capture with the goal of trying to be able to filter to only the foreground application, and I noticed that what's being captured even on main seems to not match my expectations of what should be being captured based on the assumed intent of the code.

In particular, https://github.com/jasonjmcghee/rem/blob/d34aa69b1e094f757efcf6c5d93d0014a96240da/rem/remApp.swift#L265

looks like it's attempting to find the active window, where "active" is probably supposed to be "the one on top"(?). But there are actually a lot of windows that have isActive = true when I'm running.

Adding the following debugging code to that method:

            shareableContent.windows.forEach() {
                if $0.isActive {
                    print("Active window: \($0.title)")
                }
            }

results in basically every open application on my machine, plus some Desktop-y and menubar-y looking things:

active window: Optional("Offscreen Wallpaper Window")
active window: Optional("Wallpaper-")
active window: Optional("Item-0")
active window: Optional("Recents")
active window: Optional("rem — remApp.swift")
active window: Optional("Item-0")
active window: Optional("Screenshot 2023-12-28 at 7.16.08 PM")
... snip ...

I spent some time looking at the various ScreenCaptureKit APIs, including SCContentFilter, and SCScreenshotManager, but haven't figured out how one would go about getting the actual highlighted/selected window. Just wanted to flag this in case someone driving by knew the answer. :)

jasonjmcghee commented 6 months ago

Thank you! That's incredibly useful information.

I'm clearly doing this 100% wrong! I will look into this as well.

Right now rem is only using it to get the "application name". It's not doing any filtering / only capturing from a specific thing.

jasonjmcghee commented 6 months ago

To speak to the discussion around ScreenCaptureKit:

I ended up using:

guard let image = CGDisplayCreateImage(display.displayID, rect: display.frame) else { return }

Instead of doing a capture stream.

So, is this the right move, instead of their more recent APIs?

Well, I felt this was much, much simpler, and actually captures everything including the status bar, desktop, etc. without having to do a bunch of fiddling.

Now what it doesn't provide is a way to hide specific windows, like a private one.

That's a great feature to add at some point - a user setting: "don't record private browser windows" or something, but for now, this felt like a good approach.

roblg commented 6 months ago

Well, I felt this was much, much simpler, and actually captures everything including the status bar, desktop, etc. without having to do a bunch of fiddling.

Fair point.

I haven't gotten to digging into the OCR piece yet. Does it differentiate between different sections of layout on the screen? (Amazon Textract can do this, and give you bounding boxes). If not, I'm sort of torn on whether it wouldn't make more sense to only record the focused window (or at least figure out how to annotate text extracted from the focused window), so that it's more clear what activity the user is performing, rather than pulling text out of background windows, etc.

Also the ability to avoid capturing certain windows seems really important -- things like 1Password, Messages, and even my terminal in some cases I think would be desirable to exclude. But on the flip side, providing that functionality might spike adoption and result in a lot more people doing drive-by feature requests, rather than plinking in the code :)

jasonjmcghee commented 6 months ago

Ha. It's like you're looking into my soul. This grew much faster than I expected and it's still very unstable. Going for feature complete before basic code organization, decoupling, tests etc seems ripe for trouble.

That being said, having an option in settings that isolates the new behavior feels like a good way to counteract that a bit.

It's using Apples latest OCR stuff- ImageAnalysis. It definitely does some structured recognition. Like it puts line numbers in one column, text in another, and settings in another etc.

Definitely requires experimentation.

I would love to encourage you to communicate about your findings and consider helping out! Window blacklisting is a great feature if we can maintain core functionality!

jasonjmcghee commented 5 months ago

Thought about this more - you could likely do "per-window" OCR as desired. like, get the frame and only do ocr within that frame, for each desired window.

roblg commented 5 months ago

I wonder about per-window OCR -- I'm curious how that would interact w/ overlapping windows, particularly in the simple implementation where there's just a CGImage generated over a rectangle. Also curious about how SCKit handles "excluded" windows. If there's an excluded window on top of a not-excluded window, is the not-excluded window still completely rendered? Definitely seems like experimentation is in order.

jasonjmcghee commented 5 months ago

I think if you properly use SCContentStream, you can record each window individually, even if occluded. But not positive.

And in the case of filtering an occluding window, it'll be like it's not there at all

seletz commented 5 months ago

I just added a PR which probably fixes this. I'm using NSWorkspace to get the frontmost application and use that to determine the application name.

This also opens up further possibilities, e.g. screenshotting only active windows of the frontmost application, or determining the display the current active window of the frontmost application is etc.

seletz commented 5 months ago

FWIW I added issue #38 to add support for OCRing only the frontmost application's active window(s). I think PR #37 does fix the application name issue. I also think that using that route we'll able to support multi-display setups.

I have a very wide display at work, and I think capturing the whole screen might cause some performance issues. Maybe we can optionally even limit screenshotting -- although multi-window applications are a thing and I don't know how this would work. Also, when doing a MP4 this would cause issues as windows would generally differ in size. We could compose a artificial image using the frame of the window and the display, of course ...

jasonjmcghee commented 5 months ago

@seletz thanks for that change! definitey is the right implementation today IMO.

I'm thinking that, should we go the route of allowing certain windows / applications to be filtered (SCFilter / SCContentStream stuff), and someone filters out an application, that, even if it's the front-most application, it shouldn't show up.

Which means we may not be able to use NSWorkspace.shared.frontmostApplication?.localizedName once we do that, as it would reveal the user had been using one of the filtered applications.

seletz commented 5 months ago

@jasonjmcghee Uh ok, you're right. Need to look into the screen capture framework and do some rtfm I guess 🤩

jasonjmcghee commented 5 months ago

Not super pressing in this moment as no one has built windows / application filtering yet- just trying to be transparent in my thought process here / set expectations for the future!

jasonjmcghee commented 5 months ago

I'm going to mark as "fixed" for now thanks to @seletz work. I think filtering is a feature outside of this issue