jasonjmcghee / rem

An open source approach to locally record and enable searching everything you view on your Mac.
https://rem.ing
MIT License
2.18k stars 60 forks source link

Active Display by Mouse Location #93

Closed cparish312 closed 1 month ago

cparish312 commented 1 month ago

This PR is addressing https://github.com/jasonjmcghee/rem/issues/91

The issue was that when a window in in full screen mode on an external displayNSScreen.main is set to 1 (which is the main monitor) even if the active application is on the external monitor. Since this was method used to determine which display to record, an external monitor in fullscreen mode was never able to be recorded.

I attempted to use the active window frame to identify the active display ID but things got strange with the returned frame dimensions, and I could not identify a consistent way to get the correct display ID. This got me concerned about the onlyOCRFrontmostWindow functionality as it utilizes the window.frame to determine the region to crop. Two weird things occurred: 1) Sometimes the height of the frame would be a number below 100 despite the window being much larger (actually happened most when the window was fullscreen) 2) Moving the frame around and resizing it did not change the window.frame dimensions 😬

Also, since the window.frame will contain negative values when the window is on an external monitor, the cropImage functionality fails causing a fallback to full image OCR.

Sub ideal solution: Use the mouse location to identify the active display. Main considerations: 1) Now the active application stored is not necessarily the one in the given frame. This makes the searching by application less functional. Looking at rewind.ai's product, I noticed that the active application is not always the image being shown on the timeline, and I'm assuming this is the reason why. 2) To avoid cropping the wrong region when the active window in on the main monitor, but the active display is not the main monitor, cropping before OCR will only occur when the displayID is 1. Although not very clean, having the active window on an external monitor and the active display as the main monitor is handled already since cropImage will fail with the negative window frame values.

P.S. Maybe this is not an issue when there are no external monitors, but the funky window.frame results (even when the window was on the main display) sketched me out, so I will personally be turning off onlyOCRFrontmostWindow 🤷. Hopefully can find a good solution, but a small amount of research leads me to believe using the accessibility API may be the only robust option.

jasonjmcghee commented 1 month ago

@cparish312 weird frame dimensions were probably due to scaling factors... which get weird and tricky when different monitors have different scaling factors - backingScaleFactor.

Also - what exactly does DisplayID being 1 mean?

cparish312 commented 1 month ago

Okay cool I can look into the backingScaleFactor! Here is an example where I had all 3 windows on fullscreen:

App: Xcode window.frame: (712.0, -1056.0, 1920.0, 54.0) Screen: VG245, Display ID: 5, Coordinates: x=712.0, y=982.0, Width=1920.0, Height=1080.0

App: Firefox window.frame: (-1208.0, -1080.0, 1920.0, 72.0) Screen: HP 24mh, Display ID: 3, Coordinates: x=-1208.0, y=982.0, Width=1920.0, Height=1080.0

App: Messages window.frame: (0.0, 37.0, 1512.0, 68.0) Screen: Built-in Retina Display, Display ID: 1, Coordinates: x=0.0, y=0.0, Width=1512.0, Height=982.0

From my understanding (at least for my device and settings), DisplayID being 1 means that the active display is the main display of the computer.

Something else I came across is the Screens have SeparateSpaces setting that could impact a bunch of these APIs. I have mine set to true.

cparish312 commented 1 month ago

How do you feel about using the mouse for tracking the active display?

jasonjmcghee commented 1 month ago

I'm too ignorant of mac apis to be able to say it confidently, but i'd be very hesitant to rely on a specific display id to determine anything (displayId == 1) - i'd use == CGMainDisplayID()

re using mouse to track the active display - feels relatively (or even quite) sketchy. if i left my mouse on my external monitor, and used cmd + tab to swap over to my ide and start typing, i'd sure hope rem would record wherever i am "active", not necessarily where my mouse happens to be

jasonjmcghee commented 1 month ago

Could we do something like NSApplication.shared.keyWindow?.screen

cparish312 commented 1 month ago

I agree the mouse tracking is not ideal but I wonder if there's a good reason rewind decided to use it. My guess would be robustness given how fickle these API window approaches seem to be.

Just tried NSApplication.shared.keyWindow?.screen. NSApplication.shared.keyWindow is returning nil and I think it is because this only accesses the window within your application that is receiving keyboard events.

cparish312 commented 1 month ago

That said I am currently using my mouse for scrolling even though my active window is on a different monitor so the trade off seems to go both ways (for me at least)

jasonjmcghee commented 1 month ago

Maybe mouse event steals focus to wherever it is and keyboard event falls back to active window?

jasonjmcghee commented 1 month ago

We have global listeners for both that could trigger

cparish312 commented 1 month ago

Unfortunately, not seeing any way to get the active window from the actual keyboard event listener.

I'm a bit worried about this approach bringing back the original bug. Imagine you scroll over to an external monitor that is in full screen mode (start capturing on that monitor). Then, you click and start typing (starts recording your main monitor).

Really wish NSScreen.main didn't have this weird functionality (/bug)!

jasonjmcghee commented 1 month ago

Does CGMainDisplayID have the same problem?

jasonjmcghee commented 1 month ago

I think NSScreen.main is "the screen with the menu bar"

cparish312 commented 1 month ago

CGMainDisplayID() just returns 1. It's just the display ID of the main display

cparish312 commented 1 month ago

They should make a NSScreen.menuBar for that 😂

cparish312 commented 1 month ago

I mean another option is to completely rethink the multiple monitor recording where one possibility is just recording the displays with changes, but this could obviously get more tricky than it is worth at this stage (unless we came up with some elegant way to achieve this). Also this assumes the capturing and image comparison steps are relatively cheap computationally.

cparish312 commented 1 month ago

Overall I'm leaning towards mouse location unless you're strongly opposed (or we find the "real" NSScreen.main without having to use the Accessibility API)

jasonjmcghee commented 1 month ago

Can we make a setting that says "always record window with mouse"?

cparish312 commented 1 month ago

Yeah sure! And then otherwise it defaults to NSScreen.main?

jasonjmcghee commented 1 month ago

Yeah until we find a better solution!

cparish312 commented 1 month ago

This is a great compromise! It is probably worth putting this in the docs somewhere as it would be frustrating to realize rem was never recording your external monitors in fullscreen.

cparish312 commented 1 month ago

One other thing worth noting is if I wake up and my mouse is on my one monitor that takes a second to get an active display (or if a monitor is unplugged), the shareableContent.displays.first(where: { $0.displayID == displayID }) will fail causing the screenshot loop to be exited. For wake up a 4 second or so time delay could fix this. I'm unsure how to elegantly handle the unplugging. One solution I thought of is when the display can't be retrieved instead of returning schedule another screenshot. Then you could add something to make sure it can only try 3 times in a row or so and after that it will return. Thoughts?

Also, thanks for all of the support with these updates!

jasonjmcghee commented 1 month ago

What do you mean by "the unplugging"? I'd say retry a few times and then stop remembering seems like a fine solution here!

I want to enable you to get this over the line, so is this an existing issue? Or does it cause problems as a result of this change set?

If the former, happy to get this in as is!