lwouis / alt-tab-macos

Windows alt-tab on macOS
https://alt-tab-macos.netlify.app
GNU General Public License v3.0
10.66k stars 323 forks

Better trick to get access to other-Space windows #447

Open lwouis opened 4 years ago

lwouis commented 4 years ago

Is your feature suggestion related to a problem? Please describe.

When AltTab starts, there is a flash of content as windows from other Spaces are temporarily brought into the current Space through a private API. This is needed to be able to focus them later. However, it is janky: the flashing confuses the user, and the approach is limited, as it has a 1s budget to grab the windows. Windows that were not grabbed within that budget will not be known to AltTab.

Describe the solution you'd like

HyperSwitch is able to focus windows from other Spaces after starting. It does not flash content while doing so, so it must have a better way.

lwouis commented 2 years ago

I just tested in a macOS 12.4 VM, and HS still detects windows on other Spaces after launch. There is also no visible flicker or visual glitch on launch.

@koekeishiya had previously found that HS uses the invisible windows trick to show windows of other Spaces, until the user navigates to these Spaces, then they have the AXref.

Does anyone know of an example of a window where the CG and AX APIs report a different title, for instance? I would like to confirm that the windows shown by HS on boot really come from the CG API, or whether they somehow use another trick we don't know about yet.

The fact that HS works on 12.4 proves that they don't use the add/remove private APIs, but it doesn't tell us with certainty how they show the windows.

lwouis commented 2 years ago

I've been experimenting to try and replace CGSAddWindowsToSpaces and CGSRemoveWindowsFromSpaces. Here are some candidates I tried, with my notes for each:

| API | Result | Notes |
| --- | --- | --- |
| func CGSMoveWindowsToManagedSpace(_ cid: CGSConnectionID, _ windows: NSArray, _ space: CGSSpaceID) -> Void | | It moves the window to another Space. However, it doesn't work on fullscreen windows. |
| func CGSShowSpaces(_ cid: CGSConnectionID, _ sids: NSArray) -> Void | | Shows windows of the Spaces, but their AXrefs are not obtainable. The windows are only visually there. |
| func CGSManagedDisplaySetCurrentSpace(_ cid: CGSConnectionID, _ displayUuid: CFString, _ sid: CGSSpaceID) -> Void | | Brings all windows of the Space onto the current display. It also brings fullscreen windows. However, it breaks the Spaces on that display as they get assigned to the current Space. They "stack" on Space 1, and there is no way to send them back to their original Space. See more details |
| func CGSProcessAssignToSpace(_ cid: CGSConnectionID, _ pid: pid_t, _ sid: CGSSpaceID) -> CGError | | Bringing a process to a Space brings its windows there, so that's good. However, it doesn't bring fullscreen windows. Also tried CGSProcessAssignToAllSpaces but same limitations. |
| func CGSMoveWorkspaceWindowList(_ cid: CGSConnectionID, _ windowList: CFArray, _ windowCount: UInt, _ sid: CGSSpaceID) -> OSStatus | | The call compiles and runs, but only returns OSStatus 1001, which is kCGErrorIllegalArgument. I called it with something like CGSMoveWorkspaceWindowList(cgsMainConnectionId, [60930], 1, Spaces.currentSpaceId), which seems to match other calls on GitHub, but it always returns 1001. It seems it was used in Chromium, but I can't find usages in the latest Chromium. |
| func CGSSetWorkspace(_ cid: CGSConnectionID, _ sid: CGSSpaceID) -> OSStatus | Link error: Undefined symbol: _CGSSetWorkspace | I tried with CGSSetWorkspace, SLSSetWorkspace, or _CGSSetWorkspace, but none of them seem defined. Looking at the SDK symbols for macOS 10.10, I don't see it. I see only _CGSSetWorkspaceForWindow. |
| func CGSSetWindowWorkspace(_ cid: CGSConnectionID, _ wid: CGWindowID, _ sid: CGSSpaceID) -> CGError | Link error: Undefined symbol: _CGSSetWindowWorkspace | It seems to also no longer exist in the SDK. |
| func CGSSetWorkspaceForWindow(_ cid: CGSConnectionID, _ wid: CGWindowID, _ sid: CGSSpaceID) -> CGError | Link error: Undefined symbol: _CGSSetWorkspaceForWindow | I also tried _CGSSetWorkspaceForWindow. It seems to also no longer exist in the SDK. |
| func CGSSpaceAddWindowsAndRemoveFromSpaces(_ cid: CGSConnectionID, _ sid: CGSSpaceID, _ wid: NSArray, _ notSure: Int) -> Void | | Correctly moves windows to the given Space. Works with fullscreen windows. However, it messes with macOS internals: after moving a fullscreen window back to its original Space, that Space will be fully black, for instance. Furthermore, it doesn't work on macOS 12.2. |

Potential things to look into:

lwouis commented 2 years ago

I've looked again at decompiled HS, and it's clearer now that they use the CG API to list windows from other Spaces, until they can get the AXref later on. It's kind of proved by looking at how they handle closing and minimizing windows. Everything in the OCWindow class:

Minimizing actually doesn't work if you have never visited the window's Space. The code reflects that:

/* @class OCWindow */
-(char)minimize {
    r14 = self;
    rax = [self axWindow];
    if (rax != 0x0) {
            rax = AXUIElementCopyAttributeValue(rax, @"AXMinimizeButton", &var_18);
            if (rax != 0x0) {
                    if (*(int32_t *)dword_10017eecc > 0x0) {
                            rbx = 0x0;
                            NSLog(@"Couldn't find minimize button for: %@", r14);
                    }
                    else {
                            rbx = 0x0;
                    }
            }
            else {
                    AXUIElementPerformAction(var_18, @"AXPress");
                    CFRelease(var_18);
                    *(int32_t *)(r14 + 0x14) = 0x0;
                    rbx = 0x1;
            }
    }
    else {
            rbx = 0x0;
    }
    rax = rbx & 0xff;
    return rax;
}

Closing works, even without ever visiting the window's Space, because they use the :

/* @class OCWindow */
-(char)close {
    r14 = self;
    rax = [self axWindow];
    rbx = rax;
    if (rax == 0x0) {
            [r14 moveToCurrentSpace];
            rax = [r14 axWindow];
            rbx = rax;
            if (rax != 0x0) {
                    var_20 = 0x0;
                    rax = AXUIElementCopyAttributeValue(rbx, @"AXCloseButton", &var_20);
                    if (rax != 0x0) {
                            if (*(int32_t *)dword_10017eecc >= 0x2) {
                                    NSLog(@"Couldn't get close button for: %@!", r14);
                            }
                            if (AXUIElementPerformAction(rbx, @"AXRaise") != 0x0) {
                                    rax = 0x0;
                            }
                            else {
                                    var_28 = [r14 ownerPSN];
                                    CGEventSetFlags(CGEventCreateKeyboardEvent(0x0, 0xd, 0x1), 0x100000);
                                    CGEventPostToPSN(&var_28, rax);
                                    CFRelease(rax);
                                    CGEventSetFlags(CGEventCreateKeyboardEvent(0x0, 0xd, 0x0), 0x100000);
                                    CGEventPostToPSN(&var_28, rax);
                                    CFRelease(rax);
                                    usleep(0x30d40);
                                    rax = 0x1;
                            }
                    }
                    else {
                            AXUIElementSetMessagingTimeout(var_20, intrinsic_movss(xmm0, *(int32_t *)float_value_1));
                            rbx = AXUIElementPerformAction(var_20, @"AXPress");
                            CFRelease(var_20);
                            rax = rbx == 0x0 ? 0x1 : 0x0;
                    }
            }
            else {
                    rax = 0x0;
            }
    }
    else {
            var_20 = 0x0;
            rax = AXUIElementCopyAttributeValue(rbx, @"AXCloseButton", &var_20);
            if (rax != 0x0) {
                    if (*(int32_t *)dword_10017eecc >= 0x2) {
                            NSLog(@"Couldn't get close button for: %@!", r14);
                    }
                    if (AXUIElementPerformAction(rbx, @"AXRaise") != 0x0) {
                            rax = 0x0;
                    }
                    else {
                            var_28 = [r14 ownerPSN];
                            CGEventSetFlags(CGEventCreateKeyboardEvent(0x0, 0xd, 0x1), 0x100000);
                            CGEventPostToPSN(&var_28, rax);
                            CFRelease(rax);
                            CGEventSetFlags(CGEventCreateKeyboardEvent(0x0, 0xd, 0x0), 0x100000);
                            CGEventPostToPSN(&var_28, rax);
                            CFRelease(rax);
                            usleep(0x30d40);
                            rax = 0x1;
                    }
            }
            else {
                    AXUIElementSetMessagingTimeout(var_20, intrinsic_movss(xmm0, *(int32_t *)float_value_1));
                    rbx = AXUIElementPerformAction(var_20, @"AXPress");
                    CFRelease(var_20);
                    rax = rbx == 0x0 ? 0x1 : 0x0;
            }
    }
    rax = rax & 0xff;
    return rax;
}
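The hex constants in the fallback branch of the decompiled close method are easier to follow when named. The sketch below only restates them in portable C; the enum names are made up for illustration, and the mapping to kVK_ANSI_W and kCGEventFlagMaskCommand assumes the standard values from Carbon's Events.h and CoreGraphics' CGEventTypes.h.

```c
#include <assert.h>

// Named versions of the literals seen in the decompiled fallback path.
// The names are illustrative; only the numeric values come from the listing.
enum {
    FALLBACK_KEYCODE      = 0x0d,      // virtual key code 13, kVK_ANSI_W on ANSI layouts
    FALLBACK_FLAGS        = 0x100000,  // same value as kCGEventFlagMaskCommand
    FALLBACK_USLEEP_MICRO = 0x30d40    // argument passed to usleep()
};

// Compile-time checks that the hex values mean what the comments claim.
static_assert(FALLBACK_KEYCODE == 13, "0x0d is key code 13");
static_assert(FALLBACK_USLEEP_MICRO == 200000, "0x30d40 microseconds is 200 ms");
```

Read this way, when no close button is found the fallback appears to raise the window, post a key-down and key-up of Command+W to the owning process, and then wait 200 ms.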

We also see functions meant to match AX windows with CGWindow references:

image

We can confirm that they compare AXref title with CGref title for instance:

/* @class OCWindow */
-(char)matchToAxWinByTitle:(struct __AXUIElement *)arg2 {
    r12 = arg2;
    r13 = self;
    rbx = [self cgTitle];
    rdx = [NSCharacterSet controlCharacterSet];
    rax = [rbx stringByTrimmingCharactersInSet:rdx];
    r14 = rax;
    if (rax == 0x0) {
            r14 = 0x0;
            if ([[r13 ownerName] isEqualToString:@"App Store"] != 0x0) {
                    r14 = @"App Store";
            }
    }
    var_30 = 0x0;
    rdx = &var_30;
    AXUIElementCopyAttributeValue(r12, @"AXTitle", rdx);
    rbx = var_30;
    if (rbx != 0x0) {
            rbx = [rbx stringByTrimmingCharactersInSet:[NSCharacterSet controlCharacterSet]];
            [var_30 release];
            rax = [r14 isEqualToString:rbx];
            rcx = 0x1;
            if (rax == 0x0) {
                    rcx = 0x0;
            }
    }
    else {
            rcx = 0x0;
    }
    rax = rcx & 0xff;
    return rax;
}
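Stripped of decompiler noise, the matching logic above seems to boil down to: trim both titles, then compare them. Here is a minimal sketch in portable C, assuming plain byte strings instead of NSString. The names trim_control and titles_match are hypothetical, and iscntrl only approximates what NSCharacterSet's controlCharacterSet covers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Copy `src` into `dst` without leading/trailing control characters.
// Rough stand-in for -stringByTrimmingCharactersInSet: with controlCharacterSet.
static void trim_control(const char *src, char *dst, size_t dst_size) {
    assert(dst_size > 0);
    size_t start = 0, end = strlen(src);
    while (start < end && iscntrl((unsigned char)src[start])) start++;
    while (end > start && iscntrl((unsigned char)src[end - 1])) end--;
    size_t len = end - start;
    if (len >= dst_size) len = dst_size - 1;  // truncate rather than overflow
    memcpy(dst, src + start, len);
    dst[len] = '\0';
}

// Returns true when the CG title and the AX title name the same window.
// NULL stands for "attribute unavailable", like nil in the decompiled code.
bool titles_match(const char *cg_title, const char *owner_name, const char *ax_title) {
    // Special case from the listing: a missing CG title falls back to
    // "App Store" when the owning app is the App Store.
    if (cg_title == NULL && owner_name != NULL && strcmp(owner_name, "App Store") == 0) {
        cg_title = "App Store";
    }
    if (cg_title == NULL || ax_title == NULL) return false;

    char cg[256], ax[256];
    trim_control(cg_title, cg, sizeof cg);
    trim_control(ax_title, ax, sizeof ax);
    return strcmp(cg, ax) == 0;
}
```

Title comparison alone cannot tell apart two windows of the same app with identical titles, which may be why several matching functions exist.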

Also there is a function called createDummySpaceWindow which seems like how they would create the invisible windows used to switch Space.

Also interesting, HS has a function called isUsualUserWindow which filters out windows, similar to AltTab's isActualWindow. They check only 2 things:

They also have some hardcoded checks like owner is Dock, or Safari or Microsoft Office, etc. Lots of hardcoded cases.

lwouis commented 2 years ago

I'm considering ditching the private API tricks, and doing like HyperSwitch: using CG API, and living with a dual-accounting.

I'm trying to compare the pros and cons, so listing here so I can refer to it later:

Also, I'm wondering if we could re-explore the AppleScript APIs. Maybe we could use it to do the actions or extract the info. I remember that the original POC for AltTab was using AppleScript to focus window. It's a shame I didn't use git for this POC.

Anyway, I'll probably explore that. There are many issues with AppleScript, such as: Permissions, no window ID (at least not CGWindowID), performance, maybe other limitations, etc.

Need to explore the various ways to interact with AS: osascript, NSUserScriptTask, NSUserAppleScriptTask, NSAppleScript, OSAKit, etc.

Update: never mind, I confirmed that AppleScript can't interact with windows on other Spaces. It can see them and get their info, but it can't send them commands.

# this shows data of the window on another Space
tell application "Finder"
    get properties of first window
end tell

# this fails to show data of the window on another Space
# but we need to go through "System Events" to call "AXRaise"
tell application "System Events" to tell process "Finder"
    get properties of first window
    perform action "AXRaise" of first window
end tell

metacodes commented 2 years ago

> I'm considering ditching the private API tricks, and doing like HyperSwitch: using CG API, and living with a dual-accounting.

If we go that way, maybe PR #1484 has some useful code.

lwouis commented 2 years ago

@koekeishiya I just found https://github.com/tonyarnold/virtuedesktops. It uses a Dock extension. It's pretty old code / APIs, so I'm thinking it may use old approaches that we wouldn't think of these days.

lwouis commented 2 years ago

I reviewed other apps that deal with windows to see if they handle other Spaces. They all fail: Witch, Contexts, WindowSwitcher, OptimalLayout, uBar.

Only HyperSwitch handles it. And AltTab, until macOS 12.2.

metacodes commented 2 years ago

> I reviewed other apps that deal with windows to see if they handle other Spaces. They all fail: Witch, Contexts, WindowSwitcher, OptimalLayout, uBar.

I used Contexts: it can switch to a window on another Space, or quit an app on another Space with cmd+q. But it cannot close windows or perform other window actions. I think the most frequently used action is switching windows, so I think that's acceptable.

image image

lwouis commented 2 years ago

> I used Contexts, it can switch to other window on other space

Keep in mind the ticket we are in. Try to first open a window on another Space. Then come back to the main Space, and open Contexts. Now notice how it is not aware of that window until you visit that Space. Of course, if you open the window after Contexts, it will know about it. The issue is when the window existed before Contexts was launched.

metacodes commented 2 years ago

Yes, I know. I specifically opened a window on another Space first, and it really does show that window in the window list, and I can switch to it. I'm on macOS 12.4.

lwouis commented 2 years ago

I just tried again, and it doesn't work for me. Latest Contexts v3.8.1, on macOS 10.15.

Could you please record a video on your machine? I'm still questioning that it would work, since it clearly doesn't work for me.

metacodes commented 2 years ago

@lwouis

https://user-images.githubusercontent.com/3339872/171548100-b8d7d825-7585-4632-a7c5-2079abf227a4.mp4

I decompiled Contexts a few weeks ago, and it also uses invisible windows (called helperWindow) to implement the window switching feature.

lwouis commented 2 years ago

I wonder if you had quit Contexts properly before you recorded the video. Because Contexts has no visible UI to show that it's running. You have to quit it in its preferences before running the experiment.

Look at what happens on my machine:

https://user-images.githubusercontent.com/106195/171554507-8cd71909-38cd-4775-8ba2-5f14b301740e.mp4

Notice how:

This confirms that until the user visits the Space, they don't have window data. And when you focus "Finder", they activate the Finder app, which shows one of its windows.

metacodes commented 2 years ago

I did close Contexts.

https://user-images.githubusercontent.com/3339872/171563624-f336dfd4-32bc-4bb5-9a1a-da927c606e81.mp4

Very strange, it does work here. My current configuration looks like this:

image

But I found that it sometimes takes a long time to switch to a fullscreen Space, or fails (just sometimes). For non-fullscreen windows, it works fine.

lwouis commented 2 years ago

In your example, you use 2 windows which don't have their own name. So I still think Contexts is just showing you that your apps were open.

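Could you try exactly the same use-case as in my video?

  • Open a Finder window with Desktop
  • Send it to Space 2
  • Open a Finder window with Applications
  • Send it to Space 3
  • From Space 1, open Contexts
  • Press command+tab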

Does it list "Finder", or does it list 2 windows "Desktop" and "Applications"?

metacodes commented 2 years ago

> In your example, you use 2 windows which don't have their own name. So I still think Contexts is just showing you that your apps were open.
>
> Could you try exactly the same use-case as in my video?
>
>   • Open a Finder window with Desktop
>   • Send it to Space 2
>   • Open a Finder window with Applications
>   • Send it to Space 3
>   • From Space 1, open Contexts
>   • Press command+tab
>
> Does it list "Finder", or does it list 2 windows "Desktop" and "Applications"?

With your use-case, my test result is the same as yours.

lwouis commented 2 years ago

Recap of the saga so far:

I'm running out of ideas or areas to explore. Any help would be very welcome 🙇‍♂️

jkelleyrtp commented 2 years ago

There's another app in the ecosystem - TotalSpaces - that seems to get window data. Or, at least, it's able to do window drags and window screenshots from a fresh start.

I'll see if I can figure out how it's working...

Screen Shot 2022-06-11 at 12 50 07 PM

lwouis commented 2 years ago

@jkelleyrtp it seems TotalSpaces requires the user to disable SIP: https://totalspaces.binaryage.com/installing-mojave

So maybe they inject code into the Dock like yabai. Not a solution that will work for AltTab's casual userbase, unfortunately.

jkelleyrtp commented 2 years ago

Not with TotalSpaces3 - I am actually working on it and there is no SIP disable required. My screenshot is from the TS3 Beta released on the binaryAge forums.

metacodes commented 2 years ago

I agree, for the purposes of this software it is not an ideal solution.

My naive idea would instead be to focus every space once during startup, calling the AX API to retrieve refs -- once for each space, and then re-focus the original space again. I think this should work fine, but there will be a short span of visual flicker during first launch. Not sure how acceptable that is, but it should be able to detect all windows.

You'd need a combination of the following APIs to do this:

extern void CGSManagedDisplaySetCurrentSpace(int cid, CFStringRef display_ref, uint64_t spid);
extern uint64_t CGSManagedDisplayGetCurrentSpace(int cid, CFStringRef display_ref);
extern CFArrayRef CGSCopyManagedDisplaySpaces(const int cid);
extern CFStringRef CGSCopyManagedDisplayForSpace(const int cid, uint64_t spid);
extern void CGSShowSpaces(int cid, CFArrayRef spaces);
extern void CGSHideSpaces(int cid, CFArrayRef spaces);

This process would likely have to happen for each connected monitor:

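// https://developer.apple.com/documentation/coregraphics/1454603-cggetactivedisplaylist
// display_count: capacity of result[]; count: out-parameter (uint32_t *) receiving the number of active displays
CGGetActiveDisplayList(display_count, result, count);

// convert a CGDirectDisplayID to a CFStringRef (UUID) used by the above spaces API;
// the caller owns the returned string and must CFRelease it
CFStringRef display_uuid(uint32_t did)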
{
    CFUUIDRef uuid_ref = CGDisplayCreateUUIDFromDisplayID(did);
    if (!uuid_ref) return NULL;

    CFStringRef uuid_str = CFUUIDCreateString(NULL, uuid_ref);
    CFRelease(uuid_ref);

    return uuid_str;
}

@lwouis I think the solution @koekeishiya mentioned is feasible. You also mentioned some of the problems you encountered with this method. But if those CGS Space APIs cause problems, I think we can avoid them by using the invisible-window trick. Here is my solution:

  1. Start AltTab with a fullscreen animation if there are multiple Spaces. The animation is screen-exclusive, like what TotalSpaces3 does when showing all the Space previews. We do this for better UX while we switch Spaces to get the AXrefs. When I'm on the Space previews window of TotalSpaces3 and try to switch to another Space, the screen doesn't change (as in the picture @jkelleyrtp posted), so I think it's screen-exclusive. We can use that to avoid visual flicker.
  2. Use an invisible window to switch to another Space that has windows
  3. Get all of the AXrefs of that Space
  4. Go to step 2 if there are other Spaces left
  5. End the animation

I think using that animation like a loading window can help us hide the animation of switching Spaces. For users, it's a better UX.
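The loop in steps 2 to 4 can be sketched as a small planning function. This is only an illustration in portable C under stated assumptions: the Space ids are taken as already known, plan_space_visits is a hypothetical name, and the actual switching, AXref collection and animation are left out because they depend on AppKit and private APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Computes the order in which Spaces would be visited at startup.
// `spaces_with_windows` lists the Spaces that contain at least one window.
// `out` must have room for count + 1 entries. Returns the number written.
size_t plan_space_visits(const uint64_t *spaces_with_windows, size_t count,
                         uint64_t current, uint64_t *out) {
    assert(out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        // The current Space needs no visit: its AXrefs are already reachable.
        if (spaces_with_windows[i] != current) out[n++] = spaces_with_windows[i];
    }
    // Finish on the Space the user started from, but only if we left it.
    if (n > 0) out[n++] = current;
    return n;
}
```

With Spaces 1, 2 and 3 and the user on Space 1, the plan is 2, 3, 1. When the plan is empty, no curtain animation would be needed at all.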

lwouis commented 2 years ago

@metacodes it's hard for me to understand what you describe. I can't run TotalSpaces3 because it's Apple Silicon only, as far as I can tell, and I don't have an AS machine. Maybe you could record some of the flows you mention, to share with us what this app does, and how we could maybe copy some of their techniques?

lwouis commented 2 years ago

@jkelleyrtp @metacodes FYI, I sent an email to Stephen (his email was on the top left on this blog post). I asked him if he would be willing to share the technique he's using with TS3. I hope he's willing to share his knowledge~

metacodes commented 2 years ago

@lwouis You can use Keynote.app to understand the fullscreen animation I mentioned. When you play a Keynote presentation in fullscreen, you cannot switch Spaces with your trackpad or by other means. Keynote.app's fullscreen mode is different from other apps'. Maybe the screen would remain unchanged if we showed an animation in this fullscreen mode while switching Spaces in the background.

image

lwouis commented 2 years ago

@metacodes yes, Keynote creates a window that takes up all the screen space, but that's not a native fullscreen. Other apps do that too, like Firefox (see #558), some video players, games, etc.

Ok so what you're suggesting is:

On launch, AltTab essentially obscures the user's screen, like putting curtains in front of the screen. While the screen is hidden, AltTab manually switches to all Spaces one by one, to capture AXrefs. Then, AltTab stops obscuring the screen.

I think it's overall a bad UX to make the computer unusable for a little while. I see a lot of issues with this:

lwouis commented 2 years ago

> @jkelleyrtp @metacodes FYI, I sent an email to Stephen (his email was on the top left on this blog post). I asked him if he would be willing to share the technique he's using with TS3. I hope he's willing to share his knowledge~

He replied: they are using CG APIs + SLSMoveWindowsToManagedSpace, so I guess they simply don't support the fullscreen window scenarios. I wish I could play around with TS3, but I don't have an AS Mac.

lwouis commented 2 years ago

Today I did more testing on how fullscreen windows actually work. We support closing, minimizing, de-fullscreening them, from another Space. Stuff that macOS won't let you do otherwise. I realized that it creates weird artifacts:

| Scenario | Behavior |
| --- | --- |
| Close a fullscreen window from another Space | The window quickly flashes on the current Space, then is closed |
| Minimize a fullscreen window from another Space | The window quickly flashes on the current Space, then it actually works, surprisingly, even though macOS disables the yellow "minimize" button if you go on that Space to minimize with the mouse |
| De-fullscreen a fullscreen window from another Space | The window quickly flashes on the current Space, then is nowhere to be seen. Its Space is destroyed. You can still get the window back by right-clicking on its app's Dock icon, then selecting that window. It's still open, just not accessible on any Space directly. That behavior is pretty bad UX |
| Hide (an app with) a fullscreen window from another Space | Nothing happens for that window. Non-fullscreen windows of that app are hidden. This one is weird even with native macOS UI interactions. Fullscreen windows don't get hidden when you hide an app. |
A note on close and minimize: we first set kAXFullscreenAttribute to false, and only then we send the close/minimize event. For the minimize event, we wait 1s (hardcoded duration :/), because otherwise macOS ignores the command to minimize, as it's still doing the de-fullscreen animation.

Conclusion: I think that dealing with fullscreen windows in general is a broken experience on macOS. Same with AltTab. Maybe we could just give up on fullscreen windows, and always bring the user to their Space before doing any action on them. That way we always get the AXref before acting. Then for non-fullscreen windows, we can use SLSMoveWindowsToManagedSpace to bring them to the current Space before sending a command, to get the AXref. Alternatively, we could bring them to the current Space in advance, when they are spawned, so that we don't need to later.

The advantage of bringing the windows early is that then we have the AXref to show title, remove non-windows, and do commands from the current Space. The downside is that it flashes those windows for the user (maybe there is a way to hide them temporarily?). And vice-versa for the other approach.

lwouis commented 2 years ago

I found this function in SkyLight: _SLSPackagesAssignDraggedWindowToDestinationSpace(int arg0, int arg1, int arg2, int arg3, int arg4, int arg5). It seems to be still available on Catalina.

image

@koekeishiya @jkelleyrtp @metacodes Do you know the complete signature of that API?

ifsheldon commented 2 years ago

A slight deviation: I see Apple released a new kit this year in WWDC, which might be helpful to detect windows. The kit is ScreenCaptureKit. This is kind of a misuse, but one of its APIs, SCShareableContent, seems perfect for grabbing all information of all windows of all displays.

(I guess) All we need is:

  1. the permission of screen capture, because the kit requires it but we don't actually need to capture the screen
  2. use this API every time we need the window information

I'm not sure if this works because I didn't try it and I'm not a Swift developer, so this is just a suggestion. I will try this API and write a minimal demo when I'm not busy. If anyone wants to try out, go ahead.

lwouis commented 2 years ago

@ifsheldon I'm afraid ScreenCaptureKit merely wraps the existing CG/CGS APIs. The data it returns for windows is quite limited: https://developer.apple.com/documentation/screencapturekit/scwindow

I think it could perhaps be used for #122, where I also suggested it. But it would not solve the issues discussed in this ticket here, as this new API doesn't provide us with the Accessibility window reference we need to focus/miniaturize/close windows. I also expect it would return the same windows as CGWindowListCopyWindowInfo. Notice how similar the parameters are to SCShareableContent.getExcludingDesktopWindows.

brettstover commented 2 years ago

Perused this thread and have some thoughts, some of which might be worthwhile. I have no experience with the accessibility APIs discussed here though, so keep that in mind.

1: If I follow the thread correctly, it's assumed that AXUIElement can be passed from process to process (e.g., from a daemon to the main app). I'm not sure if this is true, so if it is true, assume for the below points we're passing AXUIElement (or some intermediate type that AXUIElement can be reconstructed from), and if not true, then assume we're passing a dictionary of window related values.

2: I think a login item / helper app could work well here at least as a partial solution. While it doesn't launch before login as you wanted with a daemon, it can be launched on login, run in the background windowless with no menu bar or dock item and can survive the termination and relaunch of the main app. Therefore even without launching before login, it's better positioned to have more of the data you need than the main app would be. It also inherits the permissions granted to the main app which can be helpful. Also, in another thread, it was mentioned that the AX calls are blocking and that this can be a problem. If the AX calls are made in a separate process, then this likely isn't an issue.

Good tutorial on login items: https://martiancraft.com/blog/2015/01/login-items/ Btw, https://developer.apple.com/documentation/servicemanagement/smappservice looks interesting.

3: The approach I'd suggest here, which is a bit simpler than using XPC, is to set up a user defaults suite that is shared between the helper and the main app. The helper app would store which spaces and windows it has encountered in the shared user defaults, and the main app would use that as a backup for any spaces it has not yet encountered. Again, I'm not sure if this would be storing a representation that allows for reconstruction of an AXUIElement or just a dictionary of basic window information. Additionally, you could persist screenshots to disk and store URLs to those screenshots in the user defaults.

The main app would check to get a list of the current spaces, and check its own memory to see if it has the window/AXUIElement information needed for these spaces and if not check the shared user defaults for information for any spaces it hasn't yet navigated to. In many/most cases, either the main app or helper app would have the information for all spaces on screen, in which case all is good.

In cases where the shared user defaults don't already have all the information needed for all spaces, trigger the existing solution to get that data, only for the spaces with missing data. Whatever difficulties exist with that solution, at least this approach should minimize how often it is needed.

lwouis commented 2 years ago

@brettstover AltTab already launches at login, so it can monitor things in the same capacity as the alternative solution you describe. It's already multi-threaded to avoid blocking. The only downtime is during upgrades, where it restarts and loses its context. But having a background service wouldn't solve that, since that service would need to restart on upgrade as well. So it's more of a topic of serializing the state on disk either way. And we don't do that today because there could be differences before/after, and we don't control when AltTab is back. It could be minutes, and windows could be shuffled in between.

stephancasas commented 1 year ago

Hello, all. I'm new to this conversation, so if I'm saying something which has already been suggested, please feel free to let me know. If you're not opposed to continuing use of the private CoreGraphics framework, I've found a trick that works extremely well:

macOS continuously registers a new keyboard accelerator each time a new desktop/space is created. The accelerator isn't activated unless previously enabled by the user in System Preferences. However, using the function CGSSetSymbolicHotkeyEnabled(int, BOOL), you can activate the accelerator programmatically. What's more, since you're calling a method and not writing to a PLIST store, the change is instantly registered by the CoreGraphics keyboard events listener.

Determining what space you want for which window can be done by first querying CGSCopySpacesForWindows(CGSConnectionID id, int spacesMask, CFArrayRef windows). This will give you the CGSpaceID for a CGWindowID.

To resolve the CGSpaceID into the space's ordered index or human-readable desktop number, you can use CGSCopyManagedDisplaySpaces(CGSConnectionID id) to get an ordered list of all spaces. Flat-map the spaces from each CGDisplay entity, and then find the index of your CGSpaceID from the previous step. You now have the zero-based desktop number of the target space; add 1 to turn this into the human-readable desktop number.

Finding the correct hotkey to focus the target space can be done by reading in the com.apple.symbolichotkeys PLIST. Switching for numbered desktops begins with desktop 1 at index 118. Add the zero-based desktop number to 118 to determine which symbolic hotkey you'll need to enable, and then call CGSSetSymbolicHotkeyEnabled(int, BOOL) to engage the listener, using the PLIST-resolved index as the first arg and YES as the second.
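The index arithmetic described so far can be sketched as follows. This is a minimal illustration in portable C, not the commenter's code: the array-of-arrays input stands in for whatever CGSCopyManagedDisplaySpaces returns, and display_spaces_t, desktop_index_for_space and hotkey_id_for_space are hypothetical names. The base value 118 is the one quoted in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_DESKTOP_HOTKEY 118  // "switch to desktop 1", per the comment above

// Hypothetical stand-in for one display entry of the managed display spaces list.
typedef struct {
    const uint64_t *space_ids;  // ordered Space ids on this display
    size_t count;
} display_spaces_t;

// Flat-maps the per-display lists and returns the zero-based position
// of `target` in the combined order, or -1 if it is not found.
int desktop_index_for_space(const display_spaces_t *displays, size_t display_count,
                            uint64_t target) {
    assert(displays != NULL || display_count == 0);
    int index = 0;
    for (size_t d = 0; d < display_count; d++) {
        for (size_t s = 0; s < displays[d].count; s++, index++) {
            if (displays[d].space_ids[s] == target) return index;
        }
    }
    return -1;
}

// Symbolic hotkey id for switching to the Space, or -1 if the Space is unknown.
int hotkey_id_for_space(const display_spaces_t *displays, size_t display_count,
                        uint64_t target) {
    int index = desktop_index_for_space(displays, display_count, target);
    return index < 0 ? -1 : FIRST_DESKTOP_HOTKEY + index;
}
```

One caveat that the thread does not cover: fullscreen Spaces presumably appear in the same list without having a desktop number, so a real implementation would likely need to skip them when counting.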

The parameters value of each entity in com.apple.symbolichotkeys.plist is structured as follows:

(
  {{ ascii_value_of_keyboard_glyph_if_applicable }},
  {{ osascript_key_code_of_keyboard_key }},
  {{ bitwise_nxkeymask_of_modifier_keys }}
)

Resolve the NXKeyMask value into the modifier keys, and then use either System Events or CGEventPost to dispatch the keyboard event. The space will snap into focus, and you can now use AXUIElementCreateApplication to make your desired window frontmost, after which you can activate the target process using [[NSRunningApplication runningApplicationWithProcessIdentifier: {{ PID }}] activateWithOptions: 2].
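Resolving the modifier mask is plain bit-testing. The sketch below assumes the device-independent NX_*MASK values from IOKit's IOLLEvent.h, redefined locally under different names so the snippet compiles anywhere; modifiers_t and decode_modifiers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Device-independent modifier bits, with the values used in IOLLEvent.h.
#define MASK_SHIFT     0x00020000u  // NX_SHIFTMASK
#define MASK_CONTROL   0x00040000u  // NX_CONTROLMASK
#define MASK_ALTERNATE 0x00080000u  // NX_ALTERNATEMASK (option key)
#define MASK_COMMAND   0x00100000u  // NX_COMMANDMASK

// Sanity check: the four masks are distinct single bits.
static_assert((MASK_SHIFT | MASK_CONTROL | MASK_ALTERNATE | MASK_COMMAND) == 0x001E0000u,
              "modifier bits must not overlap");

typedef struct {
    bool shift;
    bool control;
    bool option;
    bool command;
} modifiers_t;

// Splits the third entry of a hotkey's `parameters` array into modifier flags.
modifiers_t decode_modifiers(uint32_t mask) {
    modifiers_t m;
    m.shift   = (mask & MASK_SHIFT) != 0;
    m.control = (mask & MASK_CONTROL) != 0;
    m.option  = (mask & MASK_ALTERNATE) != 0;
    m.command = (mask & MASK_COMMAND) != 0;
    return m;
}
```

For instance, a shortcut that uses only Control is stored as 262144 (0x40000), which decodes to the control flag alone.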

This solution is working extremely well for me. I'm not an Objective-C programmer, and I pieced this together as best I could through a lot of trial and error, so there may be places where efficiency can improve. As stated before, if I'm missing something, please feel free to point it out to me. I can post a working copy if you'd like to see some code.

lwouis commented 1 year ago

@stephancasas this is interesting information. Thank you for sharing.

As I understand it, this would allow us to switch to specific Spaces. We can do this already, in a simpler way actually. We have a strategy where we spawn invisible windows in every Space. We can then focus them to force macOS to focus that Space. More info in the last bullet point of this recap.

The issue with switching Space is that it's visible to the user. It disturbs their work when we start visiting the Spaces one by one.

How fast is your method at visiting a Space? Could we call it like 10 times in a row for 10 Spaces, really quickly, so the user sees only a "flash" on-screen?

stephancasas commented 1 year ago

@lwouis I may have misunderstood the initial issue. Is the aim to find a different way of navigating to a space once a window is selected, or to find a different way of getting thumbnails for windows which are on other spaces?

What I've described would only be useful in the former, not the latter.

lwouis commented 1 year ago

@stephancasas this ticket is about the following topic.

When windows are on other Spaces than the active Space, we can't get their AXref, which is the technical structure that lets us do many things with them (e.g. focus them, minimize them, get their title, get their screenshot, etc).

What we were doing before Monterey was to use a private API to instantly teleport all windows to the active Space. Then we would grab their AXref, then we would teleport them back to their original Spaces. From the user's perspective, they would open AltTab and see a quick flash on screen, sometimes barely noticeable, then AltTab would list all windows nicely.

The API which teleports windows is broken from Monterey onwards. This ticket investigates alternatives.

We could ask the user to visit all Spaces manually, or we could visit them automatically on launch, but all these solutions make for a bad UX. We are looking for something more invisible to the user, that would let us grab the AXref somehow.

goulashsoup commented 1 year ago

@lwouis @koekeishiya @metacodes Maybe instead of just focusing on already known private APIs we could analyse the code stack of the macOS dock and see if there are any private APIs that are not discovered yet.

Found these two articles about how to reverse engineer macOS APIs:

koekeishiya commented 1 year ago

The discussion above did not focus on known APIs; it included looking at basically every symbol exported by the SkyLight.framework, which is the interface to the WindowServer.

I don't remember exactly every detail that was attempted in this discussion, but the core of the issue is:

To focus a window, you need a reference through the AX API. This is the only way to focus a specific window on macOS (unless you inject code into the Dock, which requires SIP to be disabled).

To get an AX reference for a window, that window must be on a currently visible space.

The workaround in alttab that worked for older versions of macOS was to detect windows using private APIs and move them to the currently active space, so that they would be eligible for usage through the AX API.

goulashsoup commented 1 year ago

The discussion above did not focus on known APIs; it included looking at basically every symbol exported by the SkyLight.framework

I was not talking about known APIs!

To focus a window, you need a reference through the AX API. This is the only way to focus a specific window on macOS (unless you inject code into the Dock, which requires SIP to be disabled).

Yes, I know: you inject code into the Dock App (e.g. in window_manager_focus_window_without_raise, right?) if you don't have the axuiref, which in Swift is an AXUIElement object.

I think the Dock App uses an API that we can expose where you use the injection.

@koekeishiya Did you take a look at the assembly code of the Dock App to find out which memory addresses to use? If yes, did you not find any APIs used that we can expose?

koekeishiya commented 1 year ago

Did you take a look at the assembly code of the Dock App to find out which memory addresses to use? If yes, did you not find any APIs used that we can expose?

Yes I did, and no, there is no API that does what alttab needs and that works on the newest version of macOS.

--

window_manager_focus_window_without_raise is not code injection; it simply sends bytes to a specific application based on an event protocol that I figured out by instrumenting code using Frida.re. This alone is not enough to fully focus a window, and must be used in combination with the AX API. It is used to work around a bug that makes the AX API not focus the correct window in a multi-monitor setup.

I am not going to go into details here, but basically every GUI application on macOS registers itself with the Dock (this is part of Carbon/Cocoa), setting up an event handler and a mach port for communication. The Dock runs the server part, and applications connect and give the Dock communication rights. The Dock then uses this mach port to signal an application (using the process serial number and window id) to make a specific window the key-window (focused window). You can hook into this part, but as I said it requires injecting code into the Dock's process space, which requires SIP to be disabled. I hooked this function for use in yabai many years ago.

goulashsoup commented 1 year ago

Yes I did, and no there is no API that does what alttab needs

I am not going to go into details here ...

Well, that's a lot of detail already, thanks

And I suppose we can not "expose" the Dock App source code functions, because this is only possible for shared libraries, right?

Anyway, I want to analyze the Dock App myself, so I suppose I will also have to disable SIP.