lwouis / alt-tab-macos

Windows alt-tab on macOS
https://alt-tab-macos.netlify.app
GNU General Public License v3.0

Better trick to get access to other-Space windows #447

Open lwouis opened 4 years ago

lwouis commented 4 years ago

Is your feature suggestion related to a problem? Please describe.

When AltTab starts, there is a flash of content as windows from other Spaces are temporarily brought into the current Space through a private API. This is needed to be able to focus them later. However, it is janky: the flashing confuses the user, and it is limited in power because it has a 1s budget to grab the windows, after which any window that was not grabbed stays unknown to AltTab.

Describe the solution you'd like

HyperSwitch is able to focus windows from other Spaces after starting. It does not flash content while doing so, so it must have a better way.

lwouis commented 4 years ago

This investigation has already been discussed, and there is lots of interesting information in https://github.com/lwouis/alt-tab-macos/issues/431

koekeishiya commented 4 years ago

I downloaded HyperSwitch to test, and it appears to me that they do not actually have a real fix for this issue. They actually appear to somehow freeze a majority of screen updates and other actions while HyperSwitch is launching. I assume that during this time they actually use the same trick that you have implemented, but they try hard to avoid the visual flicker. You can see artifacts or weird behaviour if you try to switch a space or open mission-control during this split second during its launch.

Edit: Actually, they place invisible windows on each space that they use to first make that space focus, and then they re-focus the target window. I managed to spot the active application in the menubar saying "HyperSwitch" during the space transition, and immediately after the transition ends, it focuses the window I selected. Then if I switch back, and switch to the same window again, it does not show "HyperSwitch" during the transition, but the name of the actual application of the window I selected, as they now have the AXUIElementRef to work with.

lwouis commented 4 years ago

Damn @koekeishiya you're clever! I suspected the invisible windows trick, but I ran the accessibility API on HyperSwitch itself, and found it had 0 windows. I forgot that this API doesn't list windows from other Spaces.

I'll try to implement this trick. It's way smoother than flashing the screen 👍

lwouis commented 4 years ago

Thinking about it, the trick of invisible windows on all Spaces is not good enough. We need to have AXUIElement for all windows. This gives us the correct titles, role, subrole, etc. We use these as heuristics to decide if the windows should be shown or not (see #456).

If we accept that we sometimes don't have an AXUIElement for a window, we need to implement a whole mirror logic using the CG APIs: get the title, filter windows based on other criteria specific to the CG APIs, etc.

There must be a better way. I wonder if it's possible to activate a window on another Space without switching to that Space. Maybe in that scenario, we get access to that Space's windows.

metacodes commented 2 years ago

It seems that, from macOS 12.3.1 onwards, we cannot get the AXUIElement for a window on another Space by using the private API CGSAddWindowsToSpaces. From macOS 12.3 onwards, we also cannot get the AXUIElement for a fullscreen window with it.

lwouis commented 2 years ago

@metacodes it seems you found the root cause for https://github.com/lwouis/alt-tab-macos/issues/1324

metacodes commented 2 years ago

> @metacodes it seems you found the root cause for #1324

I found this problem yesterday while testing the code (#1484); it tested perfectly fine on 12.1. Regarding this private API, I read some of the posts you discussed before, did some research, and have partially solved the problem so far. There are still some issues to be solved, and I hope to find a perfect way to solve this problem.

jkelleyrtp commented 2 years ago

> Thinking about it, the trick of invisible windows on all Spaces is not good enough. We need to have AXUIElement for all windows. This gives us the correct titles, role, subrole, etc. We use these as heuristics to decide if the windows should be shown or not (see #456).
>
> By accepting that we sometimes don't have an AXUIElement for a window, we need to implement a whole mirror logic using the CG APIs: get the title, filter windows based on other criteria specific to the CG APIs, etc.
>
> There must be a better way. I wonder if it's possible to activate a window on another Space without switching to that Space. Maybe in that scenario, we get access to that Space's windows.

I figured this one out. They cache which windows are on the space after you leave the space and monitor for closing events given the cached AXUIElementRefs.
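
A minimal sketch of that caching idea, under stated assumptions: the type and method names below are hypothetical, and plain integers stand in for the Space id, the window id and the cached AXUIElementRef. It only models the bookkeeping (snapshot on leaving a Space, drop on a close event), not the macOS calls themselves.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: a real implementation would hold a CGS space id,
// a CGWindowID and a retained AXUIElementRef instead of plain integers.
type SpaceId = u64;
type WindowId = u32;

#[derive(Debug, Clone, PartialEq)]
struct CachedWindow {
    window_id: WindowId,
    ax_handle: u64, // placeholder for the cached AXUIElementRef
    title: String,
}

#[derive(Default)]
struct SpaceWindowCache {
    by_space: HashMap<SpaceId, Vec<CachedWindow>>,
}

impl SpaceWindowCache {
    /// Called when the user leaves `space`: remember what was visible there.
    fn snapshot_on_leave(&mut self, space: SpaceId, windows: Vec<CachedWindow>) {
        self.by_space.insert(space, windows);
    }

    /// Called from a "window closed" notification on a cached reference.
    /// Returns true if the window was known and has been dropped.
    fn on_window_closed(&mut self, window_id: WindowId) -> bool {
        let mut removed = false;
        for windows in self.by_space.values_mut() {
            let before = windows.len();
            windows.retain(|w| w.window_id != window_id);
            removed |= windows.len() != before;
        }
        removed
    }

    /// Windows believed to still exist on `space`, even when it is not current.
    fn windows_on(&self, space: SpaceId) -> &[CachedWindow] {
        match self.by_space.get(&space) {
            Some(windows) => windows.as_slice(),
            None => &[],
        }
    }
}
```

A switcher built this way can only list windows on Spaces the user has already visited, which is the limitation discussed in the rest of this thread.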

You can also get a list of windows and spaces from a combination of CGS APIs, cross-referencing, and the com.apple.spaces plist. There's also the screenshot API which alt-tab could take advantage of.

When you switch to a particular window in their HyperSwitch list, it actually switches to its own window on that screen and then gives up focus to the window the user targeted. Fortunately, this doesn't use any private APIs.

Here's my proof-of-concept. I wrote it in Rust w/ Tao as the window manager.

Note that since my use case is just cmd+tab into alternative spaces, I don't bother focusing a particular window. However, I do keep the list of windows around.

Now if we could only figure out how to get rid of the space switching animation....

http://github.com/jkelleyrtp/kauresel

https://user-images.githubusercontent.com/10237910/165623636-c6a30e9d-1639-483f-8a6e-01bd762cf311.mov

koekeishiya commented 2 years ago

> Now if we could only figure out how to get rid of the space switching animation....

This is impossible if you don't inject code into Dock.app: https://github.com/koekeishiya/yabai/issues/1235#issuecomment-1105269758 You literally need to patch the function inside Dock.app that is responsible for doing this animation. You can use private APIs (https://github.com/koekeishiya/yabai/blob/master/src/osax/payload.m#L59), but macOS will be out of sync because there are internal data structures inside the Dock's process-space that keep track of (and render/display) this state. If you use those APIs you need to kill the Dock process to trigger a reload of the system state.

lwouis commented 2 years ago

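@jkelleyrtp I didn't fully understand your suggestion.

Here is the challenge:

  • AltTab starts
  • There is already a window on another Space
  • User presses alt+tab
  • AltTab has to show the window on the other Space even though the user has never been to that Space. We can't use other APIs than AX because our window discrimination logic needs the AX ref to decide if the window is to be shown or not

How does your POC solve that?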

jkelleyrtp commented 2 years ago

> @jkelleyrtp i didn't understand your suggestion fully.
>
> Here is the challenge:
>
>   • AltTab starts
>   • There is already a window on another Space
>   • User presses alt+tab
>   • AltTab has to show the window on the other Space even though the user has never been to that Space. We can't use other APIs than AX because our window discrimination logic needs the AX ref to decide if the window is to be shown or not
>
> How does your POC solve that?

From what I understood about the problem

At first glance you can't get an AXRef without seeing the window first and then caching it. However, if the AXRef is used only to focus the window from a foreign space, then we can circumvent the limitation by placing our own windows in every space.

This works if your discrimination logic can get by with the information gleaned from:

CGWindowListCopyWindowInfo produces a bunch of helpful stuff

If you need the AXRef to perform discrimination logic then tossing your own windows into each space doesn't work. But if you can get by without it (by filtering on titles, spaces, or layers), then you can use your own AXRefs from hidden windows to switch spaces. Once you're at the new space, you can bring the window forward immediately now that its AXRef is available to you.

My PoC creates hidden windows (well not in the gif but it does) and then sends them to all of the unique spaces gathered from com.apple.spaces and CGSCopySpacesForWindows. For the app I'm trying to build, this is enough, since all I want is to switch between virtual desktops with cmd+tab with a grid-based layout.

metacodes commented 2 years ago

After some decompiling, I found out that Contexts.app switches Spaces by using an invisible window. I think we don't actually need to get the AXUIElement of windows on other Spaces when AltTab is launched; we can delay getting the AXUIElement until we switch to that Space. We can move the helper window to the Space we want by using CGSMoveWindowsToManagedSpace, and then switch to that Space. After that, we can get the AXUIElement and focus the window we want. The implementation is in PR #1484.
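
The order of operations described above can be sketched as follows. This is an illustration only: the `SpaceOps` trait and its method names are hypothetical, each comment names the system call a macOS implementation might wrap, and `Recorder` is a test double that just logs the call order.

```rust
/// Hypothetical abstraction over the calls discussed in this thread.
/// None of these method names are real APIs.
trait SpaceOps {
    /// e.g. CGSMoveWindowsToManagedSpace applied to the helper window.
    fn move_helper_to_space(&mut self, space: u64);
    /// e.g. makeKeyAndOrderFront plus activateIgnoringOtherApps on the helper.
    fn activate_helper(&mut self);
    /// e.g. asking the AX API for the window once its Space is current.
    fn lookup_ax_window(&mut self, window_id: u32) -> Option<u64>;
    /// e.g. AXUIElementPerformAction with the raise action.
    fn raise(&mut self, ax_window: u64);
}

/// Switch Space through the helper window first, then fetch the AX reference
/// lazily and raise the target. Returns false if no AX reference appeared.
fn focus_on_other_space<S: SpaceOps>(ops: &mut S, space: u64, window_id: u32) -> bool {
    ops.move_helper_to_space(space);
    ops.activate_helper();
    match ops.lookup_ax_window(window_id) {
        Some(ax_window) => {
            ops.raise(ax_window);
            true
        }
        None => false,
    }
}

/// Test double: records calls and pretends some windows are discoverable.
#[derive(Default)]
struct Recorder {
    calls: Vec<String>,
    known: Vec<u32>,
}

impl SpaceOps for Recorder {
    fn move_helper_to_space(&mut self, space: u64) {
        self.calls.push(format!("move:{space}"));
    }
    fn activate_helper(&mut self) {
        self.calls.push("activate".to_string());
    }
    fn lookup_ax_window(&mut self, window_id: u32) -> Option<u64> {
        self.calls.push(format!("lookup:{window_id}"));
        if self.known.contains(&window_id) {
            Some(u64::from(window_id) + 1000) // fake AX handle
        } else {
            None
        }
    }
    fn raise(&mut self, ax_window: u64) {
        self.calls.push(format!("raise:{ax_window}"));
    }
}
```

Note that this covers focusing only. Titles and the real/fake window heuristics still need an AX reference before the Space is visited.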

metacodes commented 2 years ago

The following decompiled code is from Contexts.app. It uses this code to switch Spaces.

-(void)makeFrontProcess {
    SetFrontProcessWithOptions(&self->_processSerialNumber, 0x1);
    return;
}

/* @class CWSActivateWindowOperation */
-(void)changeToSpace:(void *)arg2 {
    var_60 = [arg2 retain];
    rax = objc_alloc_init(@class(CWSActivateWindowOperationHelperWindow));
    r15 = *_objc_msgSend;
    [self setHelperWindow:rax];
    [rax release];
    rax = [self helperWindow];
    rax = [rax retain];
    [rax makeKeyAndOrderFront:0x0]; //the helperWindow is NSWindow
    [rax release];
    var_58 = self;
    rax = [self helperWindow];
    rax = [rax retain];
    rdx = [rax windowNumber]; //NSWindow.windowNumber
    r13 = [[CTCoreGraphics spacesForWindow:rdx withSpaceMask:0x7] retain];
    [rax release];
    rax = [self helperWindow];
    rax = [rax retain];
    r15 = rax;
    r12 = [rax windowNumber];
    var_38 = var_60;
    rax = [NSArray arrayWithObjects:rdx count:0x1];
    rax = [rax retain];
    rdx = r12;
    var_50 = r13;
    r12 = *_objc_msgSend;
    [CTCoreGraphics moveWindow:rdx fromSpaces:r13 toSpaces:rax];
    [rax release];
    [r15 release];
    rdx = var_60;
    rbx = [[CTCoreGraphics screenForSpace:rdx] retain];//return screen id
    r14 = r12;
    r12 = [[var_58 helperWindow] retain];
    var_68 = rbx;
    if (rbx != 0x0) { //not nil
            rdx = @selector(frame);//get frame origin
            objc_msgSend_stret(&var_90, rbx, rdx);
            intrinsic_movsd(xmm0, var_90);
            intrinsic_movsd(xmm1, *(&var_90 + 0x8));
    }
    else {
            intrinsic_movaps(var_80, 0x0);
            intrinsic_movaps(var_90, 0x0);
    }
    var_30 = **___stack_chk_guard;
    (r14)(r12, @selector(setFrameOrigin:), rdx);//set helperWindow's frame
    [r12 release];
    // set self.helperWindow's frame,and NSWindow.makeKeyAndOrderFront
    (r14)([(r14)(var_58, @selector(helperWindow), rdx) retain], @selector(makeKeyAndOrderFront:), 0x0);
    [rax release];
     // NSApp.activateIgnoringOtherApps
    (r14)(**_NSApp, @selector(activateIgnoringOtherApps:), 0x1);
    // print some logs
    var_48 = @"toSpace";
    var_40 = var_60;
    rax = (r14)(@class(NSDictionary), @selector(dictionaryWithObjects:forKeys:count:), &var_40, &var_48, 0x1);
    (r14)(var_58, @selector(logInfo:data:), @"Changing space complete.", [rax retain]);
    [rax release];
    [var_68 release];
    [var_50 release];
    [var_60 release];
    if (**___stack_chk_guard != var_30) {
            __stack_chk_fail();
    }
    return;
}

//helperWindow init
/* @class CWSActivateWindowOperationHelperWindow */
-(void *)init {
    var_40 = self;
    *(&var_40 + 0x8) = *0x10053b878;
    rax = [[&var_40 super] init];
    rbx = rax;
    if (rax != 0x0) {
            intrinsic_movaps(var_30, 0x0);
            [rbx setFrame:0x1 display:intrinsic_movaps(var_20, intrinsic_movaps(0x0, *(int128_t *)0x100420d90))];
            rsp = (rsp - 0x20) + 0x20;
            [rbx setStyleMask:0x0];
            [rbx setIgnoresMouseEvents:0x1];
            [rbx setHidesOnDeactivate:0x1];
            [rbx setTitle:@"Contexts H"];
            [rbx retain];
    }
    [rbx release];
    rax = rbx;
    return rax;
}

/* @class CTCoreGraphics */
+(void *)spacesForWindow:(unsigned int)arg2 withSpaceMask:(int)arg3 {
    r14 = arg3;
    rbx = arg2;
    if ([self privateApiAvailable] != 0x0) {
            r15 = (*qword_10054e740)(); // CGSMainConnectionID()
            rax = [NSNumber numberWithUnsignedInt:rbx];
            rax = [rax retain];
            var_38 = rax;
        // CGSCopySpacesForWindows()
            rbx = qword_10054e788(r15, r14, [NSArray arrayWithObjects:rbx count:0x1]);
            [rax release];
    }
    else {
            rbx = [[NSArray array] retain];
    }
    if (**___stack_chk_guard == **___stack_chk_guard) {
            rax = [rbx autorelease];
    }
    else {
            rax = __stack_chk_fail();
    }
    return rax;
}

/* @class CTCoreGraphics */
+(void)moveWindow:(unsigned int)arg2 fromSpaces:(void *)arg3 toSpaces:(void *)arg4 {
    r12 = arg2;//helperWindow.windowNumber
    var_48 = [arg3 retain];
    r15 = [arg4 retain];
    if ([self privateApiAvailable] != 0x0) {
            r14 = (*qword_10054e740)(); // CGSMainConnectionID()
            var_38 = [[NSNumber numberWithUnsignedInt:r12] retain];
       // CGSRemoveWindowsFromSpaces(), remove helper window from that space
            qword_10054e798(r14, [NSArray arrayWithObjects:r12 count:0x1], var_48);
            [rax release];
            r13 = (*qword_10054e740)(); // CGSMainConnectionID()
            var_40 = [[NSNumber numberWithUnsignedInt:r12] retain];
        // CGSAddWindowsToSpaces(), add helper window to that space
            qword_10054e790(r13, [NSArray arrayWithObjects:r12 count:0x1], r15);
            [rax release];
    }
    var_30 = **___stack_chk_guard;
    [r15 release];
    [var_48 release];
    if (**___stack_chk_guard != var_30) {
            __stack_chk_fail();
    }
    return;
}

metacodes commented 2 years ago

HyperSwitch.app uses the following code to switch Spaces. I haven't fully read it yet: they obfuscated the private API names, so I can't directly see which APIs are used. I've decoded some of the private APIs they use, and I need more time to investigate further.

/* @class OCWindow */
-(void)bringToFront:(char)arg2 {
    rbx = arg2;
    r15 = self;
    rax = [self ownerPid];
    if (rax == 0x0) goto .l1;

loc_10003d3f2:
    rax = GetProcessForPID(rax, &var_40);
    if (rax != 0x0) goto .l1;

loc_10003d405:
    if ([[r15 ownerName] isEqualToString:@"X11"] == 0x0) goto loc_10003d46f;

loc_10003d431:
    if (*(int32_t *)dword_10017eecc >= 0x2) {
            NSLog(@"We can't raise X11 windows, bringing XQuartz to front instead");
    }
    [[r15 ownerApplication] activateWithOptions:0x3];
    return;

.l1:
    return;

loc_10003d46f:
    rax = [r15 axWindow];
    r13 = rax;
    if (rax != 0x0) {
            AXUIElementPerformAction(r13, @"AXRaise");
    }
    var_30 = rbx;
    if (rbx != 0x0) {
            var_2C = 0x1;
            if ([r15 isVisible] == 0x0) {
                    r14 = [OCWindow currentSpaceID];
                    rax = [r15 space];
                    if ((rax != 0x0) && (rax != r14)) {
                            sub_10003bef2(rax, 0x1);
                            if (r13 == 0x0) {
                                    usleep(0x493e0);
                            }
                            var_2C = 0x0;
                    }
            }
    }
    else {
            var_2C = 0x1;
    }
    if (r13 != 0x0) goto loc_10003d54d;

loc_10003d51e:
    rbx = 0xa;
    goto loc_10003d523;

loc_10003d523:
    usleep(0x186a0);
    r13 = [r15 axWindow];
    rbx = rbx - 0x1;
    if (rbx == 0x0) goto loc_10003d544;

loc_10003d53d:
    if (r13 == 0x0) goto loc_10003d523;

loc_10003d54d:
    var_38 = r15;
    xmm0 = intrinsic_movss(xmm0, *(int32_t *)float_value_1);
    AXUIElementSetMessagingTimeout(r13, xmm0);
    r15 = 0x4;
    goto loc_10003d575;

loc_10003d575:
    rax = AXUIElementPerformAction(r13, @"AXRaise");
    if (rax == 0x0) goto loc_10003d5e6;

loc_10003d584:
    r14 = rax;
    if (*(int32_t *)dword_10017eecc > 0x0) {
            NSLog(@"Couldn't raise (errno: %d), trying again ...", r14);
    }
    xmm0 = intrinsic_movss(xmm0, *(int32_t *)float_value_3);
    AXUIElementSetMessagingTimeout(r13, xmm0);
    r15 = r15 - 0x1;
    if (r15 != 0x0) goto loc_10003d575;

loc_10003d5b2:
    AXUIElementSetMessagingTimeout(r13, 0x0);
    if (r14 == 0xffff9d8c) {
            r15 = var_38;
            if (*(int32_t *)dword_10017eecc > 0x0) {
                    NSLog(@"AXErrorCannotComplete");
            }
    }
    else {
            rax = SetFrontProcessWithOptions(&var_40, 0x1);
            r15 = var_38;
    }
    goto loc_10003d60a;

loc_10003d60a:
    rdx = @"X11";
    rcx = var_2C | (var_30 == 0x0 ? 0x1 : 0x0);
    if (rcx == 0x0) {
            r14 = dispatch_get_global_queue(0xfffffffffffffffe, 0x0);
            r12 = r15;
            r15 = *__NSConcreteStackBlock;
            var_90 = r15;
            *(&var_90 + 0x8) = 0xffffffffc0000000;
            *(&var_90 + 0x10) = sub_10003d713;
            *(&var_90 + 0x18) = 0x100141058;
            *(&var_90 + 0x20) = var_40;
            dispatch_after(dispatch_time(0x0, 0x11e1a300), r14, &var_90);
            var_68 = r15;
            r15 = r12;
            *(&var_68 + 0x8) = 0xffffffffc2000000;
            *(&var_68 + 0x10) = sub_10003d726;
            *(&var_68 + 0x18) = 0x100140e80;
            *(&var_68 + 0x20) = r12;
            rax = dispatch_time(0x0, 0x1dcd6500);
            rdx = &var_68;
            dispatch_after(rax, r14, rdx);
    }
    [[NSNotificationCenter defaultCenter] postNotificationName:@"OCWindowBroughtToFrontNotification" object:r15];
    return;

loc_10003d5e6:
    AXUIElementSetMessagingTimeout(r13, 0x0);
    rax = SetFrontProcessWithOptions(&var_40, 0x1);
    r15 = var_38;
    goto loc_10003d60a;

loc_10003d544:
    if (r13 == 0x0) goto loc_10003d60a;
}
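
The hex literals in the listing above are easier to follow once decoded. The sketch below copies them verbatim and converts them to durations and counts; the constant names are made up for readability, and `poll_for` is a hypothetical helper that mirrors the usleep/axWindow loop at loc_10003d523 without calling any macOS API.

```rust
use std::time::Duration;

// Literals copied from the decompiled -[OCWindow bringToFront:] listing.
// The names are invented here; only the values come from the listing.
const WAIT_AFTER_SPACE_SWITCH: Duration = Duration::from_micros(0x493e0); // usleep after switching Space
const AX_POLL_INTERVAL: Duration = Duration::from_micros(0x186a0); // usleep between axWindow lookups
const AX_POLL_ATTEMPTS: u32 = 0xa; // rbx counter
const RAISE_ATTEMPTS: u32 = 0x4; // r15 counter around AXRaise
const FIRST_DELAYED_BLOCK: Duration = Duration::from_nanos(0x11e1a300); // first dispatch_after
const SECOND_DELAYED_BLOCK: Duration = Duration::from_nanos(0x1dcd6500); // second dispatch_after
const RAISE_ERROR: i32 = 0xffff9d8c_u32 as i32; // the value logged as "AXErrorCannotComplete"

/// Calls `wait`, then `lookup`, up to `attempts` times and returns the first hit.
fn poll_for<T>(
    attempts: u32,
    mut wait: impl FnMut(),
    mut lookup: impl FnMut() -> Option<T>,
) -> Option<T> {
    for _ in 0..attempts {
        wait();
        if let Some(found) = lookup() {
            return Some(found);
        }
    }
    None
}
```

Decoded, the listing waits 300 ms after a Space switch when it has no AX window yet, polls for the AX window up to 10 times at 100 ms intervals, retries AXRaise up to 4 times, and schedules two follow-up blocks 300 ms and 500 ms later. The error value is -25204, which is kAXErrorCannotComplete.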

lwouis commented 2 years ago

@jkelleyrtp @metacodes first of all, thank you for digging into these advanced tricks and trying to find a breakthrough. I also decompiled the other apps to try to understand how they do it. I never got any secret trick though to be honest. My reverse-engineering skills are pretty low.

Now, I'd like to quote myself again, and please read carefully what I'm talking about:

> Thinking about it, the trick of invisible windows on all Spaces is not good enough. We need to have AXUIElement for all windows. This gives us the correct titles, role, subrole, etc. We use these as heuristics to decide if the windows should be shown or not (see https://github.com/lwouis/alt-tab-macos/issues/456).
>
> By accepting that we sometimes don't have an AXUIElement for a window, we need to implement a whole mirror logic using the CG APIs: get the title, filter windows based on other criteria specific to the CG APIs, etc.
>
> There must be a better way. I wonder if it's possible to activate a window on another Space without switching to that Space. Maybe in that scenario, we get access to that Space's windows.

Please follow the link and see how complex the heuristic to decide whether a window is real or not is. Please look at the current implementation.

In addition to detecting real/fake windows, as I said in my original quote, there is the issue of window metadata such as the title. If we use an API other than AX to get window titles, then the title will suddenly change once the user visits the Space with that window. Essentially we mislead them until we get the AXref, from which point we have reliable data to show.

Oh, and after checking out the pull request, I also realize we don't know how to deal with other AX actions: closing a window, minimizing/de-minimizing, fullscreening. If we don't have the AXref, we can't do these, even with the invisible window trick.

So in short: yes, invisible windows are a workaround for focusing windows without their AXref, but we still need another workaround for window titles and for window detection. It's not dealing with the whole problem, just the focus part. It's not good enough.

@metacodes how do you think Apple Shortcuts gets window data? Maybe you could decompile it and look?

metacodes commented 2 years ago

> @metacodes how do you think apple shortcuts gets windows data? Maybe you could decompile and look?

Wow, that looks very cool! I've got something new to work on. Actually, my reverse-engineering skills are also pretty low. I only bought Hopper to solve AltTab problems, so don't expect too much from me. 😄 But this reverse-engineering is very interesting, so I can use it to pass the boring time during the epidemic (COVID-19).

metacodes commented 2 years ago

> @metacodes how do you think apple shortcuts gets windows data? Maybe you could decompile and look?

@lwouis I just tried Apple Shortcuts on macOS 12.3.1. It can't show windows on other Spaces; it only shows windows in the current Space. Bad news.

lwouis commented 2 years ago

Here's my attempt at summarizing the situation, regarding the goal of this ticket:

Why we need to use the AX API

It's tempting to think of solutions involving alternative APIs (e.g. CG APIs, AppleScript, system .plist files, CLI binaries like, Automator.app, Shortcuts.app, etc). Here are the things that we need to do, and that these APIs can't deliver like the AX APIs:

How to get the AX references

There are only 2 ways that I know to get the AX reference of a window on another Space:

The first method is what AltTab does currently. It creates a flash of content at launch (see OP). It also has the issue of being broken from macOS 12.3 onwards after Apple broke the CGSAddWindowsToSpaces API. However it seems that we could simply replace it with CGSMoveWindowsToManagedSpace.

The second method has the problem that when switching to a Space, there is a long animation that can't be avoided.


Here's the situation. Now it's up to us to find a breakthrough workaround.

jkelleyrtp commented 2 years ago

I haven't dived too deep into AX vs CG, but CGSCopySpacesForWindows provides a lot of information. For offscreen windows I don't think you'd run into issues like popups? I imagine you could populate the cache with CGSCopySpacesForWindows first and then update it with more accurate AX information as you visit those spaces. For me, CGSCopySpacesForWindows solves (approximately, can be updated later with AXrefs) these two issues:

These two can be solved by bringing the window from a foreign space to the current one and then performing the action after getting the AX ref (or using a cached ref if it exists):

This one can be solved with an invisible window (or with cached ax ref)

I think using a rough heuristic and then populating it with updated information would at least solve the issue for me where alt-tab doesn't show any of my apps when I launch it, and I have to dance between desktops. It also seems like some of my apps never make it into the carousel, hence why I've been digging into the alt-tab source.
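
A minimal sketch of that "rough first, accurate later" cache, with made-up names throughout: entries start as provisional (CG-level data only) and are replaced once AX data is available. A `u64` stands in for the AXUIElementRef, and nothing here calls a macOS API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum WindowInfo {
    /// Known only through CG-level data; the title and the
    /// "is this a real window" verdict are guesses at this stage.
    Provisional { cg_title: String },
    /// Seen through the AX API; `ax_handle` is a placeholder for the AXUIElementRef.
    Confirmed { ax_handle: u64, ax_title: String },
}

#[derive(Default)]
struct WindowIndex {
    entries: HashMap<u32, WindowInfo>,
}

impl WindowIndex {
    /// Seed from a CG enumeration; never downgrades a confirmed entry.
    fn seed_from_cg(&mut self, window_id: u32, cg_title: &str) {
        self.entries
            .entry(window_id)
            .or_insert_with(|| WindowInfo::Provisional { cg_title: cg_title.to_string() });
    }

    /// Replace whatever was stored once AX data is available.
    fn confirm_from_ax(&mut self, window_id: u32, ax_handle: u64, ax_title: &str) {
        let info = WindowInfo::Confirmed { ax_handle, ax_title: ax_title.to_string() };
        self.entries.insert(window_id, info);
    }

    /// Drop an entry that the AX-based heuristics later reject.
    fn reject(&mut self, window_id: u32) {
        self.entries.remove(&window_id);
    }

    fn is_confirmed(&self, window_id: u32) -> bool {
        matches!(self.entries.get(&window_id), Some(WindowInfo::Confirmed { .. }))
    }

    /// Best title currently known for the window, if it is tracked at all.
    fn title(&self, window_id: u32) -> Option<&str> {
        self.entries.get(&window_id).map(|info| match info {
            WindowInfo::Provisional { cg_title } => cg_title.as_str(),
            WindowInfo::Confirmed { ax_title, .. } => ax_title.as_str(),
        })
    }
}
```

The trade-off raised earlier in the thread still applies: a provisional title can change, or the entry can disappear, the first time the user visits that Space.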

metacodes commented 2 years ago

@jkelleyrtp Maybe you should look at the code review comments in PR #1484. We can close/minimize/de-minimize/focus a window after we switch to that Space, but that is not ideal.

metacodes commented 2 years ago

@lwouis I have an idea: could we develop a daemon, similar to the WindowServer process, that is started before the user logs in? Once started, it could listen for the AXRefs of all programs, acting as a kind of AXRef state machine. This means it would hold the AXRefs of all programs after the user logs in. Then, when AltTab needs to operate on a program for which it has no AXRef, it sends a request to the daemon to perform the operation on its behalf.

We can't put all the logic into the daemon. On the one hand, I don't know whether there are API limitations for such a daemon. On the other hand, if we need to update the daemon frequently, the user will have to restart the computer each time, otherwise there is no way to manage all the AXRefs. The daemon would just keep some AX references and perform simple window operations such as closing, minimizing, and maximizing.

Also, I've only looked a little at daemons as a technology, so I'm not sure whether one can be started before the user logs in, or whether API limitations would prevent it from getting AXRefs. https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/Introduction.html

lwouis commented 2 years ago

That is a very clever idea @metacodes!

I think it can work, actually. I think technically there are no blockers for your idea. Yesterday I removed some values from the AltTab LaunchAgent (which launches AltTab at login) and, in a fun coincidence, I noticed that the documentation said:

RunAtLoad: Running the job when it is loaded

This key is used to start the job as soon as it has been loaded. For daemons this means execution at boot time, for agents execution at login.

We could indeed use a global daemon at computer boot instead of a LaunchAgent (here an app) at login, where we are in competition/race with the other login apps.

Having a "backend" to track OS windows state, and the app being a UI to see and order it, this has been discussed a bit in #371 already. There is even a PR: https://github.com/lwouis/alt-tab-macos/pull/768.

While it is a very interesting idea, there are difficulties to deal with if we go with this new architecture:

It's a great idea in theory, but I can tell it will be a lot of work to make a POC and then the real full solution. That being said, it could elevate AltTab from a simple app to a backend many projects could build on top of. There are already many projects that attempt to tame window state management, and as far as I know AltTab is the only reliable one. Except for yabai, but yabai injects code into the Dock and does very intrusive things that require the user to disable SIP, which is a lot to ask of users, making it a niche solution for really motivated power users.

That being said, using a backend would only solve the login/boot situation. A user who restarts AltTab during their session would still need a trick to see windows from other Spaces. We may have a popup to tell them that they need to reboot? It's not ideal. We would probably keep our current trick here to still be able to show windows. Mmm, not sure what to think.

metacodes commented 2 years ago

> A user who restarts AltTab during their session would still need a trick to see windows from other Spaces. We may have a popup to tell them that they need to reboot?

This only happens when updating the daemon, so the only option is to keep the daemon simple and keep its updates to a minimum. When it does need to be updated, we need a popup or something similar to let the user know. Considering how rarely the daemon would be updated, this might be acceptable. But given the technical difficulties you mentioned above, it's really hard for us to switch to that model at the moment.

lwouis commented 2 years ago

It's a good point. I think Sparkle can deliver updates with that level of detail like "only update the main app" or "update the main app + the daemon". That could improve the UX for sure. We would need a new CI script to tag the delivery manifest based on which files were updated.

Yeah, it's a lot of groundwork, but it may be the only long-term solution. I don't know.

jkelleyrtp commented 2 years ago

> It's a good point. I think Sparkle can deliver updates with that level of detail like "only update the main app" or "update the main app + the daemon". That could improve the UX for sure. We would need a new CI script to tag the delivery manifest based on which files were updated.
>
> Yeah it's big groundworks but it may be the only solution long-term, i don't know

Can you just update both the daemon and the app in one go if you flush the daemon state to disk?

One thing that might hold the daemon up is getting AXRefs of windows in the first place. I see that the discovery phase involves AXUIElementCreateApplication(pid). This gives you the AXRef for the app - but to get the refs for its (existing) windows all I know is kAXWindowsAttribute which requires the windows to be currently on screen. With the observer API you're limited to intercepting events to windows for their discovery.

I'm not sure how alt-tab does it, but if it's done through the kAXWindowsAttribute - that only works in the current space.

Anyways, I just tested a "true" daemon in Rust and it doesn't really work. macOS will just crash your daemon if you make various calls.

more /tmp/daemon.err
objc[92408]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.
objc[92408]: +[NSNumber initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

The calls I observed were:

fn the_de() {
    let stdout = File::create("/tmp/daemon.out").unwrap();
    let stderr = File::create("/tmp/daemon.err").unwrap();

    let daemonize = Daemonize::new()
        .pid_file("/tmp/test.pid") // Every method except `new` and `start`
        .chown_pid_file(false) // is optional, see `Daemonize` documentation
        .working_directory("/tmp") // for default behaviour.
        .stdout(stdout) // Redirect stdout to `/tmp/daemon.out`.
        .stderr(stderr) // Redirect stderr to `/tmp/daemon.err`.
        .exit_action(|| println!("Executed before master process exits"))
        .privileged_action(|| "Executed before drop privileges");

    match daemonize.start() {
        Err(e) => eprintln!("Error, {}", e),
        Ok(_) => {
            let pid = 503;

            let app_ref = unsafe { AXUIElementCreateApplication(pid as i32) };
            let mut window_list_ref = std::ptr::null();
            unsafe {
                AXUIElementCopyAttributeValues(
                    app_ref,
                    CFString::new(kAXWindowsAttribute).as_concrete_TypeRef(),
                    0,
                    9999999,
                    &mut window_list_ref,
                )
            };

            if !window_list_ref.is_null() {
                let window_count = unsafe { CFArrayGetCount(window_list_ref) };
                for i in 0..window_count {
                    let mut window_id: u32 = 0;

                    let window_ref = unsafe {
                        CFArrayGetValueAtIndex(window_list_ref, i as isize) as AXUIElementRef
                    };

                    unsafe { _AXUIElementGetWindow(window_ref, &mut window_id) };

                    println!("Window: {}", window_id);
                }
            }
        }
    }
}

An alternative is just another app running in the background. It wouldn't be a pure daemon since it still has a CFRunLoop.

lwouis commented 2 years ago

> Can you just update both the daemon and the app in one go if you flush the daemon state to disk?

It's interesting. We would need to have a special handling to serialize to disk only for updates. Because on reboot we certainly don't want to resume from previous state on disk, but start fresh.

> I'm not sure how alt-tab does it, but if it's done through the kAXWindowsAttribute - that only works in the current space.

The only reliable way to get windows is not to ask for them with this call, but to subscribe to the app AXref in order to be notified when the app creates a window. So the daemon would start on boot, and start observing all apps. After the user logs in, apps will launch, open windows, and the daemon would see all these. Today AltTab launches at the same time so it can miss some apps launched before itself.

> An alternative is just another app running in the background - it wouldn't be a pure a daemon since it still as a CFRunLoop.

What would be the value of that? We could do it in the current app then. The idea of having a separate daemon is that it starts before login, at boot. So it precedes all start-at-login apps, and can subscribe to their events early.

lwouis commented 2 years ago

Today I looked further into this idea of splitting AltTab into 2 parts: a background process that starts as early as possible + a GUI app that gets global state from the background process and shows UI.

I found Apple's docs on Agents and Daemons incredibly complex.

Out of the points I raised above, there was the issue of whether or not APIs (including private APIs) AltTab uses to get Apps/Windows state would work in a background process launched before login.

This section for instance explains how much of a mess the OS libraries are in, and how it's a bit of a "good luck" situation to make calls in a daemon context.

Doing some more archeology, I found an interesting sample of a "pre-login agent", which has this interesting note:

CGEventTaps do not work by default in the pre-login environment rdar://problem/5636091. If you need a workaround, please get in touch with DTS.

Oh and even if we make a POC, and it seems to work:

The behavior of any given framework, and the routines within that framework, can change from release-to-release.

Which means we would need to test it on all previous macOS versions/archs, etc.

lwouis commented 2 years ago

@koekeishiya have you explored running (parts of) yabai as a daemon? According to the docs I linked above, it may give elevated privileges to some operations, and give a different access to WindowServer. I wonder if it could avoid the Dock injection

koekeishiya commented 2 years ago

I have not looked into that, but it would not be able to replace the Dock injection for the things that yabai provides, as we do runtime code patching, which requires us to have access to the memory space.

I am not sure if running it as a pre-login daemon would work, but it could be interesting to provide something like an AX-service that could be called into using mach messages or an XPC interface. This would make it easier for other interested parties to develop this type of software, as the hard part is imo figuring out how to properly retrieve and manage the state of macOS.

Then again, for something like this to be worth the time investment there would need to be a sizeable amount of interested people, which from what I can tell seems to be rather lacking. There are quite some people that are interested in finished solutions, but very very few people are actually willing to put in any effort to make it happen.

The predecessor to yabai, chunkwm, although it had some issues, utilised a plugin system in which I attempted to make it easy for people to build and expand the core on their own, and the interest in this area was so low that it is just not worth it. Which is why yabai is made as a single-binary with preset features (although there are scripting capabilities for those who need a little bit more).

Maybe things have changed now and more people would be willing to commit, but I doubt it.

--

I guess I'd say the idea is interesting, but without doing an actual implementation it is hard to say how it would behave. When macOS restores applications/windows upon boot, are you able to get your code running before this happens and reach those applications using the AX API to retrieve AX-Refs to their windows? If not, then I don't really see a benefit in getting the software to run at this level.

lwouis commented 2 years ago

we do runtime code patching, which requires us to have access to the memory space.

I see, so not working then.

it could be interesting to provide something like an AX-service

Yes, this is indeed the idea we were discussing here. I've seen quite a few github projects that attempt to do just that. They all become abandoned after a while though.

And yes indeed it would be lots of work to separate that service from AltTab or Yabai and make it a general-purpose service.

very very few people are actually willing to put in any effort to make it happen.

I totally share this assessment. I've designed AltTab with other contributors in mind. The project is decentralized as can be, requires no external tools, etc. Yet barely anyone has ever contributed code, and they never stick around for more. I also opened a ticket half a year ago to look for someone to take over, and no-one has raised their hand.

The scope and complexity of this type of work demands a team/community, but there is none to be found. I'm really impressed at what you've been able to accomplish with yabai to be honest, in such a solitary environment.

are you able to get your code running before this happens

In theory it will, so that would be a nice benefit compared to today. In practice I have no POC. It's a very large amount of work to attempt even a simple POC. It's a true pain to develop such a service since you have to reboot your machine every time to test it... then you have to figure out how to log, you can't use a debugger, nothing is documented, etc.

I don't see myself going that route. I would if there was a community supporting/cheering, but it's not the case.

metacodes commented 2 years ago

Currently, I'm doing the POC we discussed here based on this project. But I've run into a problem due to my limited knowledge of developing Apple programs. @lwouis do you have any idea about that problem?

lwouis commented 2 years ago

The project you linked to is quite impressive in complexity and exhaustiveness. I was impressed and at the same time terrified just reading through the long Readme. So many things to consider...

At some point they state:

the helper tool must be a Command Line Tool (not an app bundle)

This makes me wonder if you can use AppKit API inside the helper. Maybe not. NSWorkspace (https://developer.apple.com/documentation/appkit/nsworkspace) for example is an AppKit API.

I think you should clarify what type of APIs can be called from inside the helper tool. The fact that it's called a Command Line Tool, and the fact that your error messages mention the main UI loop being dead, make me think that maybe you can't have any UI code in there, which may include any code referencing AppKit API.

jkelleyrtp commented 2 years ago

In my daemon code above, I discovered it is not possible to use specific AX APIs from an app that does not have access to a CFRunLoop.

lwouis commented 2 years ago

It seems you need to run some commands to ensure the main CFRunLoop continues to run, inside the CLP

metacodes commented 2 years ago

I just found some documentation that might be helpful for this POC. We may be able to achieve the goal of the POC with PreLoginAgents. Some references:

metacodes commented 2 years ago

I have tested the PreLoginAgents demo and it works. But the problem is PreLoginAgents will be terminated when the user logs in. It cannot constantly run in the background.

I have an idea: is it possible for us to develop a daemon, like the WindowServer process, that is started before the user logs in? It could listen to the AXRef of all the programs after it is started, similar to an AXRef state machine.

So far I think that idea is bankrupt.😮‍💨

lwouis commented 2 years ago

@metacodes yes, it seems that the Agent will be terminated when the user logs in, and a new instance of the Agent will be started. You can see it explained here:

image

Specifically:

If you set LimitLoadToSessionType to an array, be aware that each instance of your agent runs independently. For example, if you set up your agent to run in LoginWindow and Aqua, the system will first run an instance of your agent in the loginwindow context. When a user logs in, that instance will be terminated and a second instance will launch in the standard GUI context.

In your test, did you have LimitLoadToSessionType set to LoginWindow or set to LoginWindow + Aqua?

If you only use LoginWindow, maybe there is a way to keep the agent alive after the user logs in?

Alternatives we could consider:

metacodes commented 2 years ago

@lwouis Here is my launchd.plist.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.apple-samplecode.PreLoginAgentCocoa</string>
    <key>LimitLoadToSessionType</key>
    <string>LoginWindow</string>
    <key>KeepAlive</key>
    <true/>
    <key>ProgramArguments</key>
    <array>
        <string>/Library/PrivilegedHelperTools/PreLoginAgentCocoa.app/Contents/MacOS/PreLoginAgentCocoa</string>
        <string>-CleanExit</string>
        <string>NO</string>
    </array>
</dict>
</plist>

In your test, did you have LimitLoadToSessionType set to LoginWindow or set to LoginWindow + Aqua?

I just set LimitLoadToSessionType to LoginWindow.

Keeping it a LaunchAgent, but setting LimitLoadToSessionType to Background. @metacodes have you tried that? This StackOverflow thread here says: " It should run no matter a user is logged into the machine or not (aka in the background session)" which sounds good for us.

I will try to set LimitLoadToSessionType to Background and see how it works.

Using a Launch Daemon?

Apple's documentation says that it's not daemon-safe to use AppKit or Cocoa. But we need to use some AppKit or Cocoa APIs, right? That's the reason I focused on PreLoginAgent. If a Launch Daemon could use AppKit, it would be the best choice.

metacodes commented 2 years ago

Keeping it a LaunchAgent, but setting LimitLoadToSessionType to Background. @metacodes have you tried that? This StackOverflow thread here says: " It should run no matter a user is logged into the machine or not (aka in the background session)" which sounds good for us.

@lwouis I have set LimitLoadToSessionType to Background, but it only works when user logs in. I also put the PreLoginAgent into /Library/LaunchDaemons, and it didn't work at all.

lwouis commented 2 years ago

I have set LimitLoadToSessionType to Background, but it only works when user logs in.

Do you think it launches at the same time as other Agents with Aqua? It may be worth it to add another Aqua agent and compare the timestamps of their logs. This SO response seems to indicate that it's at the same time, but would be nice to test a bit just in case.

Because at the end of the day, we only need to be running slightly before the other GUI apps are running. We don't really need to be running pre-login. It may have been a way to run before other apps, but if Background runs before Aqua, then it would also achieve the same goal, with way less complexity.

If not, then it's looking grim for this new approach :/

I found this SO as well which seems to confirm that LaunchAgent may not be suitable for what we want, as they can't persist past the login screen.

lwouis commented 2 years ago

@metacodes so the whole discussion about agents/daemons/etc was so that we could have AltTab launch before the other GUI apps. It seems that it may not be possible to achieve that. But even if somehow we find a way to do it, it would only improve the situation of detecting apps on login/boot/reboot.

It may be more fruitful to spend energy trying to find tricks to get the state of the apps/windows at any point of time. This would benefit any situation: after login, but also anytime AltTab is launched during the session, including after the user updates AltTab.

It's something I mentioned when we started exploring the daemon idea (at the bottom of my comment) a while ago. Also, see the bullet points about all the problems we would need to solve to introduce that piece of technology.

It may be a better road to try and find better tricks, private APIs, and what-not, to be able to get the state of windows at any time. As a reminder, this is the situation we are trying to solve. I re-read the message, and it still nicely summarizes the challenges to overcome.

metacodes commented 2 years ago

Can we solve this problem by Dock injection like yabai?

lwouis commented 2 years ago

@metacodes that would be really nice. I don't have the skills though. It's pretty advanced reverse engineering. If you want to explore it, I will support you as much as I can.

koekeishiya commented 2 years ago

By injecting code into the Dock, you will be able to focus any window using only the window id. This works because GUI applications register themselves with the Dock, setting up a communication channel using mach ports. (This is done automatically by the AppKit framework; the final call is in HIServices in Carbon).

So I guess yes, this will allow you to bypass the AX-Ref when focusing a window, but probably not if you want to close or minimise a window. I guess that doesn't really matter, as you can always grab that AX-Ref after it gets focus the first time.

This is the function you need to hook into (sample is for macOS 12.4.0 Intel x86-64) :

Screenshot 2022-05-18 at 10 17 53

Sample calling this function from yabai (after injecting into the Dock): https://github.com/koekeishiya/yabai/blob/master/src/osax/payload.m#L726

lwouis commented 2 years ago

Thanks @koekeishiya for sharing this precious information~

What worries me is that if we go that route:

koekeishiya commented 2 years ago

I agree, for the purposes of this software it is not an ideal solution.

My naive idea would instead be to focus every space once during startup, calling the AX API to retrieve refs -- once for each space, and then re-focus the original space again. I think this should work fine, but there will be a short span of visual flicker during first launch. Not sure how acceptable that is, but it should be able to detect all windows.

You'd need a combination of the following API's to do this:

extern void CGSManagedDisplaySetCurrentSpace(int cid, CFStringRef display_ref, uint64_t spid);
extern uint64_t CGSManagedDisplayGetCurrentSpace(int cid, CFStringRef display_ref);
extern CFArrayRef CGSCopyManagedDisplaySpaces(const int cid);
extern CFStringRef CGSCopyManagedDisplayForSpace(const int cid, uint64_t spid);
extern void CGSShowSpaces(int cid, CFArrayRef spaces);
extern void CGSHideSpaces(int cid, CFArrayRef spaces);

This process would likely have to happen for each connected monitor:

// https://developer.apple.com/documentation/coregraphics/1454603-cggetactivedisplaylist
CGGetActiveDisplayList(display_count, result, count);

// convert a CGDirectDisplayID to a CFStringRef (UUID) used by the above spaces API
CFStringRef display_uuid(uint32_t did)
{
    CFUUIDRef uuid_ref = CGDisplayCreateUUIDFromDisplayID(did);
    if (!uuid_ref) return NULL;

    CFStringRef uuid_str = CFUUIDCreateString(NULL, uuid_ref);
    CFRelease(uuid_ref);

    return uuid_str;
}
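
The control flow of the startup sweep described above (visit each space once, grab refs, then return to the original space) can be sketched in plain C. This is only an illustration, not yabai's or AltTab's code: the hooks are hypothetical stand-ins for the private calls listed above (such as `CGSManagedDisplaySetCurrentSpace`) and for the AX collection step, and the `fake_*` stubs exist only so the flow can be exercised without WindowServer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: in a real tool these would wrap the private
   space-switching call and the AX lookup. */
typedef struct {
    void (*set_current_space)(uint64_t space_id, void *ctx);
    void (*collect_ax_refs)(uint64_t space_id, void *ctx);
    void *ctx;
} sweep_hooks;

/* Collect refs on the original space, then on every other space,
   and finally switch back. Returns how many other spaces were visited. */
size_t sweep_spaces(const sweep_hooks *hooks, const uint64_t *spaces,
                    size_t count, uint64_t original)
{
    assert(hooks && hooks->set_current_space && hooks->collect_ax_refs);
    size_t visited = 0;

    hooks->collect_ax_refs(original, hooks->ctx); /* already visible */
    for (size_t i = 0; i < count; i++) {
        if (spaces[i] == original) continue;
        hooks->set_current_space(spaces[i], hooks->ctx);
        hooks->collect_ax_refs(spaces[i], hooks->ctx);
        visited++;
    }
    if (visited > 0) {
        hooks->set_current_space(original, hooks->ctx); /* restore focus */
    }
    return visited;
}

/* Recording stubs used to check the order of operations. */
typedef struct {
    uint64_t current;
    uint64_t collected[16];
    size_t n_collected;
} fake_desktop;

void fake_set_space(uint64_t id, void *ctx)
{
    ((fake_desktop *)ctx)->current = id;
}

void fake_collect(uint64_t id, void *ctx)
{
    fake_desktop *d = (fake_desktop *)ctx;
    if (d->n_collected < 16) d->collected[d->n_collected++] = id;
}
```

With multiple monitors, the same sweep would presumably run once per display, using that display's own list of spaces.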

lwouis commented 2 years ago

@koekeishiya this seems very interesting. Looking around GitHub for references, though, I found another ticket where you were already giving that advice 2 years ago. Back then, you warned that without injecting into Dock.app, it would bring Mission Control out of sync:

You can combine CGSShowSpaces, CGSHideSpaces and CGSSetCurrentSpaceForManagedDisplay to do a proper space switch (but mission control is out of sync, unless you update the internal datastructures in the Dock like yabai does.)

What are the consequences of that OOS? Can it break the user's desktop? The worst I've seen in past experiments was the top section of the screen, which slides down when you bring Mission Control up and lists all the Spaces, getting a black background. Almost as if your GPU were dying.

I'm worried that if we explore this workaround, it would end up breaking some users' environments some of the time.

koekeishiya commented 2 years ago

As long as you only use it as a trick to detect the open windows, and restore focus back to the original space, there should be no side-effects.

lwouis commented 2 years ago

I played around with these APIs and here are my observations:

CGSManagedDisplaySetCurrentSpace/ CGSHideSpaces

CGSManagedDisplaySetCurrentSpace does the job indeed. It brings the windows of the Space on the current Space, and we can grab their AXref. The windows pop in though, like we have today.

It seems to be possible to use CGSHideSpaces right after to hide these windows while we grab the AXref. However, in my tests, it would sometimes make the whole screen black, which is a pretty bad failure-mode for the users. But we could simply accept the pop-in as we have today. CGSHideSpaces can be viewed as a refinement to avoid pop-in, regardless of the API we end up using to grab AXrefs.

The bigger issue I think is that we have to grab the AXref 1 Space at a time with this API. This is a problem because grabbing the AXrefs happens in unbounded time. Today AltTab sets a 2s time-window, during which it will try and grab the AXrefs of all windows on screen. If we move to this new API, we would have to do something like divide 2s by how many Spaces the user has, and only have that small time window to try and grab AXrefs. This seems like it would create issues with users who have lots of Spaces.
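To put rough numbers on that, here is a minimal C sketch of the arithmetic, assuming the budget is split evenly across Spaces. `per_space_budget_ms` is a hypothetical helper, not something AltTab has today.

```c
#include <assert.h>
#include <stdint.h>

/* Evenly split a total scan budget (in milliseconds) across spaces. */
uint32_t per_space_budget_ms(uint32_t total_ms, uint32_t space_count)
{
    assert(space_count > 0); /* there is always at least the current space */
    return total_ms / space_count;
}
```

With the 2s window, 4 Spaces would leave 500 ms per Space, while 16 Spaces would leave only 125 ms each, which is where apps that are slow to answer AX calls would start being missed.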

CGSShowSpaces shows the windows from the Space(s) onto the current Space. However, it's purely visual, and we can't grab the AXrefs while the windows are shown. So this can be used to maybe mitigate pop-in, but not to grab AXrefs.

Update: this API is able to move fullscreen windows to the current Space. However, I found no way to remove them visually after we grab the AXref. I tried to:

In all cases, the fullscreen window stays on top of the current Space, and won't go away. @koekeishiya any idea on how to avoid that?

_CGSProcessAssignToAllSpaces/ _CGSProcessAssignToSpace

~I found that these APIs could also do the trick. They may be the best replacement for CGSAddWindowsToSpaces/CGSRemoveWindowsFromSpaces because it would be a 1-to-1 replacement. We would assign all the processes from the other Spaces to the current Space at once, grab the AXrefs, then assign them to their original Spaces.~

~The issue however, is that I don't know how to obtain the original Space ID for a given pid. So I can't restore the process to their original Spaces. @koekeishiya do you know a way to grab the SpaceID of a given pid?~

~There is also the issue that the windows pop in visually on the current Space. But that's already the case today, so again, 1-to-1 replacement. I also wonder if this API works in macOS 12.2+.~

Update: nevermind, it does not move fullscreen windows, so this API is a no go.

koekeishiya commented 2 years ago

In all cases, the fullscreen window stays on top of the current Space, and won't go away. @koekeishiya any idea on how to avoid that?

Unfortunately, I don't really have an answer to that. I have not spent a lot of time trying to dig into how fullscreen spaces work under the hood.

The bigger issue I think is that we have to grab the AXref 1 Space at a time with this API. This is a problem because grabbing the AXrefs happens in unbounded time. Today AltTab sets a 2s time-window, during which it will try and grab the AXrefs of all windows on screen. If we move to this new API, we would have to do something like divide 2s by how many Spaces the user has, and only have that small time window to try and grab AXrefs. This seems like it would create issues with users who have lots of Spaces.

Maybe you could try to figure out which spaces have windows on them to reduce the set of spaces that you need to look at in the first place. Not sure if that will have a noticeable effect in practice.
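A small C sketch of that filtering step, under the assumption that a per-space window count is available from somewhere (the thread does not say which call would provide it). `space_info` and `spaces_with_windows` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one space: its id and how many windows it holds. */
typedef struct {
    uint64_t space_id;
    uint32_t window_count;
} space_info;

/* Copy the ids of spaces that hold at least one window into `out`
   (up to `out_cap` entries) and return how many were written. */
size_t spaces_with_windows(const space_info *spaces, size_t count,
                           uint64_t *out, size_t out_cap)
{
    assert(spaces != NULL || count == 0);
    assert(out != NULL || out_cap == 0);
    size_t n = 0;
    for (size_t i = 0; i < count && n < out_cap; i++) {
        if (spaces[i].window_count > 0) {
            out[n++] = spaces[i].space_id;
        }
    }
    return n;
}
```

Only the returned spaces would then need to be visited, so a user with many empty Spaces would not shrink the time available for the populated ones.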

I guess the cleanest way might just be to do the Dock injection and focus windows without the AX API. You could use Yabai's scripting addition to make a proof of concept without having to invest too much time, but as you mentioned this has other cons associated with it -- mainly that it requires SIP to be disabled.

You don't necessarily even need to inject code into the Dock, but you do need to be able to read from its memory space. I wonder if this could be done with a simple codesigning entitlement: allow-task-for-pid.