Recomputing Thumbnails is slow and expensive

galli-leo commented 4 years ago

First of all, thank you for this great app, I really like it!

However, on my 2016 MacBook Pro, getting the UI to show up feels very sluggish, even though I set the delay to 0ms. After investigating the code a bit and running some tests, it seems like recomputing the thumbnails every time the UI shows up is quite expensive.

In my testing I had around 10 windows open and the following lines of code resulted in around 200ms of time: https://github.com/lwouis/alt-tab-macos/blob/7ff17b5882f07d17f66d9019c78449cf9cf526a6/alt-tab-macos/ui/Application.swift#L151-L154 Most of that time was spent in the thumbnail function of OpenWindow. A few suggestions on how this could be fixed:

Use multithreading for capturing the thumbnails.
Capture thumbnails on background thread, then when done update the UI. This would really help getting the UI to show instantly. However, to be able to show something for each window initially I would also:
Cache the previously captured thumbnails.
Figure out how Mission Control does it and replicate that. If it's similar to iOS multitasking, it's probably taking spontaneous snapshots and storing them somewhere on disk. If we could figure out where, then those could be used.
Have a continuously running background task that captures thumbnails every few seconds / minutes and caches them. (Maybe even checks if anything changed and if not increase the time for that specific window?)
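
As a rough illustration of the multithreading suggestion, here is a minimal Swift sketch that fans the per-window capture out across cores with GCD. `captureAll` and the `capture` closure are hypothetical names standing in for the real screenshot call, which is not shown; window IDs are plain `UInt32` values for simplicity.

```swift
import Dispatch

/// Runs `capture` once per window ID, in parallel, and returns results in input order.
/// `capture` is a hypothetical stand-in for the real per-window screenshot call.
func captureAll<Thumbnail>(_ windowIds: [UInt32],
                           capture: (UInt32) -> Thumbnail) -> [Thumbnail] {
    var slots = [Thumbnail?](repeating: nil, count: windowIds.count)
    slots.withUnsafeMutableBufferPointer { buffer in
        // Each iteration writes only to its own slot, so no lock is needed.
        DispatchQueue.concurrentPerform(iterations: windowIds.count) { index in
            buffer[index] = capture(windowIds[index])
        }
    }
    // Every slot has been filled once concurrentPerform returns.
    return slots.compactMap { $0 }
}
```

Note that `concurrentPerform` still blocks its caller until every capture has finished, so on its own this would shorten the wait rather than remove it.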

lwouis commented 4 years ago

Thanks for the improvement suggestions! Here is some feedback:

Cache the previously captured thumbnails. Have a continuously running background task that captures thumbnails every few seconds / minutes and caches them. (Maybe even checks if anything changed and if not increase the time for that specific window?)

I really dislike the idea that the UI is shown to the user not up-to-date, then gets updated while the user is interacting with it. I think it's bad UX, and I observe that neither the Windows nor the macOS built-in thumbnail systems do that. On Windows 10 the thumbnails are even live videos. They may have access to higher-performance private APIs though; or they may have been written by C/C++ engineers who know how to write truly performant software, which is unfortunately not my expertise.

FYI, I originally had my own downscaling for the thumbnails, and it was too slow. I then offloaded that task to NSImageView, which performed way better.

Capture thumbnails on background thread, then when done update the UI. This would really help getting the UI to show instantly. However, to be able to show something for each window initially I would also:

This is a bit tricky because you are not too interested in the 200ms delay before showing the UI, but I personally like it so I would like to keep it in. Keeping it in means that:

After pressing the shortcut, we start computing the thumbnails, and after 200ms-if-they-are-ready-or-until-they-are-ready, we show them

It's clearly a better approach than the current one, which is:

After pressing the shortcut, we wait 200ms, then compute the thumbnails and show them

So yes it's great, it's just that concurrency on macOS is surprisingly difficult without third-party libraries. I'm used to nice APIs like the observers in RxJS/RxJava, but GCD on macOS is not the simplest API... So here, waiting 200ms or more depending on whether the thumbnails are ready, plus handling the case where the user releases before that (where we switch focus to the window before showing the UI, while also cancelling the concurrent tasks so that the UI doesn't show up afterwards and confuse the user): well, it's a bit of a challenge.
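
For what it's worth, the "200ms or until the thumbnails are ready, whichever is later, with cancellation" flow can be sketched with a plain `DispatchGroup`. This is only a minimal sketch under assumptions: `ShowRequest`, `showWhenReady`, and the `capture`/`show` closures are hypothetical names, not code from the project.

```swift
import Foundation

/// Hypothetical token for one shortcut press; `cancel()` is called if the user releases early.
final class ShowRequest: @unchecked Sendable {
    private let lock = NSLock()
    private var cancelled = false

    func cancel() {
        lock.lock(); defer { lock.unlock() }
        cancelled = true
    }

    var isCancelled: Bool {
        lock.lock(); defer { lock.unlock() }
        return cancelled
    }
}

/// Starts `capture` immediately and calls `show` after max(minimumDelay, capture time),
/// unless the request was cancelled in the meantime.
func showWhenReady(request: ShowRequest,
                   minimumDelay: DispatchTimeInterval,
                   on queue: DispatchQueue = .main,
                   capture: @escaping () -> Void,
                   show: @escaping () -> Void) {
    let group = DispatchGroup()

    // Leg 1: compute thumbnails off the calling thread, starting right away.
    DispatchQueue.global(qos: .userInteractive).async(group: group) { capture() }

    // Leg 2: the minimum delay, running concurrently with leg 1.
    group.enter()
    DispatchQueue.global().asyncAfter(deadline: .now() + minimumDelay) { group.leave() }

    // Runs once both legs are done; the cancellation check keeps the UI
    // from popping up after focus has already been switched.
    group.notify(queue: queue) {
        if !request.isCancelled { show() }
    }
}
```

On key release, the caller would call `request.cancel()` and switch focus directly; the pending `notify` block then does nothing. The capture work itself is not interrupted in this sketch, only its result is ignored.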

Use multithreading for capturing the thumbnails.

This is my favorite suggestion out of the bunch. I will look into it when I have time. It shouldn't be too hard, hopefully. Of course you can open a PR ;p

mfn commented 4 years ago

HyperSwitch does the caching, it works this way (from memory and years of using it):

You bring it up (with 0ms delay), and you see the old thumbs.
If you wait "long enough", eventually they are recreated in the background.
If you "never wait long enough" and always quickly switch between the windows, then they will never update.

As in: CPU cycles are only ever "wasted" while HyperSwitch is being used interactively by the user, not otherwise.
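
The behaviour described above (show the old thumbnail at once, refresh it in the background only while the switcher is in use) could look roughly like the following Swift sketch. It is a guess at the pattern from the outside, not HyperSwitch's actual code; `ThumbnailCache`, `capture`, and `updated` are hypothetical names.

```swift
import Foundation

/// Keeps the last captured image per window and refreshes it off the calling thread.
final class ThumbnailCache<Image>: @unchecked Sendable {
    private let lock = NSLock()
    private var images: [UInt32: Image] = [:]

    /// Returns whatever was captured last time (possibly stale) without capturing anything.
    func cached(_ windowId: UInt32) -> Image? {
        lock.lock(); defer { lock.unlock() }
        return images[windowId]
    }

    /// Re-captures on a background queue, stores the result, then calls `updated`.
    /// `capture` is a hypothetical stand-in for the real screenshot call.
    func refresh(_ windowId: UInt32,
                 capture: @escaping (UInt32) -> Image,
                 updated: @escaping (Image) -> Void) {
        DispatchQueue.global(qos: .utility).async {
            let fresh = capture(windowId)
            self.lock.lock()
            self.images[windowId] = fresh
            self.lock.unlock()
            updated(fresh)
        }
    }
}
```

The switcher would draw `cached(_:)` immediately and call `refresh` only while it is on screen, which matches the "no CPU used unless the switcher is open" observation.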

It annoys me that the thumbs are not up to date (being accustomed to it "just working" on Windows), but I put a higher value on the software not performing tasks that aren't obvious to me. I use it daily on a Late 2013 MacBook Pro and, in terms of resources, it never gets in my way.

lwouis commented 4 years ago

I think there is no need to compromise by caching.

Looking at modern games, it is clear that resizing thumbnails in real-time is doable on modern hardware. The solution involves better code; that's it. We discussed CPU multi-threading above, but I think the goal should be to downscale on the GPU, where it is basically free. I'm not sure how to do that through Cocoa though. NSImageView seems to perform poorly because it is downscaling using the CPU. There may be some lower-level API like CALayer that can be drawn to and resized via the GPU for cheap.

jdheyburn commented 4 years ago

Could there be an option to remove thumbnails and just have the App icons and title? This would remove any element of recalculating thumbnails and speed up use.

I say this because I experience a huge delay in waiting for the UI to appear (> 1s) - I'd rather have no thumbnails if they'll cause this.

lwouis commented 4 years ago

Could there be an option to remove thumbnails

This was discussed in #4 for additional context. I'm open to the idea of a preference to remove thumbnails; however, in that case you may just want to check out the apps mentioned in the ticket I linked, because text-only opens new perspectives that these apps take full advantage of (e.g. fuzzy text search).

galli-leo commented 4 years ago

@lwouis Thanks for your input! I would love to help here, but I am very limited in time.

I am pretty confident that the bottleneck isn’t resizing the images, but rather the way they are retrieved. The CGImageList function call does an IPC call to the WindowServer which then captures the window. Then the WindowServer does another IPC call back with the image. From the disassembly it also looks like there is some locking going on, so that would incur even more overhead.

However, I also found some interesting private APIs inside WindowServer. One function seems to give you access to the raw framebuffer pointer. That could be an option, since the IPC calls would not have to transfer each image anymore, just a pointer.

lwouis commented 4 years ago

@galli-leo very interesting analysis! If what you say is true, and the expensive operation is getting the full-size images from the WindowServer, and not resizing them for thumbnails (like I thought), then the problem is going to be harder to solve.

A cheap test you could do is check the performance of HyperSwitch. Unless they pulled some private API trick, they should experience the same issue on your laptop. Could you run a quick test to compare?

On another note, I'm working on the Preferences UI at the moment, and while trying to add a dropdown for image quality, I read this documentation from Apple:

An NSPopUpButton object uses an NSPopUpButtonCell object to implement its user interface. Note that while a menu is tracking user input, programmatic changes to the menu such as adding, removing, or changing items on the menu is not reflected

Looks like they share my vision about not updating UI while the user is interacting with it ;p

lwouis commented 4 years ago

@galli-leo I just ran Instruments to profile and see what I get. I did not get the picture you painted (i.e. getting the pictures from the WindowServer being the bottleneck). What I got was something like this:

87ms total
44ms collectionview.layout
14ms get app icon
Rest of the time spent in OS functions like CFRunLoop and NSView layouts

I don't have enough knowledge to push the performance investigation further. I wish I could though.

galli-leo commented 4 years ago

@lwouis When I tested with Instruments I saw a lot of waiting inside the Run loop but not much in the actual drawing methods (which seems to be similar to your result). I highly suspect this is due to the way the XPC calls work, where the app waits inside the main run loop until it gets a response.

Regarding HyperSwitch: They use a ton of Private APIs, (I would guesstimate about 200 Private Functions, haven't had the time to fully look at all of them yet). And looking closer at HyperSwitch, it seems clear that you can actually get access to the raw window data (well at least you don't have to use XPC calls to transfer the images).

If I have some free time, I will try to get a Proof of Concept working, using what I learn from HyperSwitch :)

lwouis commented 4 years ago

Very interesting @galli-leo! I did a more in-depth exploration of performance on the discord channel. You may want to have a quick look to see what I believe the offenders are.

Regarding HyperSwitch, 200 private APIs is a shocking number! I can't picture in my head why they would need so many. Btw I integrated 1 private API to get screenshots of minimized windows. If you're curious you can see that in action in #78.

Regarding private APIs in general, my position is the following: if there is no choice but to use one to implement a feature, then we should do it. However, there is a steep cost in maintenance, maintainability, and portability. We discussed here how there is no compat table available online. That means that a private API is not guaranteed to work on OS versions other than the one on the author's machine, where the change was tested locally. Basically it's a can of worms, so its usage should be minimized to limit impact.

koekeishiya commented 4 years ago

@galli-leo @lwouis

highly suspect this is due to the way the XPC calls work, where the app waits inside the main run loop until it gets a response.

Regarding HyperSwitch: They use a ton of Private APIs, (I would guesstimate about 200 Private Functions, haven't had the time to fully look at all of them yet). And looking closer at HyperSwitch, it seems clear that you can actually get access to the raw window data (well at least you don't have to use XPC calls to transfer the images).

The private functions are also part of a client API that communicates with the WindowServer using mach messages (MIG IPC), triggering a corresponding server-side function invocation. IIRC the XPC mechanism is an abstraction that is built on top of mach.

Out of interest, could either of you post a symbols dump of HyperSwitch?

galli-leo commented 4 years ago

@koekeishiya Symbol dump below. However, I don't know which of these are really used.

_CGSDefaultConnection
CGSMainConnectionID
CGSSetUniversalOwner
CGSSetOtherUniversalConnection
CGSNewConnection
CGSGetConnectionIDForPSN
CGSDisableUpdate
CGSReenableUpdate
CGSGetWindowCount
CGSGetWindowList
CGSGetOnScreenWindowCount
CGSGetOnScreenWindowList
CGSGetWorkspaceWindowCount
CGSGetWorkspaceWindowList
CGSGetWorkspaceWindowCountWithOptionsAndTags
CGSGetWorkspaceWindowListWithOptionsAndTags
CGSGetWorkspaceWindowGroup
CGSSessionCopyPreferencesForWorkspaces
CGSGetSpaceManagementMode
CGSGetParentWindowList
CGSGetWindowLevel
CGSSetWindowLevel
CGSCycleWindows
CGSOrderWindow
CGSWindowIsOrderedIn
CGSWindowIsVisible
CGSFlushWindow
CGSGetWindowBounds
CGSGetScreenRectForWindow
CGSMoveWindow
CGSSetWindowTransform
CGSGetWindowTransform
CGSSetWindowTransforms
CGSSetWindowAlpha
CGSSetWindowListAlpha
CGSGetWindowAlpha
CGSSetWindowListBrightness
CGSMoveWorkspaceWindows
CGSMoveWorkspaceWindowList
CGSSetWindowShadowAndRimParameters
CGSGetWindowShadowAndRimParameters
CGSInvalidateWindowShadow
CGSCopyWindowProperty
CGSSetWindowProperty
CGSGetWindowOwner
CGSConnectionGetPID
CGSSetWindowOriginRelativeToWindow
CGSGetWindowType
CGSSetActiveWindow
CGSGetWindowTags
CGSSetWindowTags
CGSClearWindowTags
CGSGetWindowEventMask
CGSSetWindowEventMask
CGSSetWindowWarp
CGSNewCIFilterByName
CGSAddWindowFilter
CGSRemoveWindowFilter
CGSReleaseCIFilter
CGSSetCIFilterValuesFromDictionary
CGSSetWindowBackgroundBlurRadius
CGSNewTransition
CGSInvokeTransition
CGSReleaseTransition
CGSGetWorkspace
CGSGetWindowWorkspace
CGSGetWindowWorkspaceIgnoringVisibility
CGSSetWorkspace
CGSSetWindowListWorkspace
CGSSetWindowListWorkspaceIgnoringConsistency
CGSRegisterNotifyProc
CGSRemoveNotifyProc
CGSRegisterConnectionNotifyProc
CGSRequestNotificationsForWindows
CGSSessionSetNotificationConnectionForWorkspaces
CGSNewRegionWithRect
CGSNewEmptyRegion
CGSReleaseRegion
CGSNewWindowWithOpaqueShape
CGSReleaseWindow
CGWindowContextCreate
CGContextCopyWindowCaptureContentsToRect
CGContextCopyWindowCaptureContentsToRectWithOptions
CGSCaptureWindowsContentsToRect
CGSCaptureWindowsContentsToRectWithOptions
CGImageSetCachingFlags
CGImageGetCachingFlags
CGSSetConnectionProperty
CGSHWCaptureWindowList
CGSFindWindowAndOwner
CGSSetWorkspaceForWindow
CGSSetFrontWindow
CGSCreateWindowDebugInfo
CGSPackagesCopyWorkspaceIdentifierForWorkspace
CGSPackagesGetWorkspaceType
CGSPackagesCopyWorkspaces
CGSSpaceGetCompatID
CGSSpaceGetType
CGSSpaceCreate
CGSSpaceCopyOwners
CGSSpaceCopyValues
CGSCopySpaces
CGSCopySpacesForWindows
CGSSpaceDestroy
CGSSpaceCopyName
CGSAddWindowsToSpaces
CGSRemoveWindowsFromSpaces
CGSCopyManagedDisplaySpaces
CGSGetActiveSpace
CGSCopyManagedDisplays
CGSCopyManagedDisplayForSpace
CGSCopyManagedDisplayForWindow
CGSCopyBestManagedDisplayForPoint
CGSCopyBestManagedDisplayForRect
CGSGetDisplaysWithUUID
CGSSetActiveMenuBarDisplayIdentifier
CGSCopyActiveMenuBarDisplayIdentifier
CGSReassociateWindowsSpacesByGeometry
CGSManagedDisplayCurrentSpaceAllowsWindow
CGSManagedDisplayGetCurrentSpace
CGSManagedDisplayIsAnimating
CGSCopyWindowsWithOptionsAndTags
_AXUIElementGetWindow
CoreCursorSet
CGSSetWindowToReleaseBackingOnOrderOut
CGSSetWindowAccelerationState
CGSGetWindowAccelerationState
CGSSynchronizeWindow
CGSWindowUpdateIsPending
CGSGetZoomParameters
CGSSetZoomParameters
CGSDisplayIsZoomed
CGSZoomPoint
CGSUnzoomPoint
CGSGetCurrentCursorLocation
CGSCurrentInputPointerPosition
CGSGetDisplayBounds
CGSGetSurfaceCount
CGPixelAccessCreateWithWindow
CGPixelAccessLock
CGPixelAccessUnlock
CGPixelAccessRelease
CGPixelAccessCreateImageFromRectNoCopy
_LSCopyAllApplicationURLs
CPSGetKeyFocusProcess
CGSEventIsAppUnresponsive
CGSSetWindowCornerMask
CGSWindowSetBackdropBackgroundBleed
CABackingStoreGetTypeID
CABackingStoreIsVolatile
CABackingStoreSetVolatile
CoreDockGetTileSize
CoreDockSetTileSize
CoreDockGetOrientationAndPinning
CoreDockSetOrientationAndPinning
CoreDockGetEffect
CoreDockSetEffect
CoreDockGetAutoHideEnabled
CoreDockSetAutoHideEnabled
CoreDockIsMagnificationEnabled
CoreDockSetMagnificationEnabled
CoreDockGetMagnificationSize
CoreDockSetMagnificationSize
CoreDockGetWorkspacesEnabled
CoreDockSetWorkspacesEnabled
CoreDockGetWorkspacesCount
CoreDockSetWorkspacesCount
CoreDockSetPreferences
CoreDockSendNotification
CoreDockGetRect
CoreDockGetContainerRect
CoreDockUpdateWindow
CoreDockAddFileToDock
CGEventGetEventRecordSize
CGEventGetEventRecord
CGEventGetWindowLocation
CGEventGetUnflippedLocation
CGEventGetUnflippedWindowLocation
CGEventRecordPointer
CGEventCreateWithEventRecord
CGEventCopyIOHIDEvent
CGEventSetIOHIDEvent
IOHIDEventCreateData
IOHIDEventGetType
IOHIDEventGetTypeID
IOHIDEventGetPhase
IOHIDEventGetFloatValue
IOHIDEventGetIntegerValue
IOHIDEventGetPosition
IOHIDEventGetChildren
_mthid_isPathCollection
_mthid_pathCollectionCopyAllPaths
_mthid_pathCollectionCopyTouchingPaths
_mthid_pathCollectionGetPosition
_mthid_pathGetPosition
_mthid_isPath
_mthid_pathGetIndex
_mthid_pathIsResting
_mthid_pathIsTouching
_mthid_pathIsStationary
_mthid_pathWasRejected
_mthid_pathGetVelocity

galli-leo commented 4 years ago

@lwouis After experimenting a bit, I came to the following conclusions:

Seems like accessing the raw frame buffer is not possible without either patching WindowServer or CoreGraphics.
HyperSwitch seems to be using a similar API to the one alt-tab is currently using, but they then also use another private API to get the window image a second time? (not really sure how that works yet)
Either Apple fixed something in the latest beta or my testing method was flawed (from the initial post), since it seems like getting the images from WindowServer is actually very fast now. The problem now seems to be the pure rendering of the images. It might be good to either figure out how to render them more quickly or scaling them down.
HyperSwitch basically does everything I proposed in the original post :):
a. Updates thumbnails in the background.
b. Continuously updates thumbnails (e.g. you can see them change when the switcher is up).
c. Multithreading and background work using GCD.

lwouis commented 4 years ago

On my system, I'm pretty sure now that getting the images from the WindowServer is the bottleneck. See discussion in that PR. What do you see in Instruments? I'm guessing you see little time spent in alt-tab-mac, and lots of time spent in WindowServer, if you use the Thread State Trace view.

HyperSwitch has a terrible UX I think. Try opening a video and pressing the shortcut multiple times. You'll see the thumbnail is super late to get updated.

Very interesting discussion here, although the solution discussed seems to only cover screen area capture, not window capture. Also here, where the person explains the locking you mentioned.

galli-leo commented 4 years ago

@lwouis Hmm interesting, I will take another look.

I did the above testing with a small POC application though, maybe there is something in alt-tab that slows down the WindowServer?

lwouis commented 4 years ago

I was reading online about how to deal with CGWindowListCreateImage being slow, and someone mentioned that Zoom is doing this. I realized "yeah zoom has screen share but also window only share, as a video stream". Then I checked my Accessibility settings and noticed I never authorized Zoom. It clearly means they are using a private API to grab screenshots/video-stream of other windows as the official CG API is too slow for video and requires the Accessibility permission, and the AV API is after compositing, so can't capture a window behind another window

lwouis commented 4 years ago

maybe there is something in alt-tab that slows down the WindowServer?

It's possible. Maybe it's all the bridges between Swift/ObjC/C that create memory copies and destroy the performance of CGWindowListCreateImage? But maybe it's just that this API is slow, and everybody else is using a faster private API.

galli-leo commented 4 years ago

@lwouis You don't need Accessibility settings for screen recording. Also, even with private APIs you still need to be allowed to record the screen in the settings. I think it might be fast enough if you only capture one window, but probably too slow once you start capturing multiple.

koekeishiya commented 4 years ago

@galli-leo @lwouis

I somehow stumbled upon a nicely written blogpost with quite some detailed information into how macOS Graphics work (under the hood) and figured it might be of interest to you:

https://avaidyam.github.io/2019/02/19/DIY-Core-Animation.html

lwouis commented 4 years ago

I've been exploring this topic again. I think this screenshot really shows the story:

Notice how the app is blocked most of the time when the shortcut is pressed. I'm not sure about thread preemption and other low-level mechanisms of macOS, but basically how I read this is that the app is waiting on the WindowServer to get the window screenshots. This is time we can't reduce. This time can be increased by the user having a 4k display, having other workloads on their computer when they press the shortcut, having an integrated GPU, etc.

I read many interesting low-level posts from @avaidyam (referenced by @koekeishiya above). I discussed this with him through email, and he suggested I use the CAPluginLayer demonstrated in the Diorama repo. This technique allows integrating the rendering of a user app into the OS rendering pipeline, making it possible to get high-FPS "video" of all windows. That sounded like the perfect tech to use in alt-tab-macos.

However, that technique has been patched in 10.15 and can only render the app's own windows, not other apps' windows. It was also unstable in previous macOS versions, resulting in user session crashes on my 10.14 laptop for instance.

Going forward I have no lead on a tech we could use to address the bad performance of the current CGWindowListCreateImage call to the WindowServer. To recap:

Current CGWindowListCreateImage API is slow and can't be optimized because the slowness is in the OS implementation
Private API CGSCaptureWindowsContentsToRectWithOptions, which I experimented with to get minimized windows screenshots, has the same low performance (it must go through the same internals as CGWindowListCreateImage).
Private API CAPluginLayer has excellent performance but crashes the whole user session frequently up to 10.14, and is downright unable to render other windows starting from 10.15
OS-bundled binary /usr/sbin/screencapture (the bundled Capture utility) is too slow

The only thing I can think of would be to pre-render/cache the thumbnails. But then we trade off responsiveness with showing the user incorrect data when they press the shortcut. OK, they see the list fast, and the list has the correct number of items with correct titles, but the screenshots will then change. We could put a spinner on each thumbnail so the user knows that we are now fetching the freshest image and showing them an old one in the meantime. To be honest I'm not sure this is a better experience.

I would love other people's opinions here on how to move forward.

mfn commented 4 years ago

The only thing I can think of would be to pre-render/cache the thumbnails. But then we trade off responsiveness with showing the user incorrect data when they press the shortcut

I bet this is what HyperSwitch also realized, hence it works that way there.

As a daily user of this feature, to me responsiveness matters more than accuracy for this use case.

koekeishiya commented 4 years ago

@lwouis

My immediate thoughts for how to speed up this process would be to look for ways to perform a batch operation when communicating with the WindowServer, instead of initiating a new request for every single window. The private function CGSHWCaptureWindowList comes to mind, although I have not verified that it still works.

lwouis commented 4 years ago

I tried using the CGSHWCaptureWindowList function in batch mode without success. It's supposed to input and output multiple windows at the same time. Indeed here is the signature I was able to find on the internet:

extern CFArrayRef CGSHWCaptureWindowList(CGSConnectionID cid, CGSWindowID *windowList, CGSWindowCount windowCount, CGSWindowCaptureOptions options);

Here is how I call it:

func windowScreenshots(_ windowId: [CGSWindowID]) -> Array<CGImage> {
        return CGSHWCaptureWindowList(
                CGSMainConnectionID(),
                UnsafeMutablePointer(mutating: windowId),
                CGSWindowCount(windowId.count),
                CGSWindowCaptureOptions(kCGSCaptureIgnoreGlobalClipShape | kCGSWindowCaptureNominalResolution)
        )!.takeRetainedValue() as! Array<CGImage>
}

However the CGSHWCaptureWindowList always returns an array of 1 element: the window from the first ID in the array of 5 IDs.

I checked in the debugger, and I correctly send an array of 5 IDs for instance in UnsafeMutablePointer(mutating: windowId), and the size is 5 in CGSWindowCount(windowId.count).

@koekeishiya I looked at all results from Google, and all results from GitHub, but basically I see only 1 project on the web that seems to be using that function, and it's WebKit. The good news is that they still reference it on master today, so it seems like a safe private SPI to use. The bad news is that they call it with a hardcoded CGSWindowCount of 1. Do you know if I'm calling it incorrectly, or if perhaps they changed the spec and it only returns 1 window now?

@gingerr could you please test the performance on this branch? Somehow for a few days I haven't been able to have the app be slow on my laptop. I tried opening hundreds of windows like I used to, but still it doesn't slow down much. Could you test with your 4k windows on your system? Maybe you'll see a difference on that branch as it uses CGSHWCaptureWindowList (The HW in the name is hardware so potentially this stuff performs better)

gingerr commented 4 years ago

@lwouis quick feedback on performance: Tested it on my MacBook (Catalina) and my desktop Hackintosh (High Sierra) with external monitor. Same test as the last time, 16x windows maximised on a display running 3440x1440 and measuring the time between Application.showUiOrCycleSelection and ThumbnailsPanel.orderOut.

The performance is great (150ms - 250ms) and the performance issue I attempted to fix with #87 (which I could only reproduce on Hackintosh running High Sierra) is also gone! ThumbnailPanel summon feels instant on both systems.

I think you got a winner here for both performance and access to minimised windows. 🥇

lwouis commented 4 years ago

@gingerr that's good news!

I'm surprised by your results but I can't really reproduce (somehow everything is fast on my laptop, on master or on the minimized-windows branch, so I can't really compare), so I will have to trust you on this. Maybe other contributors here could test on their systems and confirm whether the performance is good now? I think @galli-leo, @koekeishiya, and @mfn were interested in the fix :)

My worry now is that if we adopt this private API, we officially turn to the dark side. Like I said, the good news is that this API is used in WebKit master today, and has been for years, so hopefully it's a stable private API. (Aside: how stupid is it that engineers on WebKit use private APIs instead of asking the team in charge at Apple to maintain a public API that does what they need?)

I will review these other points I touched in the branch, and update the PR:

the tricky logic I used for minimized windows (doing 2 rounds of CGWindowListCopyWindowInfo with and without the .optionOnScreenOnly flag)
the change in multi-threading I made. I'm still unsure what the best pattern is here. I want a thread to listen to keyboard events, and UI logic has to be on the main thread. What that looks like in terms of code is still a bit confusing to me. Hopefully I can get a clearer picture by reading more on these async queues from GCD and profiling more in Instruments.
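
To make the first point concrete: one plausible reading of the two-pass approach is "everything returned without the on-screen filter, minus everything returned with it". The Swift sketch below shows only that set difference on plain ID arrays; `offScreenWindowIds` is a hypothetical helper, and the two input lists are assumed to come from the two CGWindowListCopyWindowInfo calls (not shown). The remainder would be expected to include minimized windows, but likely other off-screen windows as well, so further filtering is probably needed.

```swift
/// Returns the IDs present in the full list but absent from the on-screen list,
/// keeping the order of the full list.
func offScreenWindowIds(all: [UInt32], onScreen: [UInt32]) -> [UInt32] {
    let visible = Set(onScreen)
    return all.filter { !visible.contains($0) }
}

// Hypothetical IDs: windows 2 and 4 are on screen, so 1 and 3 remain as candidates.
let candidates = offScreenWindowIds(all: [1, 2, 3, 4], onScreen: [2, 4])
print(candidates)
```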

galli-leo commented 4 years ago

@lwouis Very interesting results! I will have to check the changes out. When I tested it, CGSHWCaptureWindowList was way slower than the public API. But that could have been due to my test App doing something stupid. (Maybe the HW one already does something to speed up the rendering of the image later?)

Regarding the worry about Private APIs, I wrote a script that will check all SDKs for whether a function is present in that version or not, so we can check that regularly to verify that it still exists.

lwouis commented 4 years ago

@galli-leo

I will have to check the changes out.

Let us know how it performs on your system! Hopefully good

Regarding the worry about Private APIs, I wrote a script that will check all SDKs for whether a function is present in that version or not, so we can check that regularly to verify that it still exists.

Wow that's really cool! Could you please share with us?

I'm thinking the way forward should be some kind of Level of Service, kind of like a hardware check in a video game. The private APIs seem to be better (more features and faster), but they are not guaranteed to work, so we could try the private APIs first, then fall back on public APIs for systems that don't support the private ones.

That fallback mechanism could be decided:

At compile time (possible with your tool): However, I can imagine that a symbol could be present in an SDK, yet the function doesn't work in practice, or has different behavior depending on the SDK version.
At launch time: Maybe at launch we can probe the OS and see what's available by requesting some windows and testing, the same way we test for permissions currently.
At run time: Maybe we just take the safest road and try functions in order on each invocation, and fall back if needed.

I think a mix of compile-time and run-time is needed to cover all cases. But maybe the private APIs are quite stable actually and we could just use them and that's it lol. I don't know how I can test that. Even if I leverage CI to build the project on multiple macOS/Xcode/SDK versions (which is a great leverage of the Cloud!), then I don't have multiple machines to test it. I tried VirtualBox'ing macOS a few months ago but it didn't boot. Any tips on testing on macOS? Maybe I could run on some VM in the Cloud somehow, like one of those e2e testing mac farms with manual testing would do the trick?
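
The "try functions in order on each invocation, and fall back if needed" option can be sketched in Swift as an ordered list of strategies. Everything here is hypothetical naming (`CaptureStrategy`, `captureWithFallback`), and the two closures are stand-ins: a real "private" strategy would wrap a private API call and return nil when it is unavailable or fails.

```swift
/// One way of obtaining a window image; returns nil when unavailable or failing.
struct CaptureStrategy<Image> {
    let name: String
    let capture: (UInt32) -> Image?
}

/// Tries each strategy in order and returns the first image obtained,
/// together with the name of the strategy that produced it.
func captureWithFallback<Image>(_ windowId: UInt32,
                                strategies: [CaptureStrategy<Image>]) -> (strategy: String, image: Image)? {
    for strategy in strategies {
        if let image = strategy.capture(windowId) {
            return (strategy.name, image)
        }
    }
    return nil
}

// Hypothetical stand-ins: the "private" path fails here, the "public" one works.
let strategies = [
    CaptureStrategy<String>(name: "private", capture: { _ in nil }),
    CaptureStrategy<String>(name: "public", capture: { id in "image of window \(id)" }),
]
if let result = captureWithFallback(42, strategies: strategies) {
    print("\(result.strategy): \(result.image)")
}
```

Trying on every invocation costs one failed call per window on unsupported systems, which is why remembering the first working strategy at launch may be a reasonable middle ground.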

koekeishiya commented 4 years ago

I'm thinking the way forward should be some kind of Level of Service, kind of like a hardware check in a video game. The private APIs seem to be better (more features and faster), but they are not guaranteed to work, so we could try the private APIs first, then fall back on public APIs for systems that don't support the private ones.

I don't know how Swift handles this, but what you want to do is load the framework dynamically and look up functions. In C/Obj-C this would be achieved by using dlopen and dlsym to load and locate the function and assign its address to a function pointer that allows you to call said function.

The function lookup only has to be performed once at application startup. If the functions are located then you can simply allow the application to run its "fast" path, and otherwise have it run the "slow"/fallback path.

If the function is located I think it is fairly safe to assume that it works. However, it should be fairly trivial to perform a test of the functions at startup if they are located to verify if they produce the expected result, and then decide whether it is necessary to use the fallback-path or not.
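
For reference, dlopen and dlsym are callable from Swift as well. Below is a minimal sketch of the startup lookup, assuming a hypothetical helper `lookUpSymbol`; `getpid` is used purely because it exists on every system, whereas a private function would need its own matching `@convention(c)` type and, most likely, the path of the framework that exports it.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Returns the address of `symbol`, or nil if it cannot be found.
/// A nil `libraryPath` searches the images already loaded in the process.
func lookUpSymbol(_ symbol: String, in libraryPath: String? = nil) -> UnsafeMutableRawPointer? {
    let handle: UnsafeMutableRawPointer?
    if let path = libraryPath {
        handle = dlopen(path, RTLD_LAZY)
    } else {
        handle = dlopen(nil, RTLD_LAZY)
    }
    // The handle is intentionally never closed so the returned address stays valid.
    guard let openHandle = handle else { return nil }
    return dlsym(openHandle, symbol)
}

// Resolved once at startup; the function type must match the C signature exactly.
typealias GetPidFunction = @convention(c) () -> Int32
let fastPath: GetPidFunction? = lookUpSymbol("getpid").map {
    unsafeBitCast($0, to: GetPidFunction.self)
}

if let fastPath = fastPath {
    print("fast path available, pid \(fastPath())")
} else {
    print("symbol missing, using the fallback path")
}
```

Finding the symbol only proves that it exists; a mismatch between the assumed signature and the real one is not detected, which is where the startup self-test mentioned above would help.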

lwouis commented 4 years ago

I just pushed a new version of #78. While working on the minimized windows, thanks to the help of @koekeishiya and some research, I introduced a bunch of new private APIs that opened doors to major redesigns.

I think this PR will close #11 #45 #62. On my machine it is really fast. I can even do the thing I always wanted: if I press and release fast enough it will not even show the UI (with the delay param being at 0, so not interfering here). It also solves an issue that has no ticket: when you minimize or de-minimize a window, and press the shortcut during the animation. It used to block until the animation is over. Now it's instant.

@galli-leo @koekeishiya @gingerr if you guys could try to run the code on your setups, that would be super helpful. My only worry now is compatibility, as this PR moves the project into the private API world (the dark side? :p). For extended compatibility, I plan on adding feature detection, probably using the dlopen approach described by @koekeishiya (I noticed this during my recent research too). But I see that as a second pass to increase compatibility. From what I've read online regarding the few private APIs introduced in that PR, I have a feeling that they may not create compatibility issues. Depending on your results, I think I will either merge the PR as is, or add feature detection to fall back on other APIs on some macOS versions.

Let me know how it performs on your machines! :)

lwouis commented 4 years ago

This ticket and a bunch of others are closed in v2 released today. Feel free to test that new version out and give feedback here! Hopefully you experience better performance, can interact with minimized windows, and interact with windows from other spaces and displays. Cheers!

zxti commented 4 years ago

I was super excited to come across this project! Unfortunately I find the v2 lag before the switcher shows up when pressing alt-tab to still be quite high (vs HyperSwitch). It feels probably close to 2x HS's lag, and HS's lag I was already quite sensitive to (compared to the instantaneous native alt-tab and Windows alt-tab I'm most used to). I know it sounds super sensitive, and I may just be in the minority, but this is one of the biggest hurdles against being comfortable with this type of switcher (HS is not exempt, but its lag is noticeably shorter). Just my 2c!

lwouis commented 4 years ago

Hi @zxti! Thanks for the feedback. I got 2 main points from your comment:

You want more performance

I want it too. Every contributor here wants it. I think we can all agree that for such a tool, snappy UX is priority number 2 (number 1 being "it actually shows the thumbnails and I can switch to them").

HyperSwitch is faster

The reason HyperSwitch is faster is that it shows you wrong information. It displays something as fast as it can, then it updates the UI asynchronously. Open a YouTube video, trigger HS many times, and see how it takes a few seconds to be correct. They are basically refreshing thumbnails on a scheduled timer, something like every 4s. If you alt-tab right after the tick, it will be accurate and feel fast. If you trigger right before the tick, you will see an outdated thumbnail, which can make it hard to locate the visual you are looking for (e.g. you are looking for a dominant color like a big red background, but it's not there on the thumbnail), or worse, mislead you (it will show you how the window looked 4s ago and make you select the wrong thumbnail).

My point is that they trade off accuracy for speed. We could do that in this project as well and cache thumbnails on a timer. If we go down that road, I clearly want this to be a preference, as some of us will prefer the opposite trade-off and would rather wait a few more ms and get accurate visuals.
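
To make the trade-off concrete, here is a rough sketch of what such a preference could look like, assuming a max-age check on read rather than an actual scheduled timer. `ThumbnailCache`, `Thumbnail`, and `maxAge` are hypothetical names and not part of the project. Setting `maxAge` to 0 gives the "always accurate" behavior, while something like 4 would tolerate thumbnails up to 4s old.

```swift
import Foundation

/// Placeholder for a captured image; a real app would hold image data here.
struct Thumbnail: Equatable {
    let windowId: Int
    let capturedAt: TimeInterval
}

final class ThumbnailCache {
    /// Preference: how old a cached thumbnail may be, in seconds.
    /// 0 means "always recapture" (accuracy over speed).
    var maxAge: TimeInterval
    private var entries: [Int: Thumbnail] = [:]
    /// Injected capture function, so the expensive call can be swapped or counted.
    private let capture: (Int, TimeInterval) -> Thumbnail

    init(maxAge: TimeInterval, capture: @escaping (Int, TimeInterval) -> Thumbnail) {
        self.maxAge = maxAge
        self.capture = capture
    }

    /// Returns the cached thumbnail if it is fresh enough, otherwise recaptures.
    func thumbnail(for windowId: Int, now: TimeInterval) -> Thumbnail {
        if let cached = entries[windowId], now - cached.capturedAt < maxAge {
            return cached
        }
        let fresh = capture(windowId, now)
        entries[windowId] = fresh
        return fresh
    }
}
```

The time is passed in as `now` only to keep the sketch deterministic; a real implementation would read the clock itself.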

But that's not all! I actually have good news for you! I'm currently working on v3, which is a total rework of the app. Instead of asking the OS about the state of the whole system on trigger (what we do today; hard to do fast), or asking for the state of the whole system on a timer (what HyperSwitch does today; inaccurate), v3 observes Accessibility events such as "an app was launched" or "a window was closed". This means we build a cache as we receive these events in the background, and when the user triggers the app, we can show the accurate state of the windows instantly.

Of course there is no free lunch, so this approach has its own issues. However, from my work on it over the past week, I'm very optimistic! The thing I'm the most excited about is actually not the perf (because on my machine even v2 is instant; I have a recent MacBook and no 4K displays), but the fact that we will finally have the thumbnails in order of most-recently-used to least-recently-used, instead of the order of their stack (z-index) on the desktop. It's a big difference! Many other limitations also no longer apply with this approach.
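
As a rough illustration of the idea (not the actual v3 code), the event-driven cache with recently-used ordering could be modeled like this. `WindowEvent` and `WindowList` are hypothetical names; the assumption is that the real app would translate Accessibility notifications into events of this kind.

```swift
/// Hypothetical events derived from Accessibility notifications.
enum WindowEvent {
    case opened(id: Int, title: String)
    case focused(id: Int)
    case closed(id: Int)
}

/// State that is kept up to date in the background as events arrive.
struct WindowList {
    /// Window ids, most recently used first.
    private(set) var order: [Int] = []
    private(set) var titles: [Int: String] = [:]

    mutating func apply(_ event: WindowEvent) {
        switch event {
        case let .opened(id, title):
            titles[id] = title
            moveToFront(id)
        case let .focused(id):
            // Events for windows we never saw opening are ignored in this sketch.
            guard titles[id] != nil else { return }
            moveToFront(id)
        case let .closed(id):
            titles[id] = nil
            order.removeAll { $0 == id }
        }
    }

    private mutating func moveToFront(_ id: Int) {
        order.removeAll { $0 == id }
        order.insert(id, at: 0)
    }
}
```

On trigger, the UI would only need to read `order`, with no query to the OS at that moment.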

I hope to be able to release v3 either this week or the next one.

lwouis commented 4 years ago

@zxti I opened a PR for v3. I think it's ready, but I'm afraid I didn't test enough. I tested very tricky cases, but I could still be missing some. The issue with the new approach is that we maintain state, so if there are cases that we don't take into account, we can drift away from the actual state of the windows. However, the benefits are many, as I hinted at in my previous comment.

Could you test this on your side and tell me if it performs better for you? :) Also if it's functioning correctly ;p

kawtharmonki commented 4 years ago

Any chance you'd be able to add a build to that branch? Would be happy to test it out :]

lwouis commented 4 years ago

@kawtharmonki sorry, I was working on fixing a layout issue I introduced while reworking the layout code. I think layout works well now. Here is a build of the latest v3 PR code: alt-tab-macos.app.zip

kawtharrr commented 4 years ago

Heya, sorry for the late response to this. V3 has massive speed improvements, great work! A couple of bugs:

While the alt-tab 'window' is open, the focused window keeps jumping around (seems to be to the 2nd window), resulting in a difficult to use experience
The app seems to capture X+hotkeys, when more typical behaviour is to respond only to the exact hotkeys. e.g. if I've set ALT+TAB as the hotkey for this, and I've SHIFT+ALT+TAB as a hotkey to play/pause Spotify, alt-tab (the app) still gets invoked in the second scenario. This behaviour is in contrast to most other apps including e.g. HyperSwitch

Happy to open separate Issues - but not sure if they've already been addressed since it's been a while and you're making rapid progress + changes :)

lwouis commented 4 years ago

Thanks for the feedback @kawtharrr!

While the alt-tab 'window' is open, the focused window keeps jumping around (seems to be to the 2nd window), resulting in a difficult to use experience

The description you make is a bit weird. I have trouble imagining what this looks like exactly. I would say most likely this issue is gone, but if you want to confirm, let me attach the latest v3 branch build here: alt-tab-macos.app.zip

The app seems to capture X+hotkeys, when more typical behaviour is to respond only to the exact hotkeys. e.g. if I've set ALT+TAB as the hotkey for this, and I've SHIFT+ALT+TAB as a hotkey to play/pause Spotify, alt-tab (the app) still gets invoked in the second scenario. This behaviour is in contrast to most other apps including e.g. HyperSwitch

This behavior has been there from day 1, I believe. The keyboard event checks we do are "at least these keys are pressed", not "exactly these keys are pressed". I'll open a ticket to track that issue. As soon as ShortcutRecorder implements the ticket I opened, I will work on #72. ShortcutRecorder has exact matching, so it will close this issue as well.
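
The difference between the two checks can be sketched as follows. This is an illustration only: `Modifiers`, `matchesAtLeast`, and `matchesExactly` are hypothetical names, and neither the app's actual check nor ShortcutRecorder's API is shown here.

```swift
/// Hypothetical modifier set for the illustration.
struct Modifiers: OptionSet {
    let rawValue: Int
    static let shift   = Modifiers(rawValue: 1 << 0)
    static let option  = Modifiers(rawValue: 1 << 1) // ALT
    static let command = Modifiers(rawValue: 1 << 2)
}

/// "At least these keys are pressed": extra modifiers still trigger the shortcut.
func matchesAtLeast(_ pressed: Modifiers, shortcut: Modifiers) -> Bool {
    return pressed.isSuperset(of: shortcut)
}

/// "Exactly these keys are pressed": any extra modifier prevents the match.
func matchesExactly(_ pressed: Modifiers, shortcut: Modifiers) -> Bool {
    return pressed == shortcut
}
```

With ALT as the configured modifier, SHIFT+ALT passes the first check but not the second, which is the SHIFT+ALT+TAB situation described above.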

Recomputing Thumbnails is slow and expensive #45