Closed lunixbochs closed 4 months ago
> On a build of Talon with this winit branch: https://github.com/talonvoice/winit/commits/0.28 and macOS 13.5 (arm64), a user reported this crash
Was the user using macOS 13.5, or is that only where the build happened?
> I also use glutin
Are you using both softbuffer and glutin in the same application? May I ask why?
> I don't remember the specifics, but iirc objc sometimes messes with pointer bits after freeing objects, which could explain the weird pointer?
Hmm, not that I know of? If anything, it'll just be that since the object is freed, the space would've been reclaimed by some other part of the system, and hence the pointer would be some other unrelated data, not a pointer any more.
> This crash happened repeatedly for the user, and stopped after a macOS restart.
Yikes! That makes it basically impossible to debug, as we can no longer reproduce the issue; otherwise I'd have suggested rerunning with malloc scribbling and such enabled:
DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib MallocStackLogging=YES NSZombieEnabled=YES MallocGuardEdges=YES MallocScribble=YES ./target/debug/my_binary
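If launching from a small Rust helper is more convenient than typing the shell line, the same environment can be set through `std::process::Command`. This is only a minimal sketch: the helper name `debug_command` and the constant `MALLOC_DEBUG_ENV` are made up for illustration, and the binary path is the placeholder from the command above.

```rust
use std::process::Command;

/// The debug-allocator settings from the shell command above, as (name, value) pairs.
const MALLOC_DEBUG_ENV: [(&str, &str); 5] = [
    ("DYLD_INSERT_LIBRARIES", "/usr/lib/libgmalloc.dylib"),
    ("MallocStackLogging", "YES"),
    ("NSZombieEnabled", "YES"),
    ("MallocGuardEdges", "YES"),
    ("MallocScribble", "YES"),
];

/// Hypothetical helper: builds (but does not spawn) a command for `binary`
/// with the debug variables added to its environment.
fn debug_command(binary: &str) -> Command {
    let mut cmd = Command::new(binary);
    cmd.envs(MALLOC_DEBUG_ENV.iter().copied());
    cmd
}
```

Calling `debug_command("./target/debug/my_binary").status()` would then run the binary with those variables set and wait for it to exit.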
Honestly though, this sounds more like some weird macOS bug?
> Was the user using macOS 13.5
Yes
> Are you using both softbuffer and glutin in the same application? May I ask why?
Maybe 1% of my users don't have a working GPU/driver/OpenGL, so I automatically fall back to software rendering. This happens sometimes on macOS with DisplayLink monitors, on Windows if their GPU driver is broken, and honestly a lot of Linux users are just running a weird env with no GPU. (This is more work than you think! I render with Skia, even for egui, and fall back to Skia's software renderer. I also need to manually create my OpenGL context in a way where I can recover when the user doesn't have it. I have Metal support too, but it's disabled because it still has memory leaks Rust-side that I haven't tracked down since the Rust port. Softbuffer also unfortunately doesn't support the alpha channel yet, which I use.)
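To make the fallback idea concrete, here is a rough sketch of the shape of that logic. It is not Talon's actual code: `Backend`, `pick_backend`, and the `try_init_gl` closure are hypothetical stand-ins for the real glutin/Skia setup, which is far more involved.

```rust
/// Hypothetical backend tags; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    OpenGl,
    Software,
}

/// Attempts GPU initialization via the supplied closure and falls back to
/// software rendering if it reports an error instead of aborting.
fn pick_backend<F>(try_init_gl: F) -> Backend
where
    F: FnOnce() -> Result<(), String>,
{
    match try_init_gl() {
        Ok(()) => Backend::OpenGl,
        Err(reason) => {
            // Log why the GPU path failed so the report is useful later.
            eprintln!("OpenGL unavailable ({}); using software rendering", reason);
            Backend::Software
        }
    }
}
```

The important property is that context creation returns an error rather than panicking, so the caller can recover and pick the software path.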
> hence the pointer would be some other unrelated data, not a pointer any more
Ah yeah, if that's the case, I'll send them an ASAN build if it comes up again.
> rerunning with malloc scribbling and such enabled
Thanks, I'll do this next time it pops back up. My guess is that if I can't repro, another user will hit it in 1-3 months.
> Honestly though, this sounds more like some weird macOS bug?
My gut feeling is that it's not a macOS bug. I've run into this sort of thing (very rarely) before the Rust port, and it was usually due to a misunderstanding about the objc reference counting or an unexpected interaction, e.g. incorrect usage of autoreleasepool.
An even bigger complication is that this app is an accessibility client and can look at its own UI, which can very much confuse UI frameworks due to the way AppKit calls back to itself from a deep call stack in the same thread (though I don't think the user was doing that). Talon is really an edge case factory. I've invested heavily in porting from Qt to Rust because I want more language-level guarantees. It's gone well so far, apart from expect() calls in crates that should have been fallible (the app hard exits, which is really disconcerting for a user who is using it as primary input instead of a keyboard/mouse).
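One possible stopgap for a dependency that panics through expect() is to wrap the call in `std::panic::catch_unwind` and turn the panic into an error. This is a sketch under assumptions, not something taken from Talon or winit: the `guarded` helper is hypothetical, and it only helps when the binary is built with unwinding panics (not `panic = "abort"`) and the panic does not have to cross an FFI boundary such as an AppKit callback.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical wrapper: runs `f` and converts a panic into `Err(message)`
/// so the caller can degrade gracefully instead of exiting.
fn guarded<T>(f: impl FnOnce() -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually a &str or a String; handle both.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

The default panic hook still prints the message to stderr, and state touched by the panicking call may be left inconsistent, so a fallible API in the crate itself remains the better fix.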
My best guess is that it was caused by DisplayLink, which is basically the number 1 reason people with perfectly good machines and drivers mysteriously don't have a working GPU. It may also explain why a reboot fixed it: their dock supports both DisplayLink and Alt Mode, so my guess is it switched between them inadvertently. DisplayLink is rare enough that it could explain why more of my users haven't hit this yet. I'll grab a DisplayLink adapter and try to repro.
I'm going to close this. I believe we're fairly good at doing reference counting nowadays (both in Winit and Softbuffer), and since it's not reproducible, it's not really actionable from our side. Feel free to re-open if your user hits this again!
On a build of Talon with this winit branch: https://github.com/talonvoice/winit/commits/0.28 and macOS 13.5 (arm64), a user reported this crash:
To my knowledge, the only NSViews being created in my app are by winit and softbuffer. I also use glutin, which seems to interact with NSViews via icrate, but it doesn't seem to create an NSView itself.
I interact with winit from a single thread only. This crash happened repeatedly for the user, and stopped after a macOS restart. They have a monitor connected via a DisplayLink dock.
I don't remember the specifics, but iirc objc sometimes messes with pointer bits after freeing objects, which could explain the weird pointer?
cc @madsmtm not sure if you have any ideas