obsproject / obs-studio

OBS Studio - Free and open source software for live streaming and screen recording
https://obsproject.com
GNU General Public License v2.0
60.27k stars 7.98k forks source link

OBS Virtual Camera crashes when switching breakout rooms in Zoom. #7287

Closed attaboyBrad closed 2 years ago

attaboyBrad commented 2 years ago

Operating System Info

macOS 12

Other OS

No response

OBS Studio Version

28.0.0

OBS Studio Version (Other)

28.0.1

OBS Studio Log URL

https://obsproject.com/logs/Chwt_hwR_9cbLuE8

OBS Studio Crash Log URL

No response

Expected Behavior

I expected to switch from one breakout room to another without issue.

Current Behavior

The OBS virtual camera froze in Zoom, and in attempting to restart it the virtual camera ceased to show up as an option in Zoom and Zoom fell back to a raw webcam feed. This has been an issue since the first OBS 28 beta and was mentioned a couple of times in the beta announcement forum thread.

Steps to Reproduce

  1. Open OBS 28
  2. Open Zoom (fully updated as of 22-09-04) and start a meeting.
  3. Start virtual camera in OBS 28.
  4. Enter a breakout room or try to switch breakout rooms in Zoom.

Anything else we should know?

This has been happening basically every time I try to enter, or especially to switch, breakout rooms in Zoom since the first OBS 28 beta (still an issue with 28.0.1).

It's not an issue on 27.2.4, and I've reverted back to that for the time being.

attaboyBrad commented 2 years ago

I have since retested using a completely blank profile and scene collection to try to rule out the possibility that the issue stems from my particular configuration and the issue still happens--in fact Zoom crashed outright.

I retested Zoom while using cameras other than OBS 28's Virtual Camera and Zoom worked fine so it appears to be very specific to OBS 28 and its Virtual Camera implementation.

fischernick commented 2 years ago

I've been running into the same problem on an M1 MacBook Pro. While using Virtual Camera in Zoom, going into a breakout room will cause zoom to just lock and display the spinner on the mouse icon. The other participant video and audio still works.

I've also run into related issue where I am in one meeting, leave, and start another meeting, and Zoom disables my video and I'm unable to switch to another video or turn on the video. If I close / force quit Zoom and rejoin it works.

fabiosantossilva commented 2 years ago

Same thing here. My zoom has been crashing randomly since I started using OBS 28.0.1 on my Mac Studio. The virtual camera stops working and zoom becomes unresponsive as soon as I change meetings. Closing zoom and reopening seems to be the only solution for now.

stevevrporter commented 2 years ago

Exactly the same issues here.

OBS 28.0.1 Zoom Version: 5.11.11 (10514) OSX Version 12.6 2021 Macbook pro, Apple M1 Max

bpgray commented 2 years ago

Same issue as well

OBS 28.0.1 arm64/Apple Silicon Zoom 5.11.11 (10514) macOS 12.5.1 MacBook Pro, Apple M1 Pro

I was also having this issue with OBS release candidates and prior versions of Zoom.

Fenrirthviti commented 2 years ago

Please ensure that you include a log file when confirming bug reports the future, rather than just "me as well" comments.

That said, I don't think we need more confirmations of the issue, but I don't think we have a solid idea on what exactly is happening or have had time to look in to it yet.

RytoEX commented 2 years ago

Please check if this is still an issue in OBS Studio 28.0.2.

fabiosantossilva commented 2 years ago

Please check if this is still an issue in OBS Studio 28.0.2.

Yes....The problem persists. Thank you for following up.

attaboyBrad commented 2 years ago

Please check if this is still an issue in OBS Studio 28.0.2.

Yes....The problem persists. Thank you for following up.

I have also reproduced the issue in 28.0.2. Log Below.

2022-09-24 11-07-24.txt

RytoEX commented 2 years ago

If anyone has a support contact with Zoom and they are able to report/escalate this there, that would be appreciated, thanks.

attaboyBrad commented 2 years ago

If anyone has a support contact with Zoom and they are able to report/escalate this there, that would be appreciated, thanks.

I'll open a ticket.

No guesses as to why OBS v27 works fine and v28 is bugged in this regard? Any chance of an interim remedy where the v27 Virtual Camera plugin can be installed in parallel or something?

RytoEX commented 2 years ago

I'll defer to @PatTheMav on any questions about the macOS Virtual Camera, but no, I don't believe we have any leads yet. Zoom is a black box to us, so diagnosing this is a bit difficult. If virtual camera functionality is critical to your workflow, please consider simply remaining on the older version that works and checking back periodically with our release notes and newer versions.

attaboyBrad commented 2 years ago

I've opened a ticket and will keep this thread apprised of any updates. Maybe it will help if other people open tickets as well.

As part of that ticket, I've uploaded a video where I reproduce the issue. In case it's of any use here, here's the link: https://youtu.be/8D6iDIk9ieI

attaboyBrad commented 2 years ago

Below is the response I got from Zoom Support, in case their recommendations provide any insight into what they think might be going wrong. These steps didn't help at all, and in fact now I get a "Zoom Crashed Unexpectedly" popup every time I launch Zoom--though I might have just disabled that notice before.

I expect it'll take some back and forth before this gets elevated to somebody that might have any idea what's going on. The first step is basically "uninstall and reinstall and cross your fingers". I'm not exactly sure what the second steps did regarding mDNSResponder and the like, but I'm less than pleased that Zoom is asking me to run terminal commands as Admin given their security track record.

Hi Attaboy,

Thank you for contacting Zoom Support!

Kindly follow this step if she is using a MacOS device:

  1. Open Zoom client and Quit from the Zoom menu

  2. Open Terminal

  3. Type: cd "Library/Application Support/zoom.us"

  4. Type: rm -rf data and hit enter:

  5. Uninstall Zoom Client

  6. Reboot Mac?

  7. Reinstall Zoom client: zoom.us/downloads

This should clear out any dirty data on the current profile. Please let me know if this helps!

The next troubleshooting step I have is:

  1. Open Terminal
  2. copy and paste the following command: sudo killall -HUP mDNSResponder; sleep 2
  3. Restart Mac
  4. Try again

Please let me know if you have any questions regarding this and I'd be happy to assist you further.

PatTheMav commented 2 years ago

That's a whole bunch of snake oil that's not doing anything remotely related to the crash issue.

Alas until we are able to reproduce this issue ourselves we are none the wiser (and given that Zoom is signed binary, we're not able to debug our plugin running within the address space of the Zoom app).

PatTheMav commented 2 years ago

What you could do (as you're able to reproduce the issue) is open a Terminal window and run this before trying to replicate the issue:

log stream --system | grep -i "zoom"

This streams all macOS log output to the standard output of your Terminal window and filters out all lines that do not contain the word "zoom" in it.

It's effectively the same output you get from Console.app (which you can run yourself on macOS). Note that filtering for "zoom" is me fishing in the dark, it could be that another keyword might actually yield the right log line, so you might either check the same lines in Console.app yourself and try to find the "gotcha" line, or adjust the keyword used by grep.

attaboyBrad commented 2 years ago

I appreciate the suggestions @PatTheMav, I gave it a go and the results are attached below. I'll make an effort to parse the output myself this afternoon, but suspect it'll mostly go over my head. Happy to do all I can to support the project--it's been invaluable to my work.

Have you been unable to reproduce this issue as you don't have an Apple Silicon based mac to try it on? Do you need a paid Zoom account and not have one? It sounds like you're "the guy" on Apple Silicon OBS Virtual Camera, so let me know if there's some needed resource you don't have on this and I'll do my best to support.

OBS v28 Zoom Breakout Room Crash Log from AttaboyBrad 220926.md

PatTheMav commented 2 years ago

What's the timestamp of the crash?

attaboyBrad commented 2 years ago

Hmm... good question. I don't know and I'm not really sure how to note that in realtime with sufficient accuracy when there are so many entries per second.

I've had an idea though:

https://drive.google.com/drive/folders/1j101ClBgnVLkrzAjqSujVWJS9Hv0lT9D?usp=sharing

This GDrive folder contains three screen recordings that show the console log running as crashes happen. I've cut each down to contain only a few extraneous seconds before and after. It contains three because I forgot to export the console log for the first two. I'm including the first two in case they provide any sort of useful cross-reference. Happy to make more with included Logs if they'd be helpful.

The console log for the latest video is in that folder and also linked below.

Screen Recording 2022-09-26 at 14.46.13.txt

The timestamp where (I think) the crash begins in the included log is 2022-09-26 14:50:39.824458-700. It ends (I believe) at 2022-09-26 14:50:50.207573.

In this timeframe I switch into a breakout room, and during that switch Zoom "stops video" for OBS Virtual Camera and it can't be restarted in Zoom as a video source.

A few seconds later, beginning at 2022-09-26 14:51:02.928854, I attempt to leave the breakout room and return to the "Main Session", and when I reach the Main Session (at 2022-09-26 14:51:04.981655) OBS Virtual Camera has disappeared entirely and Zoom has switched over to a different camera entirely.

In the first two videos you can see the crash happen in two other ways: In the first video Zoom crashes entirely, in the second video Zoom "loses" the OBS Virtual Camera entirely in a single Breakout Room switch.

EDIT: I've added a subfolder in that shared GDrive folder and reproduced the same actions and log using OBS v27 so you can perhaps note which things are different between v27, which seems to work fine, and v28.

PatTheMav commented 2 years ago

Please filter the log for caphost instead of Zoom, it seems that the virtual camera plugin is loaded into that helper application and not the main UI process.

Also if an actual crash happened, there should be an entry in Console.app under "Crash Reports".

attaboyBrad commented 2 years ago

Here's a Zip file of today's caphost and zoom.us crash reports from Console.app.

Let me know what best practices are for sharing these. I didn't know Console.app was a thing until just now.

Caphost and Zoom errors from 220926.zip

I'll make new videos tomorrow morning with the different log filter--I have meeting for the rest of the evening.

sa1 commented 2 years ago

Here's my caphost.log as well. This is with OBS 28.0.1 Another one: caphost2.log. This is with OBS 28.0.2

It doesn't crash every time I switch to a breakout room, but happens within 3-4 tries.

.ips files corresponding to these 2 crashes and a previous one: Archive.zip

attaboyBrad commented 2 years ago

Here is another caphost.ips Crash Log as well as a later log from Terminal filtering for 'caphost' for which I added another screen recording (named identically to the Terminal Log) to the GDrive link that shows the log in realtime as I reproduce the issue.

caphost-2022-09-28-123300.txt Screen Recording 2022-09-28 at 13.15.06 Caphost Log.txt

Curiously, no crash report showed up in Console.app for this later crash.

Not sure if it's relevant, but a few minutes after the issue was reproduced (and after I stopped the screen recording), another entry showed up in the log:

"2022-09-28 13:18:07.812393-0700 0x1131fc Default 0x0 0 0 kernel: process caphost[85338] caught waking the CPU 45001 times over ~258 seconds, averaging 173 wakes / second and violating a limit of 45000 wakes over 300 seconds. caphost[85338] () woke the CPU 45001 times over 258.65 seconds (average 173/sec), violating a CPU wakes limit of 45000 over 300 seconds."

Here's the OBS log from this timeframe:

OBS Log 2022-09-28 13-13-26.txt

I heard back from what I assume is a higher tier of "support" at Zoom. Unfortunately, the long response they sent still did not seem to reflect conscious engagement with what I had written. Anyway, they requested these logs as well (with a bunch of extra hoops) and I've submitted them. Maybe something will come of it.

PatTheMav commented 2 years ago

Some of the crashes seem to suggest that they occur when the cam source is added. I dunno how Zoom handles multiple instances (main call + breakout session or main call + config window).

The only time I was able to force a crash was being in a call and opening up video settings, which creates a second connection to the virtual camera for the settings window. But that one also didn't crash every time, so it wasn't too helpful.

It might suggest an issue with the camera plugin being initialised more than once per application, but I'd have to think about a way to reproduce this without relying on 3rd party apps.

attaboyBrad commented 2 years ago

From Zoom a few days ago regarding Breakout Rooms:

"From a Breakout Room standpoint, I wanted to make you aware of the backend functionality of BO rooms. What a BO room is from a backend technical point of view is an entirely separate meeting or sub-meeting with very limited connection to the host meeting. This is why there is a complete disconnection from the main meeting and connection to the new sub meeting or BO Room. Because a BO room can be hosted on a different server network and routing delays can happen. Additionally, cameras and audio devices have to disconnect from the host meeting and reconnect to the BO room for them to function so again there can be delays."

From Zoom more recently:

Thanks for the updated info and logs. I am going to open up an engineering ticket to review the crash logs and meeting logs to see what may be causing Zoom and the virtual cam to crash. I did check the meeting graphs and meeting details from meeting ID: 862 6598 6580 and we did detect one of the reconnection delays but the meeting didn't last long enough to gather much data so I am hoping there is more info within the actual logs. Hopefully, the engineering team will discover what exactly is going on.

We also released a new update that you may want to install and test to see if you are having the same issues. Current Release: September 26, 2022 version 5.12.0 (11129) Download type: Manual Download here: Download Center

Please test that and let me know if Zoom is still crashing and send us the crash logs. I will keep you updated on their findings.

attaboyBrad commented 2 years ago

Some of the crashes seem to suggest that they occur when the cam source is added. I dunno how Zoom handles multiple instances (main call + breakout session or main call + config window).

Possibly related to this, and something other users have reported in the thread: the crash almost never happens on the first switch between Main Session and Breakout Room. For me it's almost invariably on the 2nd switch between Main/Breakout or amongst Breakout rooms in some form (either no longer showing video but still showing OBS VC as a source option, or losing OBC VC as source entirely and swapping, and never beyond the 3rd.

After my workday ends today I'll see if I can reproduce the crash you mentioned with the Video Settings. LMK if there are any particular steps I should try.

I forwarded your comments to Zoom in case its of use to them. I'm cautiously optimistic that I've gotten past the TextExpander Snippet level of support over there.

RytoEX commented 2 years ago

If Zoom wants to engage with us directly, feel free to let them know that they can reach us at contact@obsproject.com.

attaboyBrad commented 2 years ago

Some of the crashes seem to suggest that they occur when the cam source is added.

If you mean 'cam source' in OBS, I made an empty Scene collection that has nothing but a still image source--no cameras or filters or nothing else--and it crashed exactly as before.

Console.app crash report from that below.

Zoom_Ignore_zoom.us-2022-10-02-161110 copy.txt

I also unsuccessfully tried to produce a crash by opening and closing 'Video Settings' while on a call in Zoom.

attaboyBrad commented 2 years ago

I notice the v28 virtual camera has a number of settings that weren't present in v27--namely 'Output Type' and 'Output Selection. I was able to reproduce the crash with each one of those options--Internal: Preview, Internal: Program, and Scene: [several scenes].

Any theories on why v27 plugin works just fine while v28 doesn't? Is there any sort of limited direct port of the v27 Virtual Camera plugin that I could try? Any sort of alternate version that would still work in v28? I tried just replacing the v28 plugin with the one from v27, and I assume it's of no surprise to the Devs here that it didn't work at all.

I unsuccessfully tried to reproduce the crash with other 'Virtual Cameras' I have that can work with Zoom without going through OBS--namely 'NDI Video' and Elgato's 'EpocCam'. The issue is somehow very specific to OBS v28.

WizardCM commented 2 years ago

@attaboyBrad Because there were some large internal changes to the plugin, including

Among smaller fixes based on related changes.

We don't yet know the cause of this issue specifically.

PatTheMav commented 2 years ago

Given that crashes are 100% reproducible without any other software (adding the virtual camera as a video source to OBS is sufficient), this is entirely related to our own handling of client mach ports:

OBS invalidates the reception port generated by the DAL plugin endpoint unilaterally if it isn't able to send a message (which can happen if the process that runs the DAL plugin has ended the session but not invalidated/removed the port itself).

This somehow also interferes with how CoreMedia initiates a new capture session for a camera, but it's easily reproducible by adding the virtual camera twice to an application (sometimes just switching back and forth from the virtual camera to an actual webcam is sufficient as well).

I haven't found the exact source of the crash, but this also doesn't seem related to 28 but should've been an issue all along.

PatTheMav commented 2 years ago

My current findings:

• Necessary condition for the crash seems to be that a new AVCaptureSession with our virtualcam is started or a session property is updated (calling AVCaptureSession_Tundra _doSessionUpdates) • In any case the crash happens in CFRunLoopAddSource • The error message is IOKit.framework:IONotificationPortGetRunLoopSource bad CFMachPort followed by the port information • According to IOKit sources, this error happens if CFMachPortCreateWithPort is called with a port number that already exists

This whole thing smells like "race condition" but I've tried to debug this and the mach ports we create are all fine (and the port number reported by the error is neither the send, receive, or IOSurface ports we use for virtualcam functionality.

Simple reproduction steps:

• Add a media source to OBS (for output) • Start VirtualCamera • Add Video Input Source, select Virtual Camera • Scale down the Video Input Source to create a small "hall or mirrors" effect • Add a second Video Input Source • With the properties window open, scale down the second Video Input Source to create a second "hall of mirrors" • In the properties window, change either the preset or switch between an "empty" source and the Virtual Camera • Repeat 2-3 times to crash (usually happens at less than 5 attempts)

The crash will either occur due to AVCaptureSession_Tundra _doSessionUpdates (updating preset) or AVCaptureSession_Tundra startRunning (starting a new session)

steven-michaud commented 2 years ago

@PatTheMav closed #7528 as a dup of this bug -- which it seems to be. I actually skimmed through this bug before I opened #7528, but didn't find any crash stacks. So I didn't realize it was a dup. So to help others who see these crashes, here's a crash stack. I made it in lldb because the crash stacks in Mozilla crash reports are somewhat corrupt. The "crash address" is 0x0000000000000068, on both Intel and Apple Silicon.

* thread #110, queue = 'org.webrtc.videocapture', stop reason = breakpoint 2.1
  * frame #0: 0x00007ff81205c110 CoreFoundation`CFRunLoopAddSource
    frame #1: 0x00000001a6bf5b4c CMIOBaseUnits`___lldb_unnamed_symbol389$$CMIOBaseUnits + 226
    frame #2: 0x00000001a6c6c5cf CMIOBaseUnits`___lldb_unnamed_symbol1346$$CMIOBaseUnits + 29
    frame #3: 0x00000001a6c23120 CMIOBaseUnits`___lldb_unnamed_symbol762$$CMIOBaseUnits + 58
    frame #4: 0x00000001a6befd1a CMIOBaseUnits`___lldb_unnamed_symbol330$$CMIOBaseUnits + 102
    frame #5: 0x00000001a6bf7ef1 CMIOBaseUnits`___lldb_unnamed_symbol421$$CMIOBaseUnits + 23
    frame #6: 0x00000001a6c04a16 CMIOBaseUnits`___lldb_unnamed_symbol551$$CMIOBaseUnits + 43
    frame #7: 0x00007ff81fd12505 CoreMediaIO`CMIOUnitNodeInfo::Initialize(CMIOGraph*) + 57
    frame #8: 0x00007ff81fd04e00 CoreMediaIO`CMIOGraph::Initialize() + 3072
    frame #9: 0x00007ff82be5d356 AVFCapture`-[AVCaptureSession_Tundra _buildAndRunGraph] + 2029
    frame #10: 0x00007ff82be5e913 AVFCapture`-[AVCaptureSession_Tundra _setRunning:] + 456
    frame #11: 0x00007ff82be5e460 AVFCapture`-[AVCaptureSession_Tundra startRunning] + 180
    frame #12: 0x0000000112ba7827 XUL`-[RTCVideoCaptureIosObjC startCaptureInBackgroundWithOutput:](self=0x0000000132708ac0, _cmd=<unavailable>, currentOutput=<unavailable>) at rtc_video_capture_objc.mm:200:3 [opt]
    frame #13: 0x00007ff811e030cc libdispatch.dylib`_dispatch_call_block_and_release + 12
    frame #14: 0x00007ff811e04317 libdispatch.dylib`_dispatch_client_callout + 8
    frame #15: 0x00007ff811e0a317 libdispatch.dylib`_dispatch_lane_serial_drain + 672
    frame #16: 0x00007ff811e0adfd libdispatch.dylib`_dispatch_lane_invoke + 366
    frame #17: 0x00007ff811e14eee libdispatch.dylib`_dispatch_workloop_worker_thread + 753
    frame #18: 0x00007ff811fb7fd0 libsystem_pthread.dylib`_pthread_wqthread + 326
    frame #19: 0x00007ff811fb6f57 libsystem_pthread.dylib`start_wqthread + 15
steven-michaud commented 2 years ago

given that Zoom is signed binary, we're not able to debug our plugin running within the address space of the Zoom app

Shameless plug: You could with HookCase. Though you'd need to use it on Intel hardware, since HookCase doesn't yet support Apple Silicon.

PatTheMav commented 2 years ago

Well that'd be a dealbreaker because all my Macs used as daily drivers are Apple Silicon-based. 😅

steven-michaud commented 2 years ago

I've got even more minimal STR for these crashes.

steven-michaud commented 2 years ago

My STR don't work in Chrome, though -- even though I also see the IOKit.framework:IONotificationPortGetRunLoopSource bad CFMachPort error there. I think the crucial difference is that Chrome uses the OS (macOS) to determine access to the camera (and microphone). Firefox has its own methods.

Edit: Oops, I may need to take this back. Chrome doesn't visibly crash, but the next line in Console is:

16:08:00.272170-0500    AMFI: Denying core dump for pid 4304 (Google Chrome He)
PatTheMav commented 2 years ago

My STR don't work in Chrome, though -- even though I also see the IOKit.framework:IONotificationPortGetRunLoopSource bad CFMachPort error there. I think the crucial difference is that Chrome uses the OS (macOS) to determine access to the camera (and microphone). Firefox has its own methods.

Edit: Oops, I may need to take this back. Chrome doesn't visibly crash, but the next line in Console is:

16:08:00.272170-0500  AMFI: Denying core dump for pid 4304 (Google Chrome He)

Chrome uses an out-of-process system on macOS which sandboxes cameras off into their own subprocess, which has the benefit of not bringing the entire browser down.

zwhite commented 2 years ago

I'm hitting what seems to be this same problem without using breakout rooms at all. Just leaving Zoom open and moving from meeting to meeting is enough to trigger. I even had one meeting where I stopped video for a minute and couldn't restart it after that. It started after I updated from v27 to 28.0.3.

PatTheMav commented 2 years ago

That sounds correct because the issue is unrelated to specific Zoom functionality but to the application starting or changing an AVCaptureSession.

steven-michaud commented 2 years ago

I've got a one-line fix for these crashes! It's really, of course, a workaround for Apple's bug(s).

diff --git a/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm b/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
index c6126398c..d53b661de 100644
--- a/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
+++ b/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
@@ -114,8 +114,9 @@

                        IOSurfaceRef surface = IOSurfaceLookupFromMachPort(
                                [framePort machPort]);
-                       mach_port_deallocate(mach_task_self(),
-                                            [framePort machPort]);
+                       //mach_port_deallocate(mach_task_self(),
+                       //
+                       [framePort invalidate];

                        if (!surface) {
                                ELog(@"Failed to obtain IOSurface from Mach port");

I followed the standard build procedure with unaltered code. Then I tested the result with my STR and saw the crashes. Then I applied the patch and rebuilt, and tested again. I switched the cameras back and forth around 100 times (over about five minutes), and didn't see a crash.

I found the patch using a HookCase hook library. On an Intel box of course. I'll explain in more detail in later comments.

Get yourself an Intel Mac, @PatTheMav! I'm not sure when (or if) I'll be able to port HookCase to Apple Silicon. And VM support on Intel Macs is much better than on Apple Silicon boxes, and likely to remain so.

PatTheMav commented 2 years ago

The explicit deallocation was added in https://github.com/obsproject/obs-studio/commit/b02e4b109b03b56f6ed636d53064d3ffc4e024e0 to fix memory leaks that happened without it - does [framePort invalidate] address this as well?

steven-michaud commented 2 years ago

I don't know. It might. But first let me explain why my patch fixes these crashes.

PatTheMav commented 2 years ago

I just checked, removing this code brings back the memory leaks - according to instruments I got around 2800 memory leaks in 30 seconds.

steven-michaud commented 2 years ago

First some background:

Some time ago Apple closed the source for its CoreFoundation framework. But the latest version of the old source is still useful. For example it has an array called __CFAllMachPorts. This is used to save old CFMachPortRefs for recycling. The _CFMachPortCreateWithPort2() function adds items to this array when it needs to. It also sets the shouldFreeInfo flag when it re-uses an old CFMachPortRef. CFMachPortInvalidate() removes items from __CFAllMachPorts. This behavior is still basically the same in current copies of the CoreFoundation framework.

IONotificationPortGetRunLoopSource(), the method that displays this bug's diagnostic error messages (IOKit.framework:IONotificationPortGetRunLoopSource bad CFMachPort), calls CFMachPortCreateWithPort(), which in turn calls _CFMachPortCreateWithPort2(). It'd be nice to find out the "original" use of the re-used CFMachPortRef that triggers this bug's crashes. Then maybe we could explicitly call CFMachPortInvalidate() when we're done with it, and prevent it from getting re-used.

I used my HookCase hook library to find out that the "original" CFMachPortRef is always the framePort object whose behavior I changed with my patch. By looking at the assembly code for the Foundation framework, you can tell that -[NSMachPort invalidate] calls CFMachPortInvalidate(). Problem solved :-)

steven-michaud commented 2 years ago

We might be able to call both.

PatTheMav commented 2 years ago

We might be able to call both.

Yeah I just implemented that, got no crashes so far, but also lots of leaks. I still have to check if the original fix actually fixed those leaks at all.

PatTheMav commented 2 years ago

First some background:

Some time ago Apple closed the source for its CoreFoundation framework. But the latest version of the old source is still useful. For example it has an array called __CFAllMachPorts. This is used to save old CFMachPortRefs for recycling. The _CFMachPortCreateWithPort2() function adds items to this array when it needs to. It also sets the shouldFreeInfo flag when it re-uses an old CFMachPortRef. CFMachPortInvalidate() removes items from __CFAllMachPorts. This behavior is still basically the same in current copies of the Foundation framework.

IONotificationPortGetRunLoopSource(), the method that display's this bug's diagnostic error messages (IOKit.framework:IONotificationPortGetRunLoopSource bad CFMachPort), calls CFMachPortCreateWithPort(), which in turn calls _CFMachPortCreateWithPort2(). It'd be nice to find out the "original" use of the re-used CFMachPortRef that triggers this bug's crashes. Then maybe we could explicitly call CFMachPortInvalidate() when we're done with it, and prevent it from getting re-used.

I used my HookCase hook library to find out that the "original" CFMachPortRef is always the framePort object whose behavior I changed with my patch. By looking at the assembly code for the Foundation framework, you can tell that -[NSMachPort invalidate] calls CFMachPortInvalidate(). Problem solved :-)

Great writeup, it aligns with what I suspected might happen, I had no time to dig into the old CF sources (which had been closed by Apple as you mentioned), so thank you for putting in the work here!

I'll dig into this a bit more once I got a bit of time and forward the info to Apple, which have been looking into this for me as well.

steven-michaud commented 2 years ago

Hook library I tested with

Here's the HookCase hook library I tested with, as a diff on https://github.com/steven-michaud/HookCase/blob/v6.0.4/HookLibraryTemplate/hook.mm.

steven-michaud commented 2 years ago

Output of hook library

Here's the entire output of a single session with my hook library, logging what happens when I trigger one of this bug's crashes. I used a current (and unaltered) local build of OBS Studio, together with today's mozilla-central nightly -- a build of Firefox made from current trunk code. I'll explain what I learned from it in my next comment.