Closed attaboyBrad closed 2 years ago
Here's the last output before the crash:
(Sat Oct 8 14:01:56 2022) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[19894] [0x70001367b000] Hook.mm: _CFMachPortCreateWithPort2(): port 0x20583, shouldFreeInfo 1, returning
<CFMachPort 0x148e11580 [0x7ff859d07d70]>{valid = Yes, port = 20583, source = 0x0, callout = __NSFireMachPort (0x7ff818107788), context = <CFMachPort context 0x13e15fcd0>}
(Sat Oct 8 14:01:56 2022) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[19894] [0x70001367b000] Hook.mm: IONotificationPortGetRunLoopSource(): notify 0x14b6dd5b0, notify.masterPort 0x7907, notify.wakePort 0x20583, notify.cfmachPort 0x0, notify.source 0x0, notify.dispatch_source 0x0, notify.refcount 0, returning
(null)
I took the port number (20583
) from the port 0x20583
parameter to _CFMachPortCreateWithPort2()
and used it to search through the log.
The first hit is:
(Sat Oct 8 14:01:53 2022) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[19894] [0x70000e786000] Hook.mm: _CFMachPortCreateWithPort2(): port 0x20583, shouldFreeInfo 0, returning
<CFMachPort 0x148e11580 [0x7ff859d07d70]>{valid = Yes, port = 20583, source = 0x0, callout = __NSFireMachPort (0x7ff818107788), context = <CFMachPort context 0x13e15fcd0>}
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) Hooked__CFMachPortCreateWithPort2(__CFAllocator const*, unsigned int, void (*)(__CFMachPort*, void*, long, void*), CFMachPortContext*, unsigned char*, unsigned char) + 0xbd
(Foundation) -[NSMachPort initWithMachPort:options:] + 0x14d
(Foundation) +[NSMachPort parseMachMessage:localPort:remotePort:msgid:components:] + 0x167
(hook.dylib) +[NSObject(NSMachPortSwizzling) NSMachPort_parseMachMessage:localPort:remotePort:msgid:components:] + 0x4d
(Foundation) __NSFireMachPort + 0xa4
(hook.dylib) Hooked___NSFireMachPort(__CFMachPort*, void*, long, void*) + 0x31
(CoreFoundation) __CFMachPortPerform + 0x120
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 0x29
(CoreFoundation) __CFRunLoopDoSource1 + 0x26b
(CoreFoundation) __CFRunLoopRun + 0x96f
(CoreFoundation) CFRunLoopRunSpecific + 0x232
(Foundation) -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 0xd8
(Foundation) -[NSRunLoop(NSRunLoop) runUntilDate:] + 0x67
(unknown) unknown + 0x0
(libdispatch.dylib) _dispatch_call_block_and_release + 0xc
(libdispatch.dylib) _dispatch_client_callout + 0x8
(libdispatch.dylib) _dispatch_root_queue_drain + 0x2a1
(libdispatch.dylib) _dispatch_worker_thread2 + 0xa0
(libsystem_pthread.dylib) _pthread_wqthread + 0x100
(libsystem_pthread.dylib) start_wqthread + 0xf
This mac-virtualcam code code is running to deal with a mach port that's just "fired". Notice that _CFMachPortCreateWithPort2()
has returned CFMachPort 0x148e11580
.
The next line in the log is:
(Sat Oct 8 14:01:53 2022) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[19894] [0x70000e786000] [NSMachPort parseMachMessage:]: localPort <CFMachPort 0x148c45a00 [0x7ff859d07d70]>{valid = Yes, port = 2090b, source = 0x148c78940, callout = __NSFireMachPort (0x7ff818107788), context = <CFMachPort context 0x14d2e9160>}, remotePort (null), msgid 0xe783ea4, components (
"<NSMachPort: 0x148e11580>",
{length = 8, bytes = 0x9e6505c9ce0e0000},
{length = 4, bytes = 0x1e000000},
{length = 4, bytes = 0x01000000}
)
Notice that the first item in components
is <NSMachPort: 0x148e11580>
, which has the same address as the CFMachPortRef returned by _CFMachPortCreateWithPort2()
. From looking at -(void)OBSDALMachClient handlePortMessage:(NSPortMessage *)message
you can tell that this object, at components[0]
, is used to initialize NSMachPort *framePort
.
So we've found what the CFMachPortRef that triggered this bug's crashes originally was, before it turned into a zombie.
@PatTheMav: I discovered that if, in my patch above, I restore the call to mach_port_dealloc()
(so that I'm calling it just before [framePort invalidate]
), it gets compiled out! I've never seen anything like this before, but then I've never used ARC (automatic reference counting).
I also had the idea of calling [framePort release]
instead of [framePort invalidate]
, but the compiler tells me this isn't allowed when using ARC. However I found I could play a trick on the compiler and call CFRelease((CFMachPortRef)framePort)
(like you do below using CFRelease(surface)
). This resolves the crashes. CFMachPortInvalidate()
still gets called on framePort
, and my STR no longer works. I wonder if it will also resolve the leaks.
diff --git a/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm b/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
index c6126398c..a9c7212bf 100644
--- a/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
+++ b/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
@@ -114,8 +114,9 @@
IOSurfaceRef surface = IOSurfaceLookupFromMachPort(
[framePort machPort]);
- mach_port_deallocate(mach_task_self(),
- [framePort machPort]);
+ //mach_port_deallocate(mach_task_self(),
+ // [framePort machPort]);
+ CFRelease((CFMachPortRef)framePort);
if (!surface) {
ELog(@"Failed to obtain IOSurface from Mach port");
ARC quite apparently doesn't work on framePort
. I have a hunch the reason is that it's really a CFMachPortRef (created by _CFMachPortCreateWithPort2()
), and those aren't really covered by ARC. In many other respects NSMachPort *
and CFMachPortRef
are equivalent (toll-free bridged). But not this one.
I spoke too soon. My second patch doesn't work. My changes somehow didn't register unless I rebuilt OBS Studio entirely from scratch :-( This may also be the reason for the call to mach_port_deallocate()
apparently being "compiled away".
From this point, I'll re-clone each time and rebuild from scratch. Anybody know of a way to do make clean
?
For what it's worth, I have decent results with the following patch. I double-checked the obs-mac-virtualcam
module (in a disassembler) to make sure it really matched my change. Both [framePort invalidate]
and mach_port_deallocate()
showed up where they should be. I ran Activity Monitor on Firefox Nightly (the main process), assuming that any major leaks in obs-mac-virtualcam
would show up there. I didn't see any.
diff --git a/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm b/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
index c6126398c..c6c0cde2a 100644
--- a/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
+++ b/plugins/mac-virtualcam/src/dal-plugin/OBSDALMachClient.mm
@@ -114,6 +114,7 @@
IOSurfaceRef surface = IOSurfaceLookupFromMachPort(
[framePort machPort]);
+ [framePort invalidate];
mach_port_deallocate(mach_task_self(),
[framePort machPort]);
@PatTheMav, how do you measure leaks?
Anyone know of an "activity monitor" that will give you per-module memory use statistics?
Instruments has this built-in, it does leak checks automatically.
Using Instruments to detect leaks is a royal pain. I eventually figured out that I need to resign Firefox to give it a com.apple.security.get-task-allow
entitlement. Then Instruments crashed Firefox almost immediately. From the crash stack, it seems Instruments doesn't get along with Firefox's implementation of jemalloc.
So ... please try my latest patch and let me know if you see leaks with it.
I did a local mozilla-central build (a Firefox build from current trunk code) with jemalloc disabled (ac_add_options --disable-jemalloc
). Instruments' leak detector works fine with that. Then I ran my "fixed" OBS and performed the STR with which I was originally able to reproduce this bug's crashes. I switched cameras back and forth ten or more times, and saw no leaks that could be attributed to the obs-mac-virtualcam
plugin.
@PatTheMav, I await the results of your tests. If all goes well I'll do a pull request for my latest patch.
Alas without any changes or fixes Instruments reported hundreds of memory leaks with our current virtual cam implementation that I'm currently trying to get to the bottom of. There are several places where the handling of mach port rights don't seem right, as well as unnecessary IOSurface locks.
This will take me quite a bit to figure out, but as I mentioned above we're in the process of replacing this virtual camera implementation with a Camera Extension anyway, so we're probably only getting the legacy DAL plugin into a workable state and leave it at that.
Then I'll let you take the lead on this. Let me know when you want me to do a pull request.
But don't wait too long :-) These crashes are among the top five Mac crashes on Firefox betas and releases, and their numbers seem to be growing.
For what it's worth, I don't see any leaks (with my STR) in a local mozilla-central build with ASAN (Address Sanitizer) support. (This build also disables jemalloc.)
Annoyingly enough the leaks supposedly all happen within -[NSMachPort initWithMachPort:options:]
which calls _CFMachPortCreateWithPort2
down the line as you expected, initiated by +[NSMachPort portWithMachPort:options:]
.
These leaks happen continuously as long as virtual camera clients are connected and there is a visible spike in memory usage after sending a frame message, but overall memory consumption seems stable overall.
Also here's what leaks
has to say:
6 (464 bytes) ROOT CYCLE: <OS_dispatch_source 0x600001e01480> [128]
5 (336 bytes) ROOT CYCLE: 0x600001be2ae0 [112]
2 (112 bytes) ROOT CYCLE: 0x6000025486c0 [64]
1 (48 bytes) ROOT CYCLE: <__NSMallocBlock__ 0x600003fc26d0> [48] CoreFoundation ___CFMachPortCreateWithPort2_block_invoke 0x194601260
__strong [capture] --> CYCLE BACK TO <OS_dispatch_source 0x600001e01480> [128]
2 (112 bytes) 0x600002549ec0 [64]
1 (48 bytes) <__NSMallocBlock__ 0x600003fc3210> [48] CoreFoundation ___CFMachPortCreateWithPort2_block_invoke.3 0x194662490
Overall your suggested fix does get rid of the crashes, so I'll throw this in with some other fixes I have implemented. If you don't mind, I'd use the email address on your GitHub profile for the co-author attribution?
I've been doing my best to keep up with the back and forth here over the last few days, but I'm afraid it's almost entirely over my head.
I'll continue to try to follow along, but do let me know if there's any sort of testing I can help with in the meantime.
I'm so grateful @PatTheMav and @steven-michaud for the effort you're putting in.
@attaboyBrad Long story short:
It seems to be that if we don't invalidate the port (a sharing mechanism between processes on macOS) for the IOSurface (a macOS thing that represents image data) after receiving it in the Virtual Camera client (the thing that gets loaded into your browser or conferencing software), we exhaust the amount of ports available to IOKit (the macOS framework responsible for handling webcams, among other things), leading macOS to recycle ports, which IOKit really doesn't like and crashes.
I don't know why explicit deallocation doesn't do the trick, but I take any fix I can get.
Overall your suggested fix does get rid of the crashes, so I'll throw this in with some other fixes I have implemented. If you don't mind, I'd use the email address on your GitHub profile for the co-author attribution?
That's fine with me.
I get different results from leaks
. Here are some that I've picked out that seem to be relevant:
5 (192 bytes) ROOT LEAK: <AVCaptureDeviceFormat_Tundra 0x6000006d0ac0> [32]
4 (160 bytes) _internal --> <AVCaptureDeviceFormatInternal_Tundra 0x600000cbf450> [48]
3 (112 bytes) videoSupportedFrameRateRanges --> <NSArray 0x600000206b30> [16]
2 (96 bytes) __strong _object --> <AVFrameRateRange_Tundra 0x6000002060d0> [16]
1 (80 bytes) _internal --> <AVFrameRateRangeInternal_Tundra 0x60000203c690> [80]
5 (192 bytes) ROOT LEAK: <AVCaptureDeviceFormat_Tundra 0x6000006d8560> [32]
4 (160 bytes) _internal --> <AVCaptureDeviceFormatInternal_Tundra 0x600000c5d8f0> [48]
3 (112 bytes) videoSupportedFrameRateRanges --> <NSArray 0x60000026ea50> [16]
2 (96 bytes) __strong _object --> <AVFrameRateRange_Tundra 0x60000026ea30> [16]
1 (80 bytes) _internal --> <AVFrameRateRangeInternal_Tundra 0x6000022fc5a0> [80]
3 (96 bytes) ROOT LEAK: <AVCaptureDeviceFormat_Tundra 0x60000074c0e0> [32]
2 (64 bytes) _internal --> <AVCaptureDeviceFormatInternal_Tundra 0x600000c5adf0> [48]
1 (16 bytes) videoSupportedFrameRateRanges --> <NSArray 0x600000254cf0> [16]
2 (80 bytes) ROOT LEAK: <AVCaptureDeviceFormat_Tundra 0x6000001c3180> [32]
1 (48 bytes) _internal --> <AVCaptureDeviceFormatInternal_Tundra 0x600000cbb9c0> [48]
None of this, of course, comes directly from the plugin itself (obs-mac-virtualcam
).
I'm just doing leaks [firefox main process pid]
. These "tree format" results come at the end of a very long list of single-line items. (This is my first time using leaks
or Instruments
.)
Edit: I don't see ROOT CYCLE
anywhere in my leaks
output. Do you need to play tricks to generate it?
leaks OBS
was all I had to do while running a debug build of the app.
The leaks don't come from the Mach client part, but from the Mach server inside OBS when creating a new NSMachPort
with the port given by the IOSurface
.
OK. Now I see your results, in both leaks
and Instruments.
Here's a stack trace from Instruments:
_malloc_zone_malloc
_Block_copy
_dispatch_Block_copy
_dispatch_source_set_handler
_CFMachPortCreateWithPort2
-[NSMachPort initWithMachPort:options:]
+[NSMachPort portWithMachPort:options:]
-[OBSDALMachServer sendPixelBuffer:timestamp:fpsNumerator:fpsDenominator:]
virtualcam_output_raw_video(void*, video_data*)
default_raw_video_callback
video_output_cur_frame
video_thread
_pthread_start
thread_start
@PatTheMav,
If it's not too much trouble, would you be so kind as to provide instructions for how a layperson might test/implement your fixes from the linked pull request?
OBS 29 seems like a long time to wait...
@PatTheMav,
If it's not too much trouble, would you be so kind as to provide instructions for how a layperson might test/implement your fixes from the linked pull request?
OBS 29 seems like a long time to wait...
The artifacts are available here to download: https://github.com/obsproject/obs-studio/actions/runs/3257497328
Operating System Info
macOS 12
Other OS
No response
OBS Studio Version
28.0.0
OBS Studio Version (Other)
28.0.1
OBS Studio Log URL
https://obsproject.com/logs/Chwt_hwR_9cbLuE8
OBS Studio Crash Log URL
No response
Expected Behavior
I expected to switch from one breakout room to another without issue.
Current Behavior
The OBS virtual camera froze in Zoom, and in attempting to restart it the virtual camera ceased to show up as an option in Zoom and Zoom fell back to a raw webcam feed. This has been an issue since the first OBS 28 beta and was mentioned a couple of times in the beta announcement forum thread.
Steps to Reproduce
Anything else we should know?
This has been happening basically every time I try to enter, or especially to switch, breakout rooms in Zoom since the first OBS 28 beta (still an issue with 28.0.1).
It's not an issue on 27.2.4, and I've reverted back to that for the time being.