Closed Archaeopteryx closed 1 year ago
Unfortunately our build process excluded symbols for the virtual camera module which makes this difficult to diagnose. We'll try to have this fixed for the next version and I'll keep an eye on the Mozilla crash reports to try and get a useful stack trace.
Hi, I'm working on this issue from Mozilla side.
To locate the crashing address, I built obs-virtualcam-module64.dll on tags/26.1.0 locally and compared its binary pattern to the minidumps in our crash reports. So far I could locate four different crashing patterns as below.
dst
was inaccessible.dst
was inaccessible.src_uv
was inaccessible.dst
was inaccessible.I hope this information helps investigation on the OBS project side.
Thanks for the information. Those hints are useful. I apologize for any issues it might be causing.
If anyone has any other information, particularly any information on how to reproduce, please let us know.
@jp9000 I notice that all the crashes are in the yuy2 conversion, which was added recently (4 months ago).
Do you know what the use case for this is? Like, what kind of setup is likely to use this and thus lead to STR?
If we can see that this code is buggy/crashing, can it be disabled in OBS? Or is it required for something? (And then what is it required for?)
Have any updated versions with symbols been released? We are continuing to get crash reports from Chrome that might also be this issue and I could analyze these much more easily with obs_virtualcam_module64 symbols.
I've only looked at one crash and it happens when chrome!media::VideoCaptureDeviceClient::OnIncomingCapturedData tries to make a virtual function call. The vtable pointer is invalid (c08f2200`a8280000) so this doesn't work. I am assuming that there is a use-after-free happening here but I can't be certain. The call stack is:
00 chrome!media::VideoCaptureDeviceClient::OnIncomingCapturedData 01 chrome!media::VideoCaptureDeviceWin::FrameReceived 02 chrome!media::SinkInputPin::Receive 03 obs_virtualcam_module64 ???
The module information is:
Timestamp: Mon Dec 14 09:57:59 2020 (5FD7C3C7)
CheckSum: 00000000
ImageSize: 000E0000
File version: 26.1.0.0
That stack trace is actually quite informative. I appreciate that, and I'm currently looking into it. Unfortunately it seems the symbol file for the virtualcam module for the current release was annoyingly lost, but I think we shouldn't need it as long as we have the addresses and image base. I'm really sorry for the trouble.
I also found a crash with obs-virtualcam-module32.dll. When my program starts, it will enums all camera devices in pc. Sometimes a crash occurs when it's the turn of obs-virtualcam-module32.dll Here is the crash stack: I compiled the obs-virtualcam-module32.dll symbol locally and forced it to load. obs-virtualcam-module.dll info: Any ideas for this crash?
Any updates?
Also, has there been consideration given to re-releasing the current version, but this time with symbols available? If I could download the PE file and PDB file then I could give a much more helpful analysis of what is happening in Chrome. Right now I don't even know if this Firefox crash is the same crash that we see in Chrome.
We're aiming to start the release process for v27 near the end of the month. This will include PDBs for the virtual camera module. Based on previous update roll-outs, we expect a majority of users to have updated within 24-48 hours of a final release.
Unfortunately we weren't able to figure out exactly where we are crashing, so there is no fix for the crash yet.
We've identified one cause of crashing that could occur when the user replaced the default placeholder.png with an image that was not the same resolution. The scaling could go out of bounds when the placeholder was displayed, which seems to correlate with the original report.
@slivermeteor Can you confirm if you had a modified placeholder.png when you experienced your crash?
I could manually resolve the callstack of the crash. The crash on Firefox is different from Chrome's crash or the placeholder's crash.
obs_virtualcam_module64!nv12_scale_nearest_to_yuy2
obs_virtualcam_module64!nv12_do_scale
obs_virtualcam_module64!video_queue_read
obs_virtualcam_module64!VCamFilter::ShowOBSFrame
obs_virtualcam_module64!VCamFilter::Frame
obs_virtualcam_module64!VCamFilter::Thread
obs_virtualcam_module64!std::thread::_Invoke<std::tuple<<lambda_4e6f779957cbddf8fba6c1db701dac17> >,0>
obs_virtualcam_module64!thread_start<void (__cdecl*)(void *),0>
I think the crash happened because the allocated buffer was too small. In the example below, accessing 00000130e4a9c000
caused AV because it was a free region. Probably the allocated buffer was [130e4800000, 130e4a9c000)
located above.
0:104> .excr
rax=0000000000040bf0 rbx=0000000000000186 rcx=0000000000000059
rdx=00000000000003c0 rsi=0000000000006db0 rdi=000000000000db60
rip=00007ffc917cacde rsp=0000001cd30bfc20 rbp=0000000000000000
r8=0000000000000551 r9=00000130e4a9c000 r10=00000000001032cc
r11=00000000000003c0 r12=000000000000030c r13=00000130e3e0db08
r14=0000000000000780 r15=00000130e3dba480
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
obs_virtualcam_module64+0xacde:
00007ffc`917cacde 418809 mov byte ptr [r9],cl ds:00000130`e4a9c000=??
0:104> !address
...snip...
130`e47c2000 130`e4800000 0`0003e000 MEM_PRIVATE MEM_RESERVE <unknown>
+ 130`e4800000 130`e4a9c000 0`0029c000 MEM_PRIVATE MEM_COMMIT PAGE_READWRITE <unknown>
+ 130`e4a9c000 130`e4b00000 0`00064000 MEM_FREE PAGE_NOACCESS Free
...snip...
Maybe this can happen if the video size gets smaller while streaming? In every frame, VCamFilter::Frame
handles such size change, but I cannot find how the size of a buffer, which is set in OutputPin::LockSampleData
, is updated.
OBS Studio 27.0.0-rc2 has been released with PDBs included for the OBS Virtual Camera DLLs, and the PDBs should be included in all future releases. While this won't help with tracking down crashes related to previous versions of OBS, this should help track down any crashes related to OBS Studio 27 and newer.
@notr1ch has been working on some bug fixes for the virtual camera. Some of those changes have already shipped (#4417, https://github.com/obsproject/obs-studio/commit/5c8587eab3164f8711e8a19791b5c4d54edf9fa0, https://github.com/obsproject/obs-studio/commit/670156db8bde1ab940f2f2489c70653334b7ec3b, https://github.com/obsproject/obs-studio/commit/45643adb036c3bb40a7e400203499abb4ad7a3dd, https://github.com/obsproject/obs-studio/commit/642490112036e1e6f51356d64cb39e3612786b09) and some are still in the works. He is more familiar with the code than I am, so he might be able to comment further on those changes.
If it's determined that there are multiple unique crashes being reported in this thread, could I trouble the reporters to open new issues for the ones that don't match the originally reported crash? That should help us target each crash separately without mixing up information.
OBS Studio 27.0.0-rc2 has been released with PDBs included for the OBS Virtual Camera DLLs, and the PDBs should be included in all future releases. While this won't help with tracking down crashes related to previous versions of OBS, this should help track down any crashes related to OBS Studio 27 and newer.
@notr1ch has been working on some bug fixes for the virtual camera. Some of those changes have already shipped (#4417, 5c8587e, 670156d, 45643ad, 6424901) and some are still in the works. He is more familiar with the code than I am, so he might be able to comment further on those changes.
If it's determined that there are multiple unique crashes being reported in this thread, could I trouble the reporters to open new issues for the ones that don't match the originally reported crash? That should help us target each crash separately without mixing up information.
Thanks, I had this happen to me twice the last 7 days on Firefox 88. Is this related to hardware problems, or entirely it is software?
I had been having some hardware problems that I hope were fixed by swapping the GPU.
@kgc35 If you go to about:crashes, and link some of the related reports, we may be able to investigate.
I did some crash queries and found four related crashes with the latest OBS and latest Chrome. It's too early to tell if the crash rate will be affected by the newest version but I don't think so. Here is the data from one of the crashes, with the OBS symbols:
0:016> .ecxr
rax=000000520e5ff618 rbx=000032f4002778e0 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=000000520e5ff618
rip=00007ff837d63fe9 rsp=000000520e5ff5e0 rbp=000000520e5ff8e0
r8=0000000000000000 r9=000000520e5ff608 r10=0000000000000000
r11=0000000000000246 r12=000000520e5ff8f0 r13=000000520e5ff8e0
r14=000032f40043a080 r15=000000520e5ff8c8
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=0000 ds=0000 es=0000 fs=0053 gs=002b efl=00010206
chrome!media::VideoCaptureDeviceClient::OnIncomingCapturedBufferExt+0x199:
00007ff837d63fe9 488b06 mov rax,qword ptr [rsi] ds:00000000
00000000=????????????????
0:016> kc
** Stack trace for last set context - .thread/.cxr resets it
Call Site
00 chrome!media::VideoCaptureDeviceClient::OnIncomingCapturedBufferExt
01 chrome!media::VideoCaptureDeviceClient::OnIncomingCapturedData
02 chrome!media::VideoCaptureDeviceWin::FrameReceived
03 chrome!media::SinkInputPin::Receive
04 obs_virtualcam_module64!DShow::OutputPin::UnlockSampleData
05 obs_virtualcam_module64!DShow::OutputFilter::UnlockSampleData
06 obs_virtualcam_module64!VCamFilter::Frame
07 obs_virtualcam_module64!VCamFilter::Thread
08 obs_virtualcam_module64!VCamFilter::{ctor}::l2::
(note that I change the spelling of thread to threed to stop an emoji from being inserted there)
Does that help?
BTW, one of the four crashes had a slightly different build of the DLL. The version number is the same but the build date was April 2nd instead of April 12. FWIW:
0:015> lmv m obs_virtualcam_module64
Browse full module list
start end module name
00007ffcca120000 00007ffc
ca201000 obs_virtualcam_module64 T (no symbols)
Loaded symbol image file: obs-virtualcam-module64.dll
Image path: C:\Program Files\obs-studio\data\obs-plugins\win-dshow\obs-virtualcam-module64.dll
Image name: obs-virtualcam-module64.dll
Browse all global symbols functions data
Timestamp: Fri Apr 2 08:52:08 2021 (60673DA8)
CheckSum: 00000000
ImageSize: 000E1000
File version: 27.0.0.0
Note: the crash rate with 27.0 does seem to be much lower, so maybe the bug is fixed? We'll need a lot more users to upgrade to be sure. The low 27.0 crash rate might just represent low usage. Are there usage numbers you can share?
Note: the crash rate with 27.0 does seem to be much lower, so maybe the bug is fixed? We'll need a lot more users to upgrade to be sure. The low 27.0 crash rate might just represent low usage. Are there usage numbers you can share?
@kgc35 If you go to about:crashes, and link some of the related reports, we may be able to investigate.
Thanks for the updates. Unfortunately Firefox did not create a crash dump, so there is no file to analyze.
@randomascii how can I update to 27.0? It seems like the official release is still at 26.
I didn't update to 27.0 (I'm not an OBS user) but I grabbed the DLL and PDB files for 27.0 from here:
https://github.com/obsproject/obs-studio/releases/tag/27.0.0-rc2
There's also an installer
@randomascii As pointed out in https://github.com/obsproject/obs-studio/issues/4243#issuecomment-819011643, the Chrome crash has a different signature and may very well have a different root cause (so the lower crash volume you see may or may not have anything to do with the problem this issue is about...). Can you file a different issue (as already requested here https://github.com/obsproject/obs-studio/issues/4243#issuecomment-819159085)? This will prevents mixups in the analysis.
Good point. I filed this issue: https://github.com/obsproject/obs-studio/issues/4615
Unfortunately Firefox did not create a crash dump, so there is no file to analyze.
This makes it somewhat less likely it's this issue, as that is caught by our crash reporter. If the problem is persistent, it may be worth trying Firefox Nightly, which is able to catch and report some hardware or driver related crashes previous versions couldn't.
Yeah, Firefox did not generate any crash dumps, nor did Windows. However, I do have these error messages from Windows Reliability.
@kgc35 Thank you for sharing those screenshots! The exact same crash (the module was unloaded and the offset is 0xcdf7) has been filed as bug 1707060. Our WebRTC team is still looking, but at least it's different from the issue discussed here or bug 1691902.
@kgc35 Thank you for sharing those screenshots! The exact same crash (the module was unloaded and the offset is 0xcdf7) has been filed as bug 1707060. Our WebRTC team is still looking, but at least it's different from the issue discussed here or bug 1691902.
Thanks. What should I do in the meantime in order to avoid this problem? I regularly use OBS virtual camera in Firefox.
What should I do in the meantime in order to avoid this problem?
If you can comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1707060 and provide more details or steps to reproduce the problem, it may help us track it down more quickly. As we don't understand the root cause yet I don't know of any workaround.
What should I do in the meantime in order to avoid this problem?
If you can comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1707060 and provide more details or steps to reproduce the problem, it may help us track it down more quickly. As we don't understand the root cause yet I don't know of any workaround.
No problem. I am glad to help with this problem. I have mentioned that it is due to unloading the camera, when I end a call or switch a type of call on Doxy.me.
I also linked the images here at Bugzilla.
What should I do in the meantime in order to avoid this problem?
If you can comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1707060 and provide more details or steps to reproduce the problem, it may help us track it down more quickly. As we don't understand the root cause yet I don't know of any workaround.
No problem. I am glad to help with this problem. I have mentioned that it is due to unloading the camera, when I end a call or switch a type of call on Doxy.me.
I also linked the images here at Bugzilla.
Are you able to reproduce the crash in something that doesn't require specific meeting rooms or signing up for the service, such as https://webcamtests.com/ or some other testing tool? Does this also happen with the doxybot precall test from https://intercom.help/doxyme/en/articles/2350624-video-issue-my-patient-can-t-see-me ?
Are you able to reproduce this with the virtual camera from OBS Studio 27.0.0-rc2? This will require either uninstalling OBS Studio 26.1.1 and installing 27.0.0-rc2, or running (as administrator) virtualcam-uninstall.bat
from OBS Studio 26.1.1 and running virtualcam-install.bat
from OBS Studio 27.0.0-rc2.
It seems to me that https://bugzilla.mozilla.org/show_bug.cgi?id=1691902 and https://bugzilla.mozilla.org/show_bug.cgi?id=1707060 are different crashes, so I don't think we should track both in this Issue. The replies on 1707060 also seem unclear if this is something we need to address in OBS or if it needs to be addressed in Firefox.
What should I do in the meantime in order to avoid this problem?
If you can comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1707060 and provide more details or steps to reproduce the problem, it may help us track it down more quickly. As we don't understand the root cause yet I don't know of any workaround.
No problem. I am glad to help with this problem. I have mentioned that it is due to unloading the camera, when I end a call or switch a type of call on Doxy.me. I also linked the images here at Bugzilla.
Are you able to reproduce the crash in something that doesn't require specific meeting rooms or signing up for the service, such as https://webcamtests.com/ or some other testing tool? Does this also happen with the doxybot precall test from https://intercom.help/doxyme/en/articles/2350624-video-issue-my-patient-can-t-see-me ?
Are you able to reproduce this with the virtual camera from OBS Studio 27.0.0-rc2? This will require either uninstalling OBS Studio 26.1.1 and installing 27.0.0-rc2, or running (as administrator)
virtualcam-uninstall.bat
from OBS Studio 26.1.1 and runningvirtualcam-install.bat
from OBS Studio 27.0.0-rc2.It seems to me that https://bugzilla.mozilla.org/show_bug.cgi?id=1691902 and https://bugzilla.mozilla.org/show_bug.cgi?id=1707060 are different crashes, so I don't think we should track both in this Issue. The replies on 1707060 also seem unclear if this is something we need to address in OBS or if it needs to be addressed in Firefox.
Yes, I can try in the next two days.
Regarding the crashes, well I am inclined to think it is a problem on OBS' end, since people are reporting crashes in both Chrome and Firefox, independent of hardware. It seems that the virtual camera driver is the commonality.
What was changed in Release 27 that might address this issue?
While these may all look like "OBS virtual camera crashes" from the outside, there are distinct differences between them:
The original report in this thread is one crash reported in Firefox that we're looking in to as it clearly points to our code. This might be caused by custom placeholder images - if a user used the wrong size image it would cause a crash. This was fixed in v27, further testing is needed to confirm if this was the same crash as identified in this issue.
The Chrome report is a different crash in Chrome's code, de-referencing a null pointer. This may be our fault for not shutting down the virtual output properly, but it's a very different crash from the original report which is why it was moved to #4615.
Finally there is this report, where Firefox is crashing trying to access code in the filter that has been unloaded. Since we wait on the output thread as part of our filter destruction, it is unlikely that our own filter code is trying to execute after unloading, so Firefox may be holding a stale reference to the filter somewhere. It's again, unrelated to the original report in this thread, so I would prefer to keep discussion to the Firefox bug tracker at https://bugzilla.mozilla.org/show_bug.cgi?id=1707060. If after investigation it turns out to be something that's more likely to be on our end, we can open a separate issue for it.
While these may all look like "OBS virtual camera crashes" from the outside, there are distinct differences between them:
* The original report in this thread is one crash reported in Firefox that we're looking in to as it clearly points to our code. This might be caused by custom placeholder images - if a user used the wrong size image it would cause a crash. This was fixed in v27, further testing is needed to confirm if this was the same crash as identified in this issue. * The Chrome report is a different crash in Chrome's code, de-referencing a null pointer. This may be our fault for not shutting down the virtual output properly, but it's a very different crash from the original report which is why it was moved to #4615. * Finally there is this report, where Firefox is crashing trying to access code in the filter that has been unloaded. Since we wait on the output thread as part of our filter destruction, it is unlikely that our own filter code is trying to execute after unloading, so Firefox may be holding a stale reference to the filter somewhere. It's again, unrelated to the original report in this thread, so I would prefer to keep discussion to the Firefox bug tracker at https://bugzilla.mozilla.org/show_bug.cgi?id=1707060. If after investigation it turns out to be something that's more likely to be on our end, we can open a separate issue for it.
Understood, the root cause for each of the problems are different, although they come from the interaction between the browser and OBS virtual camera.
What are the steps to update my virtual camera to version 27? Will I need to uninstall all of OBS? I want to do some testing that will help us clarify this problem.
Regarding the crashes, well I am inclined to think it is a problem on OBS' end, since people are reporting crashes in both Chrome and Firefox, independent of hardware. It seems that the virtual camera driver is the commonality.
Minor pedantry: on Windows, the virtual camera is not a driver. It's a DirectShow filter.
What was changed in Release 27 that might address this issue?
See my earlier comment for changes made to the virtual camera code for OBS Studio 27. Though I don't know for sure that any of those changes would address the crash that you are seeing.
What are the steps to update my virtual camera to version 27? Will I need to uninstall all of OBS? I want to do some testing that will help us clarify this problem.
See my other earlier comment about this procedure:
This will require either uninstalling OBS Studio 26.1.1 and installing 27.0.0-rc2, or running (as administrator) virtualcam-uninstall.bat from OBS Studio 26.1.1 and running virtualcam-install.bat from OBS Studio 27.0.0-rc2.
Since 27.0.0-rc3 is now out, use that instead. You don't need to uninstall your userdata (profiles and scene collections). You just need to either:
Though, please also understand that while we try our best to prevent bugs, we don't consider RCs to be stable releases.
We're still receiving the crash reports from obs-virtualcam-module64.dll v27.
The crashing stack is the same as I mentioned in the earlier comment. Either the destination or source buffer was not accessible in nv12_convert_to_yuy2
.
The first crash (ed786d15-8ed2-49d9-ae5d-a2bd60210611) is read AV for 00000253015de000
. In my understanding the source buffer is a mapped file created by OBS Studio. As you can see below, there is a mapped region just above the crashing address. So I believe the problem was buffer copy was started with a wrong (or old) frame size.
0:120> r
Last set context:
rax=000000000000004a rbx=0000025302b18c00 rcx=00000253015de002
rdx=0000025302b18480 rsi=000000e47070bca0 rdi=0000000000000002
rip=00007ff9685db66a rsp=0000009aebe3fce8 rbp=0000000000000000
r8=000002530158c300 r9=0000025302b18c00 r10=00000253015ddb40
r11=0000000000001100 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe cy
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010283
obs_virtualcam_module64!nv12_convert_to_yuy2+0x4a:
00007ff9`685db66a 0fb641fe movzx eax,byte ptr [rcx-2] ds:00000253`015de000=??
0:120> !address
...
253`012fe000 253`01300000 0`00002000 MEM_PRIVATE MEM_RESERVE <unknown>
+ 253`01300000 253`015de000 0`002de000 MEM_MAPPED MEM_COMMIT PAGE_READONLY <unknown> [................]
+ 253`015de000 253`01700000 0`00122000 MEM_FREE PAGE_NOACCESS Free
...
I'm seeing Firefox 95.0.2 crash everytime I end a Google Meet meeting (or disable my camera in the meeting) with OBS 27.1.3
Are you using the virtual camera as your meeting camera? If so, what canvas resolution are you using in OBS?
Same here, FF 95.0.2 crashes every time I close out of a google meeting with OBS 27.1.3 with virtual camera running at 1280x720. I'm not sure if it's 100% of the time, but it's pretty close. obs-virtualcam-module64.dll | CFGControl::SetFirstVW is the culprit in FF crash reports, EXCEPTION_ACCESS_VIOLATION_EXEC
Here is one of my crashes: https://crash-stats.mozilla.org/report/index/16bd3089-8c58-4407-9cec-85ca30220104
Does it still occur if you run OBS at 1920x1080?
Does it still occur if you run OBS at 1920x1080?
Yes it occurs if I run the virtual camera at 1920x1080 and 1280x720.
https://crash-stats.mozilla.org/report/index/88da9854-bc4e-4e99-a14f-78b930220111
and https://crash-stats.mozilla.org/report/index/df95af1b-af6d-4c15-a7ce-9812e0220111
Used Google Meet for a little over an hour; Firefox crashed immediately upon clicking end call. OBS base and output resolutions are 1920x1080
Another crash https://crash-stats.mozilla.org/report/index/a70c755a-0c67-4ec8-922c-c50c80220113
Used Google Meet for a little over an hour; Firefox crashed immediately upon clicking end call. OBS base and output resolutions are 1920x1080
Another crash https://crash-stats.mozilla.org/report/index/a70c755a-0c67-4ec8-922c-c50c80220113
Is this with the same steps-to-reproduce as your previous comment? Please include that information when submitting crash reports. Thanks!
Yes; same steps. Updated comment.
Frame 0 and 1 seem to always be consistent (pasting here for search visibility)
Frame 0: obs-virtualcam-module64.dll@0x000000000000e247
Frame 1: void CFGControl::SetFirstVW(struct IVideoWindow*)
Frame 2 seems to be variable (probably a memory address?:)
and then the rest of the stack seems consistent as well:
Frame 3: webrtc::videocapturemodule::VideoCaptureDS::~VideoCaptureDS()
Frame 4: rtc::RefCountedObject<webrtc::videocapturemodule::VideoCaptureDS>::~RefCountedObject()
...
@crossan007 @SD777x Your reports resemble Mozilla Bug 1707060 more than the original crash for this Issue, Mozilla Bug 1691902. See the following in each crash stack:
xul.dll webrtc::videocapturemodule::VideoCaptureDS::~VideoCaptureDS
xul.dll rtc::RefCountedObject<webrtc::videocapturemodule::VideoCaptureDS>::~RefCountedObject
xul.dll rtc::RefCountedObject<webrtc::videocapturemodule::VideoCaptureDS>::Release
As previously discussed in this Issue, I'd recommend commenting there. If the Mozilla folks monitoring that bug think that what you have experienced is actually an OBS bug, I'd recommend opening a new GitHub Issue for the crash you're experiencing to keep that tracked separately from this one. If you do open a new GitHub Issue with us, please make sure to include an OBS log from a session where the issue occurred.
Please include as much detail as you can regarding the steps-to-reproduce when commenting on Mozilla Bug 1707060 or if you end up opening a new GitHub Issue with us.
Having said that, we've thus far been unable to reproduce the crash you've described in Firefox 95.0.2 or 96.0.1.
I neglected to follow up on this post with an update. Quite some time ago, we were able to reproduce a crash that we suspect was this one and determined the cause. The crash that we saw happens when the camera is being stopped/unloaded.
Something in Firefox (or maybe the WebRTC library or elsewhere) is fully unloading the DLL without waiting for the camera's filter to finish being destroyed. The destroy is a synchronous operation so it means that there's likely some race condition in Firefox or one of its libraries.
The virtualcam module stops its thread in its filter destructor by signaling the thread to stop and then joining the thread. However, that thread can take some number of milliseconds to stop because it may be waiting or sleeping, thus it doesn't always immediately stop, and thus can possibly stall for some number of milliseconds while waiting for the thread to join. It seems that Firefox is not waiting for the filter to destruct, and in some other thread, UnloadLibrary is called on the virtualcam module DLL, causing data that the destructor uses to become invalidated before it can finish, and thus causing a crash when code tries to use that data.
Basically, there's a likely a race condition where in Firefox which will unload the DLL regardless of whether it's still waiting to destroy the filter elsewhere.
I originally tried to patch our DLL to not wait in the filter destructor, but the filter is designed to spawn a thread on construction and cleanup on destroy, rather than spawn a thread on filter start and kill the thread on filter stop. When I first tried to fix it, the fix to this was not as trivial as it sounds, and I unintentionally introduced more bugs which were even worse than Firefox's race condition. So I had to revert it at the time because I had a pressing release to make, and thus the bug is still present and I need to go back to it for a more thorough patch, but ultimately I do not believe that the bug is technically the fault of OBS. It's just triggering a race condition bug that something inside of Firefox itself has due to its somewhat unconventional design.
I still intend to fix that design flaw, I've just been pulled left and right constantly and it unintentionally fell by the wayside yet again. But again, I do want to emphasize that I believe that this is a bug with something within Firefox itself (not sure if it's WebRTC or what exactly is the cause, I don't recall hearing about this happening on Chrome but I could be wrong).
Firefox doesn't know anything about OBS at all, let alone this DLL, so this analysis seems a bit weird to me. We can't call UnloadLibary on a DLL we don't even know the existence of.
Firefox just uses the DirectShow Capture API, and OBS is presenting itself as a camera there. This is done in the WebRTC library which is shared with Chrome/Chromium/whatever else (and who had or have the same crash...): https://searchfox.org/mozilla-central/source/third_party/libwebrtc/modules/video_capture/windows
https://crash-stats.mozilla.org/report/index/7fd6b54d-f1e8-4b4f-a1fa-0eb4d0220619#allthreads Look for: Thread 70, Name: VideoCapture https://hg.mozilla.org/releases/mozilla-release/file/1ff2cec0bb36e389df1a209a9f882b443ed48495/third_party/libwebrtc/webrtc/modules/video_capture/windows/video_capture_ds.cc#l47
Crashes happen when we release our (MSCOM) reference to the IBaseFilter*
we were holding. So we aren't freeing or unloading anything, we're signaling to Windows we are no longer using a capture device, i.e. reducing the reference count on it.
The code literally does
someIBaseFilterPtr->Release();
someIBaseFilterPtr = nullptr;
It's quite conceivable that if this is the last reference, the Windows DirectShow/COM machinery starts cleaning up but that isn't anything we have any control over!
As far as the WebRTC code knows, it's asking Windows for anything that supports CLSID_VideoInputDeviceCategory
, gets a reference to it, and then drops the reference when it is no longer required (at this point OBS, and only OBS, sometimes crashes). It doesn't explicitly load or unload any DLL's, that's all OBS/DirectShow/Windows.
(This is potentially a different issue than was first reported here, which was a mis-sized buffer in OBS, and a different one the Chrome team complained about, which is frame callbacks being called even after the capture was stopped)
Given that the DLL is unloaded when Firefox releases our MSCOM reference to IBaseFilter* that presumably means that when OBS starts up it needs to increment its own reference count while it has threads running, or something similar. I'm not a COM expert but the problem of COM objects with independently running threads must be a common issue with standard solutions.
From a bit of searching a few options cropped up: 1) Call LoadLibrary on yourself, and don't call UnloadLibrary until you are not executing any code from yourself anymore (tricky, since that final UnloadLibrary call would presumably be in the library) - https://devblogs.microsoft.com/oldnewthing/20191127-00/?p=103153 2) Increment the reference count for each thread - each reference should increment the reference count, which is obvious when you say it out loud: https://devblogs.microsoft.com/oldnewthing/20081103-00/?p=20353 3) Implement DLLCanUnloadNow - https://docs.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-dllcanunloadnow and https://pubs.opengroup.org/onlinepubs/009899899/CHP06GDC.HTM and https://www.ooportal.com/basic-com/module3/implementDllCanUnloadNow.php
I'm not a COM expert so I'm not sure how well these different techniques work. The tricky part is that if you increment your own reference count then you need to decrement it (or call UnloadLibrary to decrement the DLLs reference count) and that means that you are running your code that will cause your code to unload, which is racy.
I think that DLLCanUnloadNow is probably the simplest and safest solution but I'm not sure.
@gcp
I completely understand your defensiveness, and to be clear, I would like to point out that I did try to very carefully say "something in Firefox"; I was implying the Firefox process holding the filter. I should have stated that a bit more clearly. Indeed, it's completely possible maybe it could be Windows DirectShow/COM machinery code, however it does not really change the fact that a race condition has to exist somewhere outside of the virtual camera module code. I'll make some speculations a bit later about this.
And yes, as I stated it happens during the destruction of the filter, thus it would be in the Release() call of the final reference, where ever that may be. So the code you pointed out is likely where it could crash indeed if it was the last reference holder. However, to state the issue again, something somewhere else is unloading the library before that Release() call there completes, assuming it's the final reference. It really has to be specifically UnloadLibrary() somewhere. That was what we discovered. I am simply felt it was important to report that.
I already acknowledged the flaw that the virtual camera module has by using the construction/destruction rather than the start/stop calls, but the library being unloaded during destruction is something that I felt was still worth reporting here. If it seemed like I was trying to place any specific blame for this I apologize; that is not my intent, so please do not take it that way. That being said, that condition will still remain as a possibility regardless of whether I change the destruction logic or not, so again, I felt that it was very important that I note it here, because that is not something that the virtual camera module can control, and it will always be a latent race condition that exists somewhere.
Something, somewhere in a Firefox process that uses the filter, is unloading the virtual camera module library before is has a chance to finish destroying, invalidating memory being used during destruction, causing the crash. Whether that's Firefox itself, one of its dependencies, or something deeper in Microsoft, that was something I found to be the case.
As for the one the Chrome team complained about, that would likely be due to the the design flaw as well. I take responsibility for the design flaw there because I'm the one who wrote that. Sadly I made things a little bit difficult for myself, otherwise I would have fixed it last patch as I had intended rather than have delayed it for the next patch. I can only blame myself for that one.
I intend to fix the start/stop logic for the next patch, and while this may remedy the situation I'm reporting, it will not solve the underlying UnloadLibrary() race condition that exists somewhere outside of the virtual camera module in the process using it. It was still important that I report it. I'm just putting extra emphasis on this here because I want people to understand that writers of filter modules have absolutely no control over libraries being unloaded, except maybe by forcing an extra LoadLibrary reference to itself on library initialization, but that would have the negative effect of forcing the library to always be loaded in memory until the process using it shuts down.
I don't think that anyone using the DirectShow API is going to be handling modules directly, and now that I think about it, I would agree that it is unlikely that any of the people writing any of that code is overtly unloading any libraries themselves (I hope). I think it may just be a threading issue somewhere with COM/DirectShow. COM is very specific about threading, and how COM behaves with multiple threads is dependent upon how COM is initialized. DirectShow is also notoriously picky about threading as well. Your speculation that it could be within the Microsoft machinery somewhere may indeed be correct, although it could also be that the Release() call is occurring in a thread other than the thread in which the object was created/used in conjunction with COM being initialized in mode that doesn't support that. Similarly, it's possible that DirectShow might have its own thoughts on the threading there, but I'd be more likely to lean towards COM. So now that I think about, the thing I'd be most likely to point my finger at is just threading usage with either COM or DirectShow. (Have I mentioned that I absolutely hate DirectShow passionately?)
However, that's just my idle speculation based upon my experience with COM and DirectShow; I have not browsed the WebRTC/Firefox code enough to know whether that's the case. If COM was properly initialized in multithreading mode, then it may very well be a latent bug in Microsoft's code somewhere that I've only triggered by happenstance. But either way, it may be worth checking COM initialization code relative to camera filters, and checking to see what threads the filter Release() calls can occur in, and making sure that the threads where those things are occurring make sense.
@randomascii
I appreciate the feedback. DLLCanUnloadNow might be helpful, however I think that just fixing the construction/destruction logic in the virtual camera module is all that needs to be done to remedy it. If I am unable to spend adequate time on this fix, I will use that or some other simple remedy to ensure that the library can't be unloaded until filter destruction is complete.
I'm somewhat of the opposite opinion, we're clearly doing something wrong if every other DLL seems to behave fine. There's no "latent bug in Microsoft's code", we must have some code that is violating the requirements of what a COM DLL is supposed to do somewhere.
Looking over the code again, we do seem to have a DllCanUnloadNow
function which returns S_OK
as long as nothing has called VCamFactory::LockServer
which seems immediately wrong. If we're telling DShow that it's OK to unload our DLL, then we shouldn't be surprised when DShow unloads our DLL prematurely.
Thanks for the follow-up. I was surprised by the diagnosis but going back I see that I missed that this was bug 1707060 not bug 1691902 as I was coming from. Unfortunately the crash rate for bug 1707060 is fairly high (150-400 crashes per day). I understand that at this point 3 separate problems are multiplexed in this GitHub issue.
https://bugzilla.mozilla.org/show_bug.cgi?id=1707060#c1 and https://bugzilla.mozilla.org/show_bug.cgi?id=1707060#c17 are super-careful to not claim OBS is the one with the bug for this specific one :-)
That said, I think the high crash rate is because OBS is the most popular capture solution. However, if we'd have a generic bug in the lifetime management of DirectShow capture filters, we should have much more or other crashes with webrtc::videocapturemodule::VideoCaptureDS::~VideoCaptureDS
in the stack. I'll try to verify that this is not the case, which could then confirm that at least OBS is special (in an unfortunate way) here.
Regarding DllCanUnloadNow
, we don't call CoFreeUnusedLibraries()
directly but we do certainly call CoUninitialize()
which would then call that (AFAIK).
Platform
Operating system and version: Windows (7, 8.1, 10) OBS Studio version: 26.*
Expected Behavior
Firefox doesn't crash on websites accessing webcams.
Current Behavior
I (person looking at crash reports for Firefox) and people I asked cannot reproduce the issue but some users encounters on websites which access web cams (e.g. Google Meet, https://webcamtests.com/ , BigBlueButton installs; interestingly no mention of Zoom). Recent crash reports (the "Reports" tab provides access to individual crash reports).
Steps to Reproduce
unknown
Additional information
Bug in Mozilla's bug tracker: bug 1691902