RFC: WASAPI: handle COM initialization internally again

ligfx commented 5 years ago

I'd like to suggest a new approach to COM within cubeb, where cubeb would again initialize and handle COM transparently for the user. Currently, the requirement to initialize and understand COM falls on the user, which can be a complex and error-prone task.

For reference, cubeb used to initialize COM on behalf of the user, but that was changed in #424 to fix issues documented in #416. Cubeb was initializing COM on the stack for each API call, which could lead to crashes when objects outlasted their COM context.

The trick here is that COM objects are tied to the context on the thread they're created in— so, I'm proposing that cubeb create and manage a thread solely for creating COM objects. If the thread is a child of the main cubeb context, all COM objects will last for the lifetime of cubeb. Here's an example:

For testing purposes, I was able to create crashes by removing the global COM context in tests/common.h and only using auto_com within get_default_endpoint. This raises a number of exceptions, in IUnknown::Release, IMMDevice::Activate, IAudioClient::Start, and more.

~~Then, after adding a COMApartment thread to the WASAPI cubeb context, porting instances of CoCreateInstance, and removing all other calls to CoInitializeEx, the crashes disappeared.~~

~~Thoughts? If this makes sense, I can clean up my local branch and put up a pull request for more review.~~

EDIT: After reading through more COM documentation, I've come upon a simpler approach. As long as the COM Multi-Threaded Apartment (MTA) exists, every thread in the process is by default in the MTA. That means they can use COM functions without having to explicitly initialize COM.

We can ensure the existence of the MTA using an extra dummy thread (in Windows 8, CoIncrementMTAUsage / CoDecrementMTAUsage were added to make this easier). Then the rest of the WASAPI backend can assume that COM is initialized.

For testing purposes, I was able to create crashes by removing the global COM context in tests/common.h and only using auto_com within get_default_endpoint. This raises a number of exceptions, in IUnknown::Release, IMMDevice::Activate, IAudioClient::Start, and more.

Then, after adding an MTA dummy thread to the WASAPI cubeb context and removing all other calls to CoInitializeEx, the crashes disappeared

Thoughts? If this makes sense, I can clean up my local branch and put up a pull request for more review.

class MTACookie {
public:
  MTACookie() {
    this->ready_event = CreateEvent(NULL, 0, 0, NULL);
    this->stop_event = CreateEvent(NULL, 0, 0, NULL);
    this->thread = (HANDLE) _beginthreadex(NULL,
                                           8 * 1024,
                                           ThreadFunc,
                                           this,
                                           STACK_SIZE_PARAM_IS_A_RESERVATION,
                                           NULL);
    WaitForSingleObject(this->ready_event, INFINITE);
  }
  MTACookie(const MTACookie&) = delete;
  MTACookie& operator=(const MTACookie&) = delete;
  ~MTACookie() {
    SetEvent(this->stop_event);
    WaitForSingleObject(this->thread, INFINITE);

    CloseHandle(this->thread);
    CloseHandle(this->ready_event);
    CloseHandle(this->stop_event);
  }
  static unsigned
  ThreadFunc(LPVOID arg) {
    MTACookie* cookie = static_cast<MTACookie*>(arg);
    XASSERT(SUCCEEDED(CoInitializeEx(NULL, COINIT_MULTITHREADED)));
    SetEvent(cookie->ready_event);
    WaitForSingleObject(cookie->stop_event, INFINITE);
    CoUninitialize();
  }
  HANDLE thread;
  HANDLE ready_event;
  HANDLE stop_event;
};

kinetiknz commented 5 years ago

This seems like a reasonable approach, but I'm not a COM expert so it's hard to evaluate any potential trade offs. I'm not sure we'd want to use this approach in Gecko simply because the current solution seems to be working well (and until we ship sandboxed/remoted audio, we'd take the extra thread cost per-process rather than just once).

ligfx commented 5 years ago

Okay, being against the performance cost of the extra thread makes sense to me.

Where I’m coming from is, currently integrating cubeb into another application can require a lot of thinking around COM: what’s the difference between MTA and STA? does COM have to be initialized on every thread that access cubeb, or just when cubeb_init is called? what's the performance overhead of COM?

I think it would be great at the least to put more explanation around this in the documentation, or even better figure out how to move that logic back inside cubeb (without the performance costs). This would make integration into other apps easier, and make the COM requirements more transparent for others working on Firefox.

Couple notes on COM and WASAPI for posterity:

WASAPI’s objects don’t seem to rely on global COM state being kept around in-between calls: e.g., if you get an IAudioClient, you can de-initialize COM for the whole process, then re-initialize it, and the IAudioClient will still function afterwards. This, combined with the render thread keeping COM alive while audio is playing, is probably why cubeb never saw issues before the WASAPI Exclusive Mode PR
I investigated the issue with WASAPI Exclusive Mode more. It seems to be loading AUDIOKSE.DLL to access the audio device directly through WDM/KS [1], which then gets unloaded when COM is uninitialized (At the process level. The documentation for CoUninitialize[2] says that it unloads all DLLS at the thread-level— I didn’t see this behavior.). When the audio client starts, it then tries to access a block of memory which no longer exists (I assume some sort of shared memory). This can be prevented by keeping COM initialized anywhere in the process, or by keeping a handle to AUDIOKSE.DLL explicitly so it doesn’t get unloaded.
WASAPI seems to work fine in either MTA or STA mode (disregarding the note about Windows 8 [3] which I wasn’t able to test). Overall, the WASAPI objects seem to be relatively COM-agnostic.

My overall conclusion is that cubeb could safely manage COM internally with auto_com right now for most situations. For other situations, it seems to suffice to have COM initialized somewhere — e.g. the main STA thread that all Windows GUI programs would have.

I see a couple possibilities, from easiest to hardest:

Ignore all of this, and let developers figure it out like I did :smile:
Add documentation saying that COM needs to be initialized on the thread during all calls to cubeb, but not otherwise (I’d also note that initializing COM MTA anywhere initializes it implicitly everywhere [4], since I think that can make it easier for implementers). If exclusive mode gets merged, add more documentation noting that COM must stay initialized somewhere in the program.
Bring back using auto_com internally, and remove the requirement on COM being initialized for cubeb_init
Bring back using auto_com internally, and enforce that COM exists somewhere else in the program on cubeb_init (check that either the MTA is implicitly/explicitly initialized, or that initializing an STA doesn’t create the APTTYPE_MAINSTA [5])
Bring back auto_com, and refactor cubeb_stream’s thread to exist for the lifetime of the cubeb_stream, not just while the stream is started. This would ensure COM always exists. This is definitely the hardest, since I see there’s already been a couple BMO’s around the thread logic and it'd be easy to (re-)introduce bugs.

[1] https://app.assembla.com/spaces/portaudio/git/source/master/src/hostapi/wdmks/pa_win_wdmks.c [2] https://docs.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-couninitialize [3] https://docs.microsoft.com/en-us/windows/win32/api/audioclient/nn-audioclient-iaudioclient [4] https://devblogs.microsoft.com/oldnewthing/20130419-00/?p=4613 [5] https://docs.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-cogetapartmenttype

kinetiknz commented 5 years ago

There's a tiny bit of documentation here: https://github.com/kinetiknz/cubeb/blob/8c3e32bd24933f82220b66c88c845c4b2bdf28e7/include/cubeb/cubeb.h#L423 but it could certainly be improved. And https://github.com/djg/cubeb-rs/issues/45 shows one of the down sides to simply documenting this - you need to propagate that documentation note elsewhere (but at least the failure is fairly clear).

Suggestion 5 (use the stream thread to manage COM) is what I planned to do for the Rust rewrite of the WASAPI backend. We could use the same approach here, but it is the highest risk (less of an issue for rewrites/alternative backends).

Suggestion 4 seems closest to what we're trying to achieve now - wasapi_init checks for COM init and fails (with an assert, admittedly), the stream thread manages COM for itself while the stream is active. That just leaves re-inserting auto_com into the other API surface to cover being called from random threads?

https://bugzilla.mozilla.org/show_bug.cgi?id=1449555 has some details about strange COM behaviour we've seen in the wild, where the main thread should already be STA but COM acts as if it's not initialized. My concern in these cases is around what problems we may introduce by initializing COM in MTA in these situations.

Your investigation of the exclusive-mode COM issue is great, thank you for digging in to that!

ligfx commented 5 years ago

Regarding https://bugzilla.mozilla.org/show_bug.cgi?id=1449555 : eesh! That's a nasty bug. Though another motivator in my eyes for teaching cubeb to handle COM internally and not rely on external state.

My concern in these cases is around what problems we may introduce by initializing COM in MTA in these situations.

I think if COM acts as though it's not initialized, then it's probably actually not initialized, and auto_com won't mess things up. What sort of problems were you thinking of?

For suggestion 4, yes, I was thinking re-insert auto_com at the API surface. That would handle being able to call cubeb from arbitrary threads. I'd keep the assertion in to make sure users know about COM, and document the expectations (e.g. if you have COM elsewhere, like a GUI thread, you could just wrap wasapi_init in CoInitializeEx/CoUninitialize).

Long-term I think suggestion 5 would be the only way to ensure that COM always works in the presence of weirdness like 1449555, and to hide COM completely from the end-developer.

I can take a crack at 4 soon, if that sounds reasonable.

kinetiknz commented 5 years ago

Sorry for the slow replies.

Maybe we should try for suggestion 5 directly then, if we can implement it without too much risk. Thinking about the lifecycle of cubeb_streams inside Firefox, keeping the stream thread alive longer would be fairly low cost in terms of resources, since we don't tend to have idle (stopped) streams around anyway.

ligfx commented 5 years ago

Sorry for the slow replies.

No worries! I understand this isn't high priority.

Maybe we should try for suggestion 5 directly then, if we can implement it without too much risk.

The reconfigure and timeout portions of the render_loop function seem the most difficult here.

Reconfiguration would need to know whether the stream is started or not. It looks like it used to know that, but lost the capability in 3e29fbc, which also moved reconfiguration from into the render thread itself. What motivated that migration?

The wait timeout would necessarily need to be infinity when a stream is not started, since nothing would be triggering events. However, the commit which added the timeout (e032985) says "add additional sanity checks to catch leaked render threads", which implies there's some issue here. Do you have any more details?

This also might actually provide a way to clean up some of the emergency bailout logic. If the thread's lifetime matches that of the cubeb_stream's, then the cubeb_stream could stick around until the thread finishes, uses something like an internal ref count à la Boost's intrusive_ptr. Basically adding another option to https://bugzilla.mozilla.org/show_bug.cgi?id=1315194#c2 : share ownership of the resources between the render thread and the rest of the code.

kinetiknz commented 5 years ago

Reconfiguration would need to know whether the stream is started or not. It looks like it used to know that, but lost the capability in 3e29fbc, which also moved reconfiguration from into the render thread itself. What motivated that migration?

From memory, it was just to avoid doing any heavy lifting inside whatever thread OnDefaultDeviceChanged ran on to simplify reasoning about threading in the code. The change corresponds to BMO bug 1134102, which seems to confirm that guess.

However, the commit which added the timeout (e032985) says "add additional sanity checks to catch leaked render threads", which implies there's some issue here. Do you have any more details?

I think that was just for catching bugs early rather than for a known issue.

This also might actually provide a way to clean up some of the emergency bailout logic. If the thread's lifetime matches that of the cubeb_stream's, then the cubeb_stream could stick around until the thread finishes, uses something like an internal ref count à la Boost's intrusive_ptr. Basically adding another option to https://bugzilla.mozilla.org/show_bug.cgi?id=1315194#c2 : share ownership of the resources between the render thread and the rest of the code.

That sounds similar to option 2 in the BMO bug. Sounds good to me.

ligfx commented 5 years ago

I took a crack at this in #542 . Makes the lifetime of the render thread match the lifetime of the stream, converts cubeb_stream in the WASAPI backend to be ref-counted, keeps the reconfiguration logic inside the render thread, but removes the timeout logic entirely.

mozilla / cubeb

RFC: WASAPI: handle COM initialization internally again #534