kyleneideck / BackgroundMusic

Background Music, a macOS audio utility: automatically pause your music, set individual apps' volumes and record system audio.
GNU General Public License v2.0
16.11k stars 675 forks source link

StopIO in BGMDriver is not triggered #470

Open neilm-code opened 3 years ago

neilm-code commented 3 years ago

Hi, thank you so much for the amazing work!

I found an issue with BGM_Device's StopIO: sometimes it's not called when apps stop using the BGM device, and I cannot find the reason why that happens. Therefore, the IO is left always running, consuming CPU and battery.

I tried to insert a LogWarning on the first line of BGM_StopIO and it does not even get called when this problem happens.

Any thoughts on why this might be happening? Thanks in advance.

alexmil23 commented 3 years ago

Was able to reproduce the bug. The same problem happens with StartIO too which causes an app to not be connected to the Background Music at all.

Steps to reproduce:

  1. Add LogWarning()s into BGM_StartIO and BGM_StopIO.
  2. Open some app (make sure that only that no other app is connected except this one though) that allows you to select audio devices (such as Skype) and switch devices back and forth (BGM Device <-> Built-in speaker) while playing test sound. Monitor the Console app's log.
  3. At some point you will see that the BGM_StartIO and BGM_StopIO are not called (happens once in ~5-7 tries).

Tested this on Big Sur 11.1.

alexmil23 commented 3 years ago

@kyleneideck Do you have any ideas? As far as I understand the AudioDriverPlugin's StartIO and StopIO are both called directly by HAL. What could be affecting those calls? Could WillDoIOOperation() be the cause?

kyleneideck commented 3 years ago

Thanks for the report. I think this has probably happened for me, too. I'd guess it might have started happening when Big Sur came out. It also seems like it might happen when the system wakes up from sleep, or at least happen more often when it does.

I actually had a similar bug in some related work where the HAL wasn't calling StopIO for every client when the driver deactivated/unpublished the device. It only happened on Big Sur. The fix was just to reset the client state (mClients) when the driver reactivated the device.

I'll look into applying the same changes to BGMDriver when I get a chance, since they might be helpful for this bug. The extra debug logging should help, at least. It logs more details about which clients are starting and stopping IO, so we'll be able to tell from the logs which clients BGMDriver thinks are still doing IO when they've actually stopped.

Hopefully the HAL is giving our driver some other indication that all clients have stopped IO or something like that. It might help to try logging every notification the driver gets from the HAL to see if we can use one of those to tell when BGMDriver should stop IO even though the HAL might not call StopIO for every client.

If the HAL really doesn't give us any indication, I'm not sure if there's anything we can do other than filing Radars about it. The only workaround I can think of is that we might be able to use kAudioServerPlugInIOOperationThread. If the HAL passes it to our BGM_EndIOOperation, but doesn't call BGM_StopIO for the same client, we can probably assume that it should have. And similarly for BGM_BeginIOOperation and BGMStartIO.

Actually, I just had a look at the code for that and it looks like it might already do that. So maybe that means the HAL isn't calling BGM_EndIOOperation either.

@alexmil23 Thanks, having repro steps is really helpful. I'll try them out when I get a chance and let you know what I find out.

It's interesting that you need to switch between devices to get the bug to happen. Presumably that would be making the app remove and re-add itself as a CoreAudio client each time. Does that mean that you can't reproduce the bug just by making an app stop and start IO repeatedly?

I think I might have had the HAL fail to call BGM_StartIO as well. At least, I've got a spindump from a time when audio wasn't playing and should have been. Although, it might have been that the audio had been playing and then stopped for no reason, even though the app playing the audio wouldn't have stopped and restarted IO. I guess I should have taken some notes with it.

The spindump doesn't show too much, unfortunately. Just that BGMDevice wasn't running IO. But that does help narrow down the problem and indicates that the bug might have been caused by StartIO not being called when it was supposed to be.

One other thing I should mention is that there's a bug (I think it's a bug) in BGMDriver where it doesn't allow for the possibility of one client starting multiple IO runs at the same time. That is, if the HAL calls BGM_StartIO twice for the same client and then calls BGM_StopIO once for that client, BGMDriver will assume that client has stopped IO completely. I haven't actually tested it yet, so I'm not sure if clients even can do that, but I think BGMDriver should still handle that case correctly to be safe.

I don't think that would explain why this bug is occurring, though, since it wouldn't keep the HAL from calling BGM_StopIO.

Could WillDoIOOperation() be the cause?

I can't see anything in our WillDoIOOperation that looks immediately suspicious to me. It more-or-less always returns the same values for each IO operation. Is there something about it that looks suspicious to you? It might help to have a look at some other driver's implementations of WillDoIOOperation to see what they do differently.

alexmil23 commented 3 years ago

@kyleneideck Thanks for the quick reply!

Presumably that would be making the app remove and re-add itself as a CoreAudio client each time. Does that mean that you can't reproduce the bug just by making an app stop and start IO repeatedly?

I tried to add LogWarning()s to the BGM_AddDeviceClient and BGM_RemoveDeviceClient and it looked like device clients were only added when an audio app (e.g. Skype or music player) was launched or terminated on the latest Big Sur. In other words, device clients are not added or removed when you select the BGM device in audio settings on Big Sur (they are added only on startup or termination of a music app process), at least on my system. Switching between devices in the audio app's settings seems to only invoke BGM_StartIO and BGM_StopIO which means that you can actually reproduce the bug by making an app stop and start IO repeatedly.

I also forgot to mention an additional step that I took to reproduce the bug: I disabled the logic that sets the BGM Device as system-wide default. Otherwise, it's hard to keep track of apps that interact/do IO with it.

Is there something about it that looks suspicious to you?

It doesn't look suspicious to me either, now that I took a look. Also, looked at similar implementations in other projects and they are all pretty much the same.

The fix was just to reset the client state (mClients) when the driver reactivated the device.

Please, could you elaborate on this a little more?

Thank you.

neilm-code commented 3 years ago

Man, looks like a nasty bug. Thanks for looking into this @kyleneideck and describing in detail @alexmil23. So I tried tweaking the code based on your comments above to test some of the assumptions.

Here are my findings:

So now, the two best guesses on what might be happening here that I have: Deadlock somewhere in BGM_TaskQueue OR Methods taking too much time to execute which causes the overload. I timed them and they sometimes take 5-7 milliseconds to execute which might be slow according to #367.

kyleneideck commented 2 years ago

@neilm-code Do you still have the code where you commented out the logic in BGM_Device::StartIO, StopIO, BeginIOOperation, EndIOOperation by any chance? I finally got around to trying it out, but the HAL still didn't always send the kAudioServerPlugInIOOperationThread IO operation when it was supposed to.

If you do still have it, could you post a patch or push it to a branch or something?

The way I'm reproducing the bug is by starting IO in two different apps and then stopping IO in one of them (without BGMApp open). As far as I can tell, the HAL doesn't give BGMDriver any indication that one of the apps has stopped IO, except that it presumably stops doing IO operations for that client.

So if the app that's still running IO was BGMApp, BGMDriver wouldn't know that it should update its kAudioDeviceCustomPropertyDeviceIsRunningSomewhereOtherThanBGMApp property and that would mean BGMApp wouldn't know to stop running IO as well.

The fix was just to reset the client state (mClients) when the driver reactivated the device.

Please, could you elaborate on this a little more?

I haven't gotten around to merging it yet (or figuring out whether it's actually useful to merge), but you can see the commit here: https://github.com/anteo/BackgroundMusic/commit/7d13ea0bb0f688a075c3fe2c4a6fe8146e6c86d0