nicklan / pnmixer

Volume mixer for the system tray
GNU General Public License v3.0
152 stars 32 forks source link

spurious warning dialog: Connection to sound system failed. #182

Open foresto opened 6 years ago

foresto commented 6 years ago

Fairly often while I'm using multiple applications that produce sound, pnmixer throws up a dialog box:

Warning: Connection to sound system failed.
Do you want to re-initialize the audio connection ?

Reading through the code, it looks like poll_watch_cb() is calling g_io_channel_read_chars(), and treating a G_IO_STATUS_NORMAL status as an error. Isn't that a mistake? I'm not familiar with the API, but it looks to me like that status just indicates that there's still more data to read on the channel, and since g_io_channel_read_chars() is called in a loop, that data will be cleared on a subsequent iteration.

If that's true, then couldn't the annoying dialog boxes be banished by simply continuing the loop on G_IO_STATUS_NORMAL, without triggering an error callback?

elboulangero commented 6 years ago

Honestly I don't know, this part of the code is not an easy one. Did you try to make the change you suggest? Does it fix your problem?

hasufell commented 6 years ago

Relevant code: https://github.com/nicklan/pnmixer/blob/903298b9402198c14d480a17ab46217385aa1f84/src/alsa.c#L684-L719

Most of the time we hit G_IO_STATUS_AGAIN with 0 bytes read, which means that we break the loop in the next iteration. I was able to hit G_IO_STATUS_NORMAL with non-zero bytes read when I play around with the volume slider in pulseaudio like a madman.

The question now is what does it really mean when read non-zero bytes. Afaiu we usually expect alsa to clear the channel and so we read no data. If we read data, it means alsa failed to clear the channel, but is that bad?

I did:

        case G_IO_STATUS_NORMAL:
            WARN("G_IO_STATUS_NORMAL");
            continue;

and it seemed to be fine. We get no infinite loop. But I really am not sure either what the correct way is to treat the alsa gio channel. Maybe we should use snd_pcm_poll_descriptors_revents instead and deal with that?

But the documentation is so bad, it's not really clear what any of those things means.

This does also some polling: https://github.com/ldm5180/hammerhead/blob/master/hab/alsa/check-alsa-poll.c

hasufell commented 6 years ago

Another example of alsa polling https://github.com/sardemff7/libgwater/blob/master/alsa-mixer/libgwater-alsa-mixer.c

foresto commented 6 years ago

Did you try to make the change you suggest? Does it fix your problem?

I made the change, along with a WARN message similar to hassufell's. I'm now waiting for the problem to trigger again so I can see the results.

foresto commented 6 years ago

Okay, so the change I suggested seems to run just fine, but the dialog box still pops up from time to time, along with these session log messages:

error: alsa.c: GIO error has occurred error: main.c: Connection with audio failed, you probably need to restart pnmixer.

I don't know exactly what's triggering it, but if it's going to keep happening, perhaps what I need here is an "always re-initialize" option to keep that pop-up dialog box from nagging me all the time?

elboulangero commented 6 years ago

Maybe in this case it makes sense to re-initialize without prompting the user. No need for an "always re-initialize" option, PNMixer should handle errors by itself when it can.

You can look at the code that is triggered when user manually re-initialize PNMixer for an example, it shouldn't be complicated.

hasufell commented 6 years ago

I have a feeling that pulseaudio async nature does not play well with alsa and actually messes with the alsa daemon.

@foresto are you running an alsa-only setup?

foresto commented 6 years ago

Yep. Pure ALSA over here.

hasufell commented 6 years ago

I've created a PR for @elboulangero suggestion, but I would still say we need to investigate our error handling and alsa polling.

@foresto could you test the auto-reload branch?

foresto commented 6 years ago

@hasufell I applied the auto-reload branch changeset as a patch to version 0.7. Triggering the new code path avoided the pop-up dialog, but left pnmixer with a red X over its notification tray icon, and clicking it resulted in a segfault.

SegvAnalysis:
 Segfault happened at: 0x55b6d32ebc4c <alsa_card_get_volume+12>:    mov    0x20(%rdi),%rbp
 PC (0x55b6d32ebc4c) ok
 source "0x20(%rdi)" (0x00000020) not located in a known VMA region (needed readable region)!
 destination "%rbp" ok
SegvReason: reading NULL VMA
SourcePackage: pnmixer
Stacktrace:
 #0  0x000055b6d32ebc4c in alsa_card_get_volume ()
 #1  0x000055b6d32eccbf in audio_set_volume ()
 #2  0x00007f37f4e611d6 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #3  0x00007f37f4e7c79f in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #4  0x00007f37f4e7cecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #5  0x00007f37f4e60f9d in g_closure_invoke () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #6  0x00007f37f4e73cde in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #7  0x00007f37f4e7c4b5 in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #8  0x00007f37f4e7cecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #9  0x00007f37f58e3dc7 in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #10 0x00007f37f5a73a2d in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #11 0x00007f37f5a0fcf1 in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #12 0x00007f37f4e60f9d in g_closure_invoke () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #13 0x00007f37f4e73758 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #14 0x00007f37f4e7be4f in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #15 0x00007f37f4e7cecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #16 0x00007f37f5a7732a in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #17 0x00007f37f5a0f5a7 in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #18 0x00007f37f4e611d6 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #19 0x00007f37f4e7c16d in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #20 0x00007f37f4e7cecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
 #21 0x00007f37f5b5a174 in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #22 0x00007f37f5a0c43e in  () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #23 0x00007f37f5a0e5b8 in gtk_main_do_event () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #24 0x00007f37f550c9a5 in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0
 #25 0x00007f37f553e172 in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0
 #26 0x00007f37f4b87fb7 in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
 #27 0x00007f37f4b881f0 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
 #28 0x00007f37f4b88502 in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
 #29 0x00007f37f5a0d695 in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
 #30 0x000055b6d32eb058 in main ()
hasufell commented 6 years ago

Can you run with --debug command line flag? I suspect it fails to hook a soundcard in audio_hook_soundcard.

foresto commented 6 years ago

Output from patched version running with --debug option:

warning: alsa.c: Alsa failed to clear the channel debug: audio.c: Dispatching signal 'card to be reloaded' from 'unknown', vol=51, muted=no debug: audio.c: Unhooking soundcard from the audio system debug: alsa.c: 'hw:0': Closing mixer debug: audio.c: Dispatching signal 'card cleaned up' from 'unknown', vol=0, muted=yes debug: audio.c: Hooking soundcard 'SB Audigy 5/Rx [SB1550] ((null))' to the audio system debug: audio.c: Could not hook soundcard, trying every card available debug: alsa.c: 'default': Opening mixer error: alsa.c: 'default': Can't attach card to mixer: No such file or directory debug: audio.c: No soundcard could be hooked ! debug: audio.c: ** Dispatching signal 'no card' from 'unknown', vol=0, muted=yes

elboulangero commented 6 years ago

Are you familiar with GLib? I think that the code that tries to hook the soundcard shouldn't be executed immediately (as it is in the current code), but should be delayed with a g_idle_add(). You can try to experiment a bit with that and see if it works. Beware, all this audio-related code is kind of ugly.

foresto commented 6 years ago

I tried calling audio_reload() via g_idle_add(), but got the same crash. Here's the code I used (along with the patch from earlier):

gboolean
do_audio_reload(gpointer data)
{
    audio_reload((Audio *)data);
    return FALSE; /* Run this callback only once. */
}

static void
on_audio_changed(Audio *audio, AudioEvent *event, G_GNUC_UNUSED gpointer data)
{
    switch (event->signal) {
    case AUDIO_CARD_DISCONNECTED:
        audio_reload(audio);
        break;
    case AUDIO_CARD_ERROR:
        if (run_audio_error_dialog() == GTK_RESPONSE_YES)
            audio_reload(audio);
        break;
    case AUDIO_CARD_RELOAD:
        g_idle_add(do_audio_reload, audio);
        break;
    default:
        break;
    }

}

Interesting that audio_reload() seems to work in response to a dialog box but crashes in an idle callback.

elboulangero commented 6 years ago

What I see in your logs is debug: audio.c: No soundcard could be hooked !.

Looking in the code: https://github.com/nicklan/pnmixer/blob/44fb908b1662b946e764a905f290ffdb65cbb397/src/audio.c#L716-L720

You get that log if soundcard is NULL. And just the line before, we've set audio->soundcard = soundcard. So audio->soundcard is NULL at this point.

Then when I look at the stack trace you've pasted:

 #0  0x000055b6d32ebc4c in alsa_card_get_volume ()
 #1  0x000055b6d32eccbf in audio_set_volume ()

Then looking at the code again: https://github.com/nicklan/pnmixer/blob/44fb908b1662b946e764a905f290ffdb65cbb397/src/audio.c#L566-L581

As you can see, if audio->soundcard is NULL, then the function return. We don't go further. However your stack tract shows that the code is calling alsa_card_get_volume ().

So you should investigate in this direction. It looks like there's a mismatch between the logs and the stack trace you've pasted. Maybe you're experiencing different issues at the same time. Ensure you're running PNMixer from git master, that you don't have weird patches around, always run pnmixer from the console with debug logs. Then add debug messages where suitable, according to the output you et in the console.

If you can reproduce the bug, then you can solve it :D

foresto commented 6 years ago

However your stack tract shows that the code is calling alsa_card_get_volume ().

I'm using version 0.7 (as mentioned above), not git master. It doesn't have your "automatically reload audio when needed" commit that added the if (!soundcard) check. Even so, I would expect audio_reload() to behave the same way regardless of whether run_audio_error_dialog() is called.

Maybe you're experiencing different issues at the same time.

Probably. I'm trying to get rid of the annoying dialog box. The crash in the patched version occurred later, after the dialog box was skipped, when I saw a red X on pnmixer's icon and tried adjusting the volume. If the dialog can be avoided without entering a bad state, I suspect the crash will go away as well and I'll be able to move on. (Of course, you probably also want to make pnmixer avoid crashing in that red X state, in case it comes up in other circumstances, but that's not my immediate problem.)

hasufell commented 6 years ago

Please run git master when debugging, contributing or applying patches. We have no time or manpower supporting outdated versions.

foresto commented 6 years ago

Same behavior and debug output with git master.

foresto commented 6 years ago

On closer inspection: Although patched git master has the same behavior as the patched release versions (same log messages and red X over the icon), it recovers gracefully instead of crashing when I then try to use the icon.

So, I'd say #183 is helpful. The red X is still mildly annoying, but it goes away as soon as I click it or use the scroll wheel to adjust the volume, which is better than the dialog box that was interrupting me without the patch.

foresto commented 6 years ago

Also, backporting commits 9b23149 and ce1d1a3 to version 0.7.2 seems to work, which will be helpful for package maintainers who want to address the problem in linux distro repositories.

(Being a developer myself, I do of course understand the temptation to only troubleshoot git master, but I also realize that the vast majority of users get their applications from a package repo, not git, so fixing the problem in the packaged versions is rather important.)

hasufell commented 6 years ago

I'm currently not particularly keen on merging ce1d1a39853ab97f191b230b369fa5c8bff058a7 because I still don't really know if we use Alsa correctly here.