mk-fg / python-pulse-control

Python high-level interface and ctypes-based bindings for PulseAudio (libpulse)
https://pypi.org/project/pulsectl/
MIT License
170 stars 36 forks source link

Segmentation fault when getting peak level #35

Closed MedicMomcilo closed 4 years ago

MedicMomcilo commented 4 years ago

I'm writing a small tool to do some very basic normalizing of the sound. Python script works just fine for a while and then it segfaults.

I've managed to create minimalistic example where segfault happens consistently. Time before crashing varies, it is sometimes 10-15 seconds, and sometimes a couple of minutes.

Details about my system (Fedora 30):

$ rpm -q pulseaudio
pulseaudio-12.2-9.fc30.x86_64
$ rpm -q python3-pulsectl
python3-pulsectl-19.9.5-1.fc30.noarch
$ rpm -q kernel
kernel-5.2.17-200.fc30.x86_64

Script to reproduce the issue:

#!/usr/bin/python3

import time
import pulsectl

def find_vlc():
    while 1:
        sinks = pulse.sink_input_list()
        for sink in sinks:
            if sink.proplist["application.process.binary"] == "vlc":
                return sink
        print("No VLC found, waiting.")
        time.sleep(5)

pulse = pulsectl.Pulse('segfa')

while 1:
    pa_vlc = find_vlc()
    peak = pulse.get_peak_sample(pulse.sink_info(pa_vlc.sink).monitor_source, 0.2, pa_vlc.index)
    print(round(peak, 3))

Note that finding VLC can be removed from the loop, and it still segfaults.

Please let me know if there is anything else I can provide to further help with resolving.

mk-fg commented 4 years ago

Hm, yeah, something's wrong there.

Worked fine on my desktop machine, but ran python3 -m unittest pulsectl.tests.dummy_instance.DummyTests.test_get_peak_sample on a laptop just now, and it seem to hang indefinitely.

Thanks for bringing it up, will look into what's going wrong there sometime later.

mk-fg commented 4 years ago

Worked fine on my desktop machine, but ran ... on a laptop just now, and it seem to hang indefinitely.

Was a separate issue fixed in 4bc5437, which surprisingly didn't popup anywhere earlier.

mk-fg commented 4 years ago

Can totally reproduce a segfault here too using code snippet that you've provided.

mk-fg commented 4 years ago

Looking into this a bit, quickly found that segfault seem to happen due to mix of two calls:

Removing either one from the loop avoids the issue.

Segfault itself happens when _pulse_info_cb gets called by libpulse, receiving struct with sink info. PulseObject class tries to copy all fields from that struct on init (via _copy_struct_fields), but struct seem to be freed already, so trying to access memory offsets from that pointer cause segfault.

As _pulse_info_cb is passed as a callback to pa_context_get_sink_info_by_index, and introspect.h clearly states that:

 * Data members in the information structures are only valid during the
 * duration of the callback. If they are required after the callback is
 * finished, a deep copy of the information structure must be performed.

Can only assume that such behavior is a bug in libpulse - it straight-up seem to call that callback with an invalid pointer.

Guess I'll try to report it to pulseaudio devs somewhere, probably after making some trivial C example with same calls first (as python has a lot of confusing wrapping to do same thing).

Workarounds: Unless I'm wrong about the underlying issue, it should be impossible to fully work around this, unfortunately, as libpulse can return invalid pointer like that to any info call, regardless of where or whether you call that peak-thing (can be some other stream closing at the time, it's just a simple way to reproduce the issue), but avoiding any info-calls right after get_peak_sample() should make it less likely to happen.

mk-fg commented 4 years ago

Unless I'm wrong about the underlying issue, it should be impossible to fully work around this

Correction:

I mixed-up sink and stream info calls in my head in the msg above.

Since it's pa_context_get_sink_info_by_index info callback that fails, which should not be related to list of streams normally (didn't look at how it's implemented it libpulse though), I'd suspect that might be some client-side caching/allocation issue, so might be avoided entirely if info calls aren't too close to peak_sample or not intermingled with them somehow.

Shouldn't really matter, unless you need some quick workaround.

MedicMomcilo commented 4 years ago

Hi Mike,

Thank you for looking into this, this does help me introduce a workaround. I guess we can't fix it here, but rather upstream.

You can go ahead and close this issue if you prefer so. Thank you for this awesome work you've done.

Kind regards, Momo.

mk-fg commented 4 years ago

You can go ahead and close this issue if you prefer so.

Obvious upstream bugs like that are kinda rare, and it's still a bug in the module exposing any broken functionality.

Despite the fact that I couldn't find initial explaination of why accessing that pointer within info callback was causing segfault, trying to reproduce same thing in C ( https://gist.github.com/mk-fg/7c3bc67ec8a13541182e0139f0a1a5b4 ), couldn't do it - everything seem to work fine there.

So has to be related to how this wrapper works after all.

mk-fg commented 4 years ago

Fixed it in 19.10.3, but still don't understand why this issue happens.

Thought to look at anything else odd about these callbacks with invalid pointer in them, and found that:

Either of these allows to at least filter-out these weird extra calls, so did that as a semi-temporary solution.

Better fix would be to rebuild python and pulseaudio (plus maybe glib, glibc and such deps) with debug symbols and see what happens on C ABI level via gdb, but not sure if I want to bother with this right now, and if fix works, maybe underlying issue will just go away by itself sometime (if it's a bug in ctypes or one of the libs involved).

Do let me know if updated version still segfaults for you though - can be that it only works on my machine for whatever reason.