mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info
Other
6.28k stars 1.11k forks source link

Memory Leak on Linux/Ubuntu #1074

Closed tatokis closed 4 years ago

tatokis commented 10 years ago

I have noticed a memory leak on mumble for Ubuntu (Both stable and snapshot PPAs). There is a similar issue (https://github.com/mumble-voip/mumble/issues/959) and I suspect that I have figured out more information about this leak. It appears that the when the Echo option in the configuration menu is set to multichannel, it causes mumble to start using a lot of ram. I ran it for 1:30 hours with both echo set to disabled and multichannel. It appears that it started using almost 1.1GB of RAM when it was set to multichannel, but with echo disabled it didn't use more than 30MB. In both cases mumble was sitting in the tray idling, connected to a server and muted. If left running for more time it will consume even more RAM (it reached almost 3GB one time) Running Ubuntu 13.10 amd64 and current mumble version is 1.2.4-1~ppa2~saucy1 with 8GB of ram. Here is how much RAM mumble was using with echo enabled the time it was started mumble1 The setup on Pulseaudio with Echo set to multichannel mumble2 RAM usage when mumble was started mumble3 Mumble running for 1:30 hours and its ram usage mumble4 Configuration, with echo set to disabled now for the next test mumble5 Here's how much RAM mumble used when it started with echo disabled mumble6 and here is how much ram it used after idling for 1:30 hours mumble7

jstrader commented 8 years ago

Ran into this issue yesterday and ran it through Valgrind Massif today. The allocation for this memory leak is: AudioInput::addEcho(void const, unsigned int) (AudioInput.cpp:478) `short outbuff = new short[iEchoFrameSize];`

It appears to expose itself when echo correction is enabled and the speaker output rate is greater than the microphone input rate. This seems to be most common when Mumble is operating at the same time an application that implements a low-level "sink-input" in the parlance of PulseAudio, causing two independent callback streams to essentially double the data being passed to echo correction without deallocating the memory. Many games, remote desktop, and other forms of cross-platform software appear to implement these lower level hooks; in my case it is the game Cities: Skylines implementing an ALSA plug-in that appears to launch its own PulseAudio threads, though I am not familiar enough with these systems to know if this is the exact issue.

I am commenting on this bug, because I am 100% certain it's the same issue; however, this could potentially be the same bug described in many different memory and "unresponsive" issues for this repository. I will avoid referencing all of the issues in case they are not the same bug, but a quick search of open issues with "memory" will return them. The code shows the potential for the issue to be cross-platform, though I am unsure if Windows implements callbacks in the same manner to cause this. Attached is a text file with the relevant output from Massif, used with a build of Mumble tag "1.2.12" (same version as in the Ubuntu Wily PPA) that has debug symbols active. If you want more information, let me know and I'll try to get back to it as quickly as I can.

mumble leak ms_print.txt

mkrautz commented 8 years ago

Thank you very much for the detailed analysis @jstrader.

It's been a while since you reported this, so I'd like to give you a status update. (For you, and for any onlookers perhaps interested in helping to fix the issue...)

I tried a bit back to follow along with the code, and concluded I needed to actually reproduce the PulseAudio setup myself before I could get further.

Also, I found that the code at https://github.com/mumble-voip/mumble/blob/master/src/mumble/AudioInput.cpp#L399-L423 was behaving weirdly. In particular, the logic related to iJitterSeq and iMinBuffered seems like it was not complete when it was committed.

I haven't yet had the chance to set up a test environment where I can investigate the memory leak myself, though!

Thanks again.

mkrautz commented 7 years ago

Finally found some time to get back to this.

I can reproduce a memory leak when I do this:

while [ 1 ]; do dd if=/dev/zero of=/dev/stdout count=1 bs=64 | aplay --rate=8000 -; done

This is using PulseAudio with AudioInput rate=44100, and AudioOutput rate=48000 as @jstrader describes.

jstrader commented 7 years ago

Sorry, the original response to my comment fell through the cracks of my email somehow. I dug up the notes and builds from last March. I had determined at the time that the risk of introducing a regression would be too high to warrant developing a fix myself, because the logic is too deep and the refactor to fix this issue is likely too complex to have someone unfamiliar with the code just pick it up.

I can confirm that the reproduction from @mkrautz above is the same leak that was exposed when I submitted my comment. I'm not certain that this issue is a simple allocation leak though.

Out of curiosity, I threw a couple qWarning calls in to dump the amount of data coming from the PulseAudioSystem::read_callback calls to see the ratio of addMic/addEcho buffers, expecting to see about twice as much data when reproducing the bug. Without the aplay command, it was 867,328/803,422 bytes or almost exactly the difference for the 48,000/44,100 I'm currently running; with aplay, it was 50,656/1,795,600 bytes or about 38 times larger echo buffer than what it should be. Both tests were across about 3,500 calls, to give an idea of just how much more that callback is firing with aplay.

This may either be a bug in PulseAudio sending incorrect events or a flag somewhere in their "thorough" API that is not being passed. If it is not a PulseAudio issue, then I honestly don't know that there is a fix beyond implementing a ring buffer and having it just quietly disable echo correction when it overruns, which doesn't really feel like a bug fix so much as mitigation against the leak.

mkrautz commented 7 years ago

Thanks for the update, @jstrader. FWIW, it's also possible to reproduce the issue with Pulse-native tools like 'pacat'. So the problem is seemingly not related to Pulse's ALSA plugin after all.

I'm trying to reproduce on Ubuntu Precise as we speak, to see if PulseAudio 1.1 also has the "bug".

mkrautz commented 7 years ago

OK, it happens there as well. So I don't think we can classify it as a regression!

mkrautz commented 7 years ago

FWIW, I can reproduce the same behavior without Mumble, by just recording the PulseAudio monitor stream:

ctr is the following program:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

static uint64_t now() {
    struct timespec ts;
    int err = clock_gettime(CLOCK_MONOTONIC, &ts);
    if (err == -1) {
        exit(2);
    }

    uint64_t out = 0;
    out += (ts.tv_sec * 1000000);
    out += (ts.tv_nsec / 1000);
    return out;
}

int main(int argc, char *argv) {
    uint64_t timestamp = now();
    uint64_t bytes = 0;

    while (1) {
        char buf[4096];

        int nread = read(0, &buf, sizeof(buf));
        if (nread == -1) {
            fprintf(stderr, "err = %i: %s\n", errno, strerror(errno));
            exit(1);
        }

        bytes += nread;

        uint64_t nao = now();
        uint64_t elapsed = nao - timestamp;
        if (elapsed >= 1000000) {
            fprintf(stderr, "wrote %lu shorts in 1 sec\n", bytes/2);
            bytes = 0;
            timestamp = nao;
        }
    }
}

I run:

mkrautz@ubuntu:~$ parec --format=s16le --rate=48000 --channels=1 --device="alsa_output.pci-0000_02_02.0.analog-stereo.monitor" | ./ctr 
wrote 45124 shorts in 1 sec
wrote 48424 shorts in 1 sec
wrote 48645 shorts in 1 sec
wrote 48545 shorts in 1 sec
wrote 48554 shorts in 1 sec
# I run the loop that triggers the error.
wrote 92245 shorts in 1 sec
wrote 153388 shorts in 1 sec
wrote 170318 shorts in 1 sec
wrote 151427 shorts in 1 sec
wrote 171301 shorts in 1 sec
wrote 152405 shorts in 1 sec
wrote 126881 shorts in 1 sec
# I've ended running the loop
wrote 48681 shorts in 1 sec
wrote 48354 shorts in 1 sec
wrote 47386 shorts in 1 sec
^C

The loop being the following snippet from earlier:

while [ 1 ]; do dd if=/dev/zero of=/dev/stdout count=1 bs=64 | aplay --rate=8000 -; done

... I'm quite baffled by this behavior from the monitor device.

tatokis commented 7 years ago

Also reproducible here. Here's a snippet of pulseaudio's debug info while the aplay loop is running. https://tasossah.com/mumblepulselog.txt

I think this might not be a mumble issue after all.

Additionally, from what I can see when launching parec, PA switches the source output to 48KHz and then passes audio straight to parec without any resampling.

Relevant log here as well https://tasossah.com/mumblepulselog_parec.txt

mkrautz commented 7 years ago

Hmm, I wonder if our implementation is too simplistic. (But I guess parec is, too, then?)... Perhaps we need to listen to more events to be able to catch underruns, rewinds, holes?

mkrautz commented 7 years ago
02:28:55 <@mkrautz> Whooa21: hmmm... not reproducible in general on Ubuntu 9.04
02:29:17 <@mkrautz> Whooa21: there are some cases where the monitor source generates more data than expected when parec'ing, but it's not excessive...
02:29:57 <@mkrautz> I just had to try an older PulseAudio... I was pretty certain the bug wasn't there when slicer wrote that code... :)
New messages
02:30:15 <@mkrautz> this is pulseaudio 0.9.14
mkrautz commented 7 years ago

I asked on #pulseaudio (freenode):

02:46:20 <mkrautz> hi, I'm experiencing a problem where I can trigger a monitor source to generate more samples than expected (> sample rate) when an underrun occurs in the corresponding sink... this is mostly a question if I'm missing something obvious. Short write-up at https://gist.github.com/mkrautz/d0994d82eb900c336d64ac971fc1847e

Got a response:

05:04:15 <tanuk> mkrautz: You're not missing anything. 05:04:20 <tanuk> mkrautz: https://lists.freedesktop.org/archives/pulseaudio-discuss/2017-January/027412.html

mkrautz commented 7 years ago

@tanuk says this is https://bugs.freedesktop.org/show_bug.cgi?id=68107

tatokis commented 7 years ago

Thank you, @mkrautz I guess all mumble can do for now is discard the extra data due to the rewinds.

aeruower commented 4 years ago

Although I haven't tested it myself, it seems to be fixed here: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/issues/304 and already released in PulseAudio 13.0: https://www.freedesktop.org/wiki/Software/PulseAudio/Notes/13.0/ (ctrl+f for "source-output: Fix rewinding bug")

Krzmbrzl commented 4 years ago

@aeruower thank you for reporting this :+1: In that case I'll assume it is indeed fixed. If it is not, let me know and I can reopen this issue...