simojenki / bonob

sonos SMAPI implementation allowing integrating different music sources with sonos.
GNU General Public License v3.0
228 stars 17 forks source link

Memory Leak (TCP_Mem Leak / Internet Slow Down) #164

Open Vorror opened 1 year ago

Vorror commented 1 year ago

Bear with me here since narrowing down what was happening took two weeks. Anyway, so some background:

I'm using bonob in a docker container on the following system(this system is my media server that's only used to run docker containers):

System:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:    22.04
Codename:   jammy

So, I noticed my internet connection on my Ubuntu media server would become really slow, from 1 Gbs to ~20Mbs. I use an app called Speedtest Tracker to monitor this.

Screenshot from Speedtest Tracker:

CleanShot 2023-05-28 at 10 49 41@2x

I've been noticing the above behavior for weeks. My media server's download/internet speed would drop for the whole system and stay low; only a restart would fix it.

Cause

I figured out that the issue was bonob after killing apps one by one and waiting for the issue to reoccur; I figured out what was happening. There seems to be some TCP memory leak with bonob.

When bonob is first launched, my system looks like this(image from docker stats):

CleanShot 2023-05-28 at 10 59 16@2x

and my TCP Memory looks like this:

$ cat /proc/net/sockstat

sockets: used 1190
TCP: inuse 31 orphan 0 tw 0 alloc 188 mem 39
UDP: inuse 17 mem 44
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0

$ echo $(( 39 * 4096  / 2**20 ))MB
0MB

After listening to music for a few hours, my stats become this:

CleanShot 2023-05-28 at 11 06 54@2x

$ cat /proc/net/sockstat 

sockets: used 1189
TCP: inuse 30 orphan 0 tw 0 alloc 319 mem 117948
UDP: inuse 17 mem 45
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0

$ echo $( 117948 * 4096 / 2**20 ))MB
460MB

And inside my /var/log/syslog I'm seeing this message repeated:

TCP: out of memory -- consider tuning tcp_mem

bonob is using 460MB of TCP Memory! This causes the network speed of my server to drop until a) Killing the bonob container or b) restarting the computer.

Killing bonob's container immediately restores my TCP memory and fixes my speed issue.

Vorror commented 1 year ago

For some more info,

In addition to using Speed Test Tracker, I also confirmed my internet speed using Speedtest's cli tool and can confirm the internet speed reduction. The internet speed could sometimes stay permanently below 9 Mbits/s!

In fact, the only reason I noticed this issue was occurring was that when trying to play Navidrome music using bonob on my Sonos speakers, there would be "buffering"(delay before playing and frequent pausing mid-song). My music server, Navidrome, and Sonos speaker are all on the same Lan, so there should never be buffering. So, the issue affects all TCP connections on the server rather than just ones to the internet.

The issue takes several hours to appear, and Ubuntu will never reclaim either the system memory being used or the TCP memory, even after not playing music for several hours/days. You must either stop/restart the bonob container or restart your computer.

simojenki commented 1 year ago

Thank you for the great analysis, that's an interesting issue.

How much memory do you give the bonob container?

With your listening pattern, are you simply listening to an album/playlist track by track or are you frequently changing tracks manually?

Unfortunately I cannot test this right now as my system is in a box whilst I move house.

Vorror commented 1 year ago

@simojenki I use Portainer to managed my docker containers. I don't explicitly set a maximum size.

I have about 3000 songs in a queue and rarely manually change tracks. It simply plays for hours until I get bored.

dbuttric commented 1 year ago

I don't mean to be adding fuel to the fire here, but I can confirm this. I have a playlist with 5400 tracks. I left it playing overnight, and when I woke up in the AM, the TCP memory use was at 150MB.

Here is my setup: Debian 12 bookworm 16GB RAM Intel(R) Core(TM) i5-4570 Running Navidrome and Bonob using dockerr-compose.

Then I started doing my rating-my-songs behavior.

I left the playlist running, and just started rating the songs that are playing. Once I rated them, I would fast forward to the next song. After doing this for a few minutes, the TCP memory use is now 500MB.

At this time, the docker container is using 350MB, and TCP is using 500MB. So the rating and fast forwarding may cause memory to balloon. I have not done any more extensive testing here.

So that I've said it, I have created a playlist with about 3 hours of music in it. I start this in the morning and then start it again around lunch time. This seems to keep things behaving.

One thing I'd love to do is be able to attach a debugger to the containerized app, to see if I can identify where this problem is happening. But I've never debugged a container before, much less an app like this. So I'd need some help doing that.

simojenki commented 1 year ago

Interesting...

Why do you say it is TCP memory usage? What command did you run to see that?

Are you transcoding your tracks? One thing I have noticed is that sometimes a transcoding (ffmpeg) process in the ND container will be still running well after the track has completed, I think some processes are being left behind, perhaps on skip track?

dbuttric commented 1 year ago

No, I'm not transcoding anything. In the OP, @Vorror mentions the use of the /proc/net/sockstat to analyze TCP stack kernel memory use. I've been watching the value that comes back from that, I have a oneliner: awk '/TCP/ {print $NF}' /proc/net/sockstat Evaluate that and do the math: (sockstat reports memory in kernel pages.) echo $(($(awk '/TCP/ {print $NF}' /proc/net/sockstat)* 4096 / 1024 ))"KBytes" and you get back a value of the kbytes used by TCP to keep sockets working. At this point, I'm not sure whether this is a bonob or a navidrome issue. I'm trying to do some packet capturing. I'm completely clueless about monitoring networking in docker containers. Thoughts: Perhaps the docker-compose.yaml that I'm suing needs to make reference to docker internal IP's of things, instead of external? Is it possible that there's a leak in the networking stack in docker that causes streams to consume more memory if you use external references?, if for example I set the BNB_SUBSONIC_URL to the internal IP of the navidrome server, instead of its external hostname? I'm just throwing darts at this point, but anything that will help us eliminate possible causes can help use identify the actual cause here. David

simojenki commented 1 year ago

What happens if you restart one of either bonob or navidrome, does the memory get released? That may help determine which container is leaking.

re: docker-compose.yml, you should be able to just refer to the name of the container, rather than either internel or external IP.

Given the wide spread use of docker (and friends) I'd be surprised if the issue is caused by docker/Linux kernel, I'm guessing it's going to be an app issue somewhere.

dbuttric commented 1 year ago

I'll do some more testing this evening. When I restart the continers, the tcp memory is released. Tonight I will experiment with recreating the high memory condition, and I'll restart one each time, to see if I can discover which service is holding the memory.

Thanks! David

dbuttric commented 1 year ago

Here's what I can see: dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem 80784KBytes dbuttric@jack2:/var/local/docker/navidrome$ docker restart 29b00e1ac55f_navidrome_navidrome_1 29b00e1ac55f_navidrome_navidrome_1 dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem 44996KBytes

It looks like if I restart navidrome, all the memory does not get cleared, only about half.

Where as if restart bonob, all the memory gets cleared out... dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem 122340KBytes dbuttric@jack2:/var/local/docker/navidrome$ docker restart navidrome_bonob_1 navidrome_bonob_1 dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem 1392KBytes

Thoughts? Anything else you need me to do to help get to the cause of this, I'm here and available. Thanks alot! David

simojenki commented 1 year ago

I guess for some reason bonob is keeping open a connection to navidrome, perhaps because it is unaware that the sonos device has moved onto the next track.

simojenki commented 1 year ago

I've tidied up the way streams between bonob -> navidrome are handled, could you please try again with the 'master' build/release of bonob and see what behaviour you get?

dbuttric commented 1 year ago

This worked! I'm now seeing TCP memory utilization that looks like the other applications on my media server. Thank you!

dbuttric commented 1 year ago

I'm running different playlists to two separate speakers. So far, memory utilization is great! I see some times where memory use goes from 156K to 0K 10 seconds later. Definitely better behavior than before. I will let this run for a few more hours and report back.

dbuttric commented 1 year ago

I've been at this for a few hours now. I'm not seeing any of the same behaviors.

I think this is fixed!