savonet / liquidsoap

Liquidsoap is a statically typed, general-purpose scripting language with dedicated operators and backends for all things media: streaming, file generation, automation, HTTP backends and more.
http://liquidsoap.info
GNU General Public License v2.0

High memory usage if too many connections failed #541

Open desavil opened 6 years ago

desavil commented 6 years ago

Hello,

One of my Liquidsoap instances tries to connect to a SHOUTcast server all day long. This is not a mistake, it is intentional: when the primary source disconnects, the Liquidsoap source is supposed to connect as a backup.
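
Roughly, the relevant part looks like this (a simplified sketch with placeholder host, password, playlist path and format, not my actual script). The SHOUTcast server normally has the live source connected, so this output keeps getting "Stream In Use" and retries until the live slot frees up:

backup = mksafe(playlist("/home/radio/backup"))
output.shoutcast(%mp3,
    host="server.tld", port=8000,
    password="hackme",
    icy_id=1,
    backup)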

I observe that RAM usage is increasing all the time. I suppose memory is not being freed correctly in this case. After 3 days of this, RAM usage is already at 4 GB (and it is still growing), even though Liquidsoap does practically nothing except reconnect.

Htop screenshot of memory usage: (image)

Last lines from log file:

2018/04/07 23:39:46 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:39:46 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:39:46 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:39:46 [stderr:3] strange error flushing buffer ...
2018/04/07 23:39:46 [stderr:3] strange error flushing buffer ...
2018/04/07 23:39:52 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:39:52 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:39:52 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:39:52 [stderr:3] strange error flushing buffer ...
2018/04/07 23:39:52 [stderr:3] strange error flushing buffer ...
2018/04/07 23:39:58 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:39:58 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:39:58 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:39:58 [stderr:3] strange error flushing buffer ...
2018/04/07 23:39:58 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:04 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:40:04 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:40:04 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:40:04 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:04 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:10 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:40:10 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:40:10 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:40:10 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:10 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:16 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:40:16 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:40:16 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:40:16 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:16 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:22 [output(dot)shoutcast:3] Connecting mount sid#1 for @server.tld...
2018/04/07 23:40:22 [output(dot)shoutcast:2] Connection failed: bad answer: Stream In Use
2018/04/07 23:40:22 [output(dot)shoutcast:3] Will try again in 5.00 sec.
2018/04/07 23:40:22 [stderr:3] strange error flushing buffer ...
2018/04/07 23:40:22 [stderr:3] strange error flushing buffer ...

Version: Liquidsoap 1.3.3+scm (git://github.com/savonet/liquidsoap.git@7908cf94b5c75dc0e69f235312cdc526b402a3a8:20180217:163006)

Please check this.

LeonarddeR commented 6 years ago

I can confirm this. It looks like this problem started to occur around version 1.3.2; I don't recall seeing it with older versions. I'm not sure, though.

toots commented 6 years ago

Hi.

Thanks for reporting. I wonder if that is related to the issues reported with the windows build.

I'll have a look at it. Do you have a minimal scenario for reproducing the issue?

toots commented 6 years ago

I'm running a test script:

set("log.stdout",true)
set("log.file",false)

radio=sine()
clock.assign_new(id="icecast", [
    output.icecast(
      %wav,
        host="127.0.0.1", port=8000,
        user="source", password="source",
        mount="/radio",
        description="radio", genre="radio", url="radio",
        mksafe(buffer(radio))
    )
]
)

(there's no icecast responding)

Can't reproduce. Could you test this script and/or give me another one to reproduce? Thks!

desavil commented 6 years ago

I will check it in my free time and let you know. I will also send you the configuration that causes the problem for me.

toots commented 6 years ago

Awesome, thank you! There have been a lot of bugfixes around the icecast output, so there's a good chance that this issue is fixed now.

toots commented 6 years ago

I've been running a test script for a long time without issues. Unless you can reproduce it, I will set this issue aside for the coming release.

LeonarddeR commented 5 years ago

I'm afraid I can still reproduce the increasing memory usage with LS 1.3.6 built with opam. My script tries to connect to an icecast server all day with an interval of 15 seconds, so it is similar to this issue.
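
For reference, the retry behaviour is roughly this (a sketch with placeholder host, credentials and mount, not my actual script): the output keeps retrying every 15 seconds while the Icecast server is unreachable.

radio = mksafe(input.http("http://primary.example.com/stream"))
output.icecast(%mp3,
    host="icecast.example.com", port=8000,
    user="source", password="hackme",
    mount="/radio",
    on_error=fun (_) -> 15.,
    radio)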

toots commented 5 years ago

Thanks for reporting, @leonardder. I'll start testing again.

toots commented 5 years ago

@leonardder, what encoding format are you using?

LeonarddeR commented 5 years ago

@toots: I use a combination of 6 encoders: 2x lame, 2x opus and 2x FDK-AAC. The system is continuously polling an icecast stream so that it can be transcoded to several formats. I already disabled the opus stream for a little while, but that didn't make any difference.
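
In outline, the relay looks like this (a sketch with placeholder URLs, mounts and bitrates rather than my actual production script): one input.http source feeding several output.icecast encoders.

src = mksafe(input.http("http://source.example.com:8000/live"))

output.icecast(%mp3(bitrate=128),
    host="icecast.example.com", port=8000,
    user="source", password="hackme",
    mount="/live.mp3", src)

output.icecast(%opus(bitrate=96),
    host="icecast.example.com", port=8000,
    user="source", password="hackme",
    mount="/live.opus", src)

output.icecast(%fdkaac(bitrate=64),
    host="icecast.example.com", port=8000,
    user="source", password="hackme",
    mount="/live.aac", src)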

LeonarddeR commented 5 years ago

I was suffering from these issues up to LS 1.3.5. On a VPS with 1 GB of RAM, I had to restart the system once every three weeks. On LS 1.4.0RC1, the problem is much more severe, in that I have to restart it once every three days. I'm not sure whether the issue has to do with failed connections, but it would really help if we could find a way to isolate these issues. Unfortunately, I have a pretty complex setup.

toots commented 5 years ago

Thanks for this report. I'll try to have a look at it again before the release. Any chance you might be able to try one single encoder at a time, to see if we could narrow it down to a single one of them?

LeonarddeR commented 5 years ago

I'll try to narrow it down to 2 lame encoders to begin with. It is a production setup. Normally I wouldn't use an RC in production, but this time I thought I might benefit from fewer memory issues.

I can also try to share my setup privately with you if that could help.

toots commented 5 years ago

I'll try with your list of encoders first, starting a script right now.

toots commented 5 years ago

Also, I guess it'd be nice to have your script after all, if you don't mind. You can send it to toots@rastageeks.org. Thks!

LeonarddeR commented 5 years ago

I just provided you the scripts by e-mail, thanks a lot for investigating this!

toots commented 5 years ago

Got it, thanks!

toots commented 4 years ago

Hi again! I'm finally getting around to looking at this one. One thing I noticed is that you are using the SSL operators. Have you tried the most recent code? We fixed some file descriptor leaks in these very recently, which could have a positive impact on your issue.

LeonarddeR commented 4 years ago

Thanks for looking into this! What would be the best way to test this from a 1.4.0rc1 installed with opam using the pinning method for only liquidsoap itself?

toots commented 4 years ago

Yes, that should work. You may need to pin other dependencies, but I believe they should all be released in opam by now.

toots commented 4 years ago

I'm running your script BTW, and I may be seeing the memory leak. Stay tuned!

toots commented 4 years ago

I can confirm a slow memory leak with your script. On it!

toots commented 4 years ago

error_report = ref ""

def on_ic_error(id,error) =
    error_report:=!error_report^string_of(gettimeofday())^": ("^id^") "^error^"\n"
    10.
end

This has to be your issue, @leonardder. The error_report string keeps growing at each failure to connect an output.icecast...
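
Continuing from the snippet above, one possible mitigation would be to flush and reset the reference periodically so the string can no longer grow without bound (just a sketch; logging stands in for whatever actually sends the report by e-mail):

def flush_error_report() =
    report = !error_report
    error_report := ""
    # In the real script the report would be e-mailed here; log it as a stand-in.
    log(label="error_report", report)
    # Run again in five minutes.
    300.
end
add_timeout(300., flush_error_report)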

LeonarddeR commented 4 years ago

Mm, the point is that output.icecast actually never fails, as the icecast server is always there.

I don't have my scripts at hand here, but I'm also pretty sure that the string is discarded every five minutes, after an error report is sent by e-mail.

Nevertheless, I will have an in-depth look into this code path. Thanks for getting to the bottom of this!

LeonarddeR commented 4 years ago

@toots I've been running my script for some time without the on_ic_error handler on the outputs, but the memory usage still increases over time. I haven't had the time to upgrade to current master, though.

toots commented 4 years ago

Hi @leonardder. Thanks for trying again and for your patience. I'll get back to this issue. I think the plan at the moment is to roll out the release as-is, but be assured that this will be the first issue addressed in the first bugfix release that follows.

illestrater commented 4 years ago

Awesome, I'm glad this is being looked into. @toots, I reported a memory leak myself a while back (6 months?) and you very graciously assisted by going over my script. I had an issue with my script which you helped me fix, but the memory leak has persisted at a slower rate. It's been problematic for me, as I've had to write something that checks memory usage on the server and restarts Liquidsoap when necessary, which causes downtime.

+1!!

Appreciate your work Toots

toots commented 4 years ago

Hi all!

I've restarted efforts on this one. Currently running the following script:

set("log.stdout",true)
set("log.file",false)

radio=sine()
clock.assign_new(id="icecast", [
    output.icecast(
      %mp3,
        host="127.0.0.1", port=8000,
        user="source", password="source",
        mount="/radio",
        description="radio", genre="radio", url="radio",
        mksafe(buffer(radio))
    )
]
)

With the latest v1.4.3-pre-release branch. I will report if I see a memleak. Any small scripts that can reproduce the issue would be greatly appreciated!

toots commented 4 years ago

Hey y'all,

Apologies, I feel like I dropped the ball here; that "strange error flushing buffer" message has been reported many times and I never paid attention to it.

I just pushed a series of commits, namely savonet/ocaml-lame@a39d8700531bd0e2901248b1d6dd8b7eb7a446f6, 062666695cb9991d9ad0c770b57a7283201716d2 for v1.4.3-pre-release and 286e41a07732ed923de4be8d16eb4c95a1066996 for master. The error should now be gone with these.

I have a strong feeling that the memory leaks might have been related to this error message. Any chance one of y'all could test them?

I believe fixed builds should appear shortly at https://github.com/savonet/liquidsoap/releases/tag/v1.4.3-pre-release once the build at https://github.com/savonet/liquidsoap/actions/runs/197165456 is finished. This may or may not be true for the Windows build, but it is definitely ready to test with any of the .deb packages.

Fingers crossed!

illestrater commented 4 years ago

@toots I'm thrilled to hear about this development! This will be of immense help for my music platform's devops, and I'm ecstatic to give the release a test.

toots commented 4 years ago

Awesome! Now I sure hope that I didn't give you false hopes :-)

Let me know what comes out, and otherwise let's get right back at it. If this doesn't fix it, I believe the next step would be to try to get a valgrind report of the memleak.

Looking forward to hearing back!

toots commented 4 years ago

Any update on this one? No news is good news?

illestrater commented 4 years ago

Hey toots, I haven't gotten around to building and testing it yet; I've been out of development for a bit. I will give it a shot with a long-running test and report back as soon as I can!

toots commented 4 years ago

That would be great! I still have a couple more issues to deal with for the current release, and I'd like to get it out sooner rather than later afterwards. Worst case, we can just keep working on the issue after the release.

illestrater commented 4 years ago

@toots Now testing live in my prod environment! I will let you know after a while whether memory usage decreases compared to before. When is your release target date?

toots commented 4 years ago

This is pretty awesome! Release is as soon as possible, possibly this weekend if I get through the remaining issues. We're very close!

toots commented 4 years ago

Hmm, @illestrater, did you remove your earlier message saying that memory usage appeared fixed? I closed the issue, but now I don't see it anymore.

illestrater commented 4 years ago

@toots I apologize, it seems like the memory is not being garbage collected after all. I am doing one last long-running test; it will take some time before the expected crash.

toots commented 4 years ago

Really sorry to hear that.