savonet / liquidsoap

Liquidsoap is a statically typed, general-purpose scripting language with dedicated operators and backends for all things media: streaming, file generation, automation, HTTP backend and more.
http://liquidsoap.info
GNU General Public License v2.0

Liquidsoap OOM (2.0.3) #2254

Closed. scottgrobinson closed this issue 2 years ago

scottgrobinson commented 2 years ago

Describe the bug: Liquidsoap 2.0.3 under Docker dies with OOM issues even with the runtime.gc.set "patch" applied. The container is currently limited to 2 GB while troubleshooting this issue; higher limits show the same problem.

The graph below shows the container's memory usage over the past 24 hours (the pattern is the same further back). The drops in memory usage mark the points where the OOM killer steps in and the container restarts.

image

To Reproduce: Run the script below in the latest Liquidsoap Docker container with a 2 GB memory limit applied.

#!/usr/bin/liquidsoap

# https://github.com/savonet/liquidsoap/issues/2251
runtime.gc.set(runtime.gc.get().{
  space_overhead = 20
})

# General settings
log.level.set(4)
log.stdout.set(true)

# Telnet server settings
settings.server.telnet.bind_addr.set("0.0.0.0")
settings.server.telnet.port.set(8500)
settings.server.telnet.set(true)

# Harbor HTTP server settings
settings.harbor.bind_addrs.set(["0.0.0.0"])
settings.harbor.max_connections.set(10)
settings.harbor.timeout.set(10.)
settings.harbor.verbose.set(false)

# Audio settings
audio.samplerate.set(44100)
audio.channels.set(2)

# Clocks settings
settings.root.max_latency.set(5.)
settings.clock.allow_streaming_errors.set(false)

#####################
# START OF PROCESSING
#####################

#Badly made crossfades that need more work
def crossfadeOn(a,b)
  add(normalize=true,
      [ sequence([ blank(duration=2.),
        fade.initial(duration=5.,b) ]),
        fade.final(duration=5.,a) ])
end

def crossfadeOff(a,b)
  a = eat_blank(a)
  add(normalize=false,
      [ sequence([fade.initial(duration=5.,b) ]),
        fade.final(duration=5.,a) ])
end

# Incoming icecast/shoutcast stream on /live1
live1 = input.harbor("live1",port=8050,password="xxxx", id="live1")

# Incoming icecast/shoutcast stream on /live2
live2 = input.harbor("live2",port=8050,password="xxxx", id="live2")

# Incoming icecast/shoutcast stream on /automation
automation = input.harbor("automation", port=8050, password="xxxx", id="automation")
automation = mksafe(automation, id="automation_mksafe")

# Define the radio stream
radio = fallback(track_sensitive=false, transitions=[crossfadeOn, crossfadeOff],
         [live1, live2, automation], id="radio")

# Streaming Processing
#radio_processed = mksafe(pipe(process='/usr/local/bin/stereo_tool_cmd_64 - - -s /etc/stereotool/config.sts -q -k "<xxxx>"', radio), id="radio_processed")
radio_processed = radio

# Output Stream - MP3 (Processed)
output.icecast(%mp3(bitrate=128),
  host = "icecast", port = 8001,
  password = "xxxx", mount = "xxxx", icy_metadata = "true", public = false,
  radio_processed)

# Output Stream - AAC (Processed)
output.icecast(%ffmpeg(format="adts",
    %audio(
      channels=2,
      samplerate=44100,
      codec="aac",
      b="196k",
      profile="aac_low"
    )),
  host = "icecast", port = 8001,
  password = "xxxx", mount = "xxxx", icy_metadata = "true", public = false,
  radio_processed)

# Output Stream - AAC LQ (Processed)
output.icecast(%ffmpeg(format="adts",
    %audio(
      channels=2,
      samplerate=44100,
      codec="aac",
      b="48k",
      profile="aac_low"
    )),
  host = "icecast", port = 8001,
  password = "xxxx", mount = "xxxx", icy_metadata = "true", public = false,
  radio_processed)

# Output Stream - AAC (Unprocessed)
output.icecast(%ffmpeg(format="adts",
    %audio(
      channels=2,
      samplerate=44100,
      codec="aac",
      b="196k",
      profile="aac_low"
    )),
  host = "icecast", port = 8001,
  password = "xxxx", mount = "xxxx", icy_metadata = "true", public = false,
  radio)
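
The reproduction setup described above (a Liquidsoap container with a hard memory cap) could be started roughly as follows. This is a minimal sketch, not the exact command used in this report: the helper name `liq_docker_cmd`, the image tag `savonet/liquidsoap:v2.0.3`, the `/script.liq` mount point, and the assumption that the image's entrypoint is the `liquidsoap` binary are all illustrative. Only `--memory` and `--memory-swap` are standard `docker run` options.

```shell
#!/bin/sh
# Sketch: compose a `docker run` command that caps the container's memory.
# ASSUMPTIONS (not from the original report): the image tag, the
# container-side script path, and that the image's entrypoint is liquidsoap.

liq_docker_cmd() {
  mem_limit="$1"                            # e.g. 2g
  script_path="$2"                          # host path to the .liq script
  image="${3:-savonet/liquidsoap:v2.0.3}"   # hypothetical default tag

  # --memory sets the hard cap; --memory-swap set to the same value
  # means the container gets no additional swap on top of it.
  # The command is printed rather than executed so it can be reviewed first.
  printf 'docker run --rm --memory=%s --memory-swap=%s -v %s:/script.liq:ro %s /script.liq\n' \
    "$mem_limit" "$mem_limit" "$script_path" "$image"
}

liq_docker_cmd 2g /srv/radio/script.liq
```

Printing the command keeps the sketch side-effect free; the harbor and telnet ports from the script would still need to be published with `-p` for a real run.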

Expected behavior: Liquidsoap doesn't fall over with OOM issues.

Version details

Install method: Docker

toots commented 2 years ago

Thanks for this report. I am starting to wonder if there could be something related to the virtualization at play here.

Could you try to add the following to your script:

thread.run(every=5., runtime.gc.full_major)

This is not a fix but an aggressive use of the memory cleaning function. This should tell us if your memory usage is the result of the GC not collecting memory fast enough.
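
To judge whether the periodic full major collection changes anything, it can help to log the process's resident memory over time and compare runs with and without the `thread.run` line. The following is a minimal sketch, assuming a Linux environment with `/proc`; the helper names `rss_kb` and `log_rss` are hypothetical and not part of Liquidsoap.

```shell
#!/bin/sh
# Sketch: sample the resident set size (VmRSS) of a running process.
# ASSUMPTION: Linux /proc layout; helper names are illustrative only.

# Print the VmRSS value (in kB) found in a /proc/<pid>/status file.
# $1: path to the status file
rss_kb() {
  awk '$1 == "VmRSS:" { print $2 }' "$1"
}

# Print one "<unix timestamp> <rss in kB>" line per interval
# until the process's status file is no longer readable.
# $1: PID to watch, $2: seconds between samples (default 60)
log_rss() {
  pid="$1"
  interval="${2:-60}"
  while [ -r "/proc/$pid/status" ]; do
    printf '%s %s\n' "$(date +%s)" "$(rss_kb "/proc/$pid/status")"
    sleep "$interval"
  done
}
```

For example, `log_rss <liquidsoap pid> 60 >> rss.log` leaves a plain-text series that can be plotted next to the GC statistics.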

Thanks!

scottgrobinson commented 2 years ago

So it doesn't "appear" to have helped, but I'll get you some better stats in the next couple of days. I also noticed you offer Prometheus integration, which I'm just setting up for some other aspects. It seems to include some OCaml data, so we'll see if there's anything useful in there.

image

toots commented 2 years ago

Ok, thanks. I have started a v2.0.4-preview branch with more memory allocation optimizations. Testing it, the OCaml memory allocations are pretty flat. There should be docker images for it here: https://hub.docker.com/repository/docker/savonet/liquidsoap-ci-build

scottgrobinson commented 2 years ago

Hey @toots - Awesome, thank you very much. So far, so good! It's only been running for a short amount of time, but I can see that the GC stats compared to the previous version are much improved!

I've got the following still included in the script - Would you suggest leaving these in there for now, or try taking them out before I let it bed in to get some good stats?

thread.run(every=5., runtime.gc.full_major)

runtime.gc.set(runtime.gc.get().{
  space_overhead = 20,
  allocation_policy = 2
})

scottgrobinson commented 2 years ago

@toots - Some improvements, but still seeing issues. I will attempt to run this without a docker container in the next few days if I have a chance but happy to hear any other suggestions you have...

2022/03/07 16:29:25 >>> LOG START
2022/03/07 16:29:23 [main:3] Liquidsoap 2.0.4
2022/03/07 16:29:23 [main:3] Using: bytes=[distributed with OCaml 4.02 or above] posix-time2=2.0.0 pcre=7.5.0 sedlex=2.5 menhirLib=20211128 curl=0.9.2 memtrace=v0.2.1.2 mem_usage=0.0.1 dtools=0.4.4 duppy=0.9.2 cry=0.6.6 mm=0.7.5 xmlplaylist=0.1.5 lastfm=0.3.3 ogg=0.7.1 ogg.decoder=0.7.1 vorbis=0.8.1 vorbis.decoder=0.8.1 opus=0.2.2 opus.decoder=0.2.2 speex=0.4.0 speex.decoder=0.4.0 mad=0.5.0 flac=0.3.0 flac.ogg=0.3.0 flac.decoder=0.3.0 dynlink=[distributed with Ocaml] lame=0.3.5 shine=0.2.2 frei0r=0.1.2 fdkaac=0.3.2 theora=0.4.0 theora.decoder=0.4.0 ffmpeg=1.1.1 bjack=0.1.6 alsa=0.3.0 ao=0.2.3 samplerate=0.1.6 taglib=0.3.9 ssl=0.5.9 magic=0.7.3 camomile=1.0.2 inotify=2.3 yojson=1.7.0 faad=0.5.0 soundtouch=0.1.9 portaudio=0.2.3 pulseaudio=0.1.4 ladspa=0.2.0 dssi=0.1.3 tsdl=v0.9.8 tsdl-ttf=0.3.2 tsdl-image=0.3.2 camlimages=4.2.6 cohttp-lwt-unix=5.0.0 prometheus-app=1.1 srt.constants=0.2.2 srt.types=0.2.2 srt.stubs=0.2.2 srt.stubs.locked=0.2.2 srt=0.2.2 lo=0.2.0 gd=1.0a5
2022/03/07 16:29:23 [clock:3] Using native (high-precision) implementation for latency control
2022/03/07 16:29:25 [frame:3] Using 44100Hz audio, 25Hz video, 44100Hz main.
2022/03/07 16:29:25 [frame:3] Video frame size set to: 1280x720
2022/03/07 16:29:25 [frame:3] Frame size must be a multiple of 1764 ticks = 1764 audio samples = 1 video samples.

image

image

scottgrobinson commented 2 years ago

From my limited tests so far, this appears to be specific to Docker. I've spun up a brand-new Azure instance running 20.04, installed Liquidsoap via OCaml, and run the same script: memory hasn't increased at all since it started running on 2.0.3.

I will leave it to bed in for a few more days, then put the Docker container onto that instance without the rest of the fluff around streaming (Icecast etc.), and confirm that it's definitely in the Docker domain.

toots commented 2 years ago

Thanks, that's very helpful to know. I was suspecting something like this. I wonder what is going on here; I'll try to reproduce with a test program in a Docker environment.

scottgrobinson commented 2 years ago

Interestingly, it doesn't seem to be increasing much in a manually created Docker container. (Same Liquidsoap script as above, but with the addition of set("init.allow_root",true) due to my super quick hacky container.) It's been running for a few hours with pretty much no RAM increase, same as on the native OS. I'll report back stats in another 24 hours just to be sure.

Container build script

FROM ubuntu:focal

RUN apt update

ENV DEBIAN_FRONTEND=noninteractive

RUN apt -y install opam tzdata

RUN opam init --disable-sandboxing

RUN opam switch create 4.08.0

RUN opam --yes depext taglib mad lame vorbis cry samplerate ocurl liquidsoap prometheus-liquidsoap ffmpeg opus
RUN opam install -y taglib mad lame vorbis cry samplerate ocurl liquidsoap prometheus-liquidsoap ffmpeg opus

ADD ./run.sh /root/run.sh
RUN chmod +x /root/run.sh

RUN mkdir /root/hls

CMD ["/root/run.sh"]

run.sh

#!/bin/bash

eval $(opam env)

liquidsoap /root/script.liq

scottgrobinson commented 2 years ago

My "custom" container above (which would be running 2.0.3): image

The standard container (running the 2.0.4 preview): image

toots commented 2 years ago

Thanks this is very interesting. I'm gonna have a closer look very soon.

toots commented 2 years ago

I've been running your script with the latest v2.0.4-preview_arm64 and wasn't able to reproduce. We've pushed a lot of optimizations lately, any chance you could try again with the latest one as well? Thanks!

scottgrobinson commented 2 years ago

Testing now and I'll get back to you! Sorry for the delay. FYI - Graphs below running "my build" show the memory pretty stable, so hopefully we'll see something similar :)

(NB - The daily drop at 4am is expected due to a backup on the monitoring system)

image

toots commented 2 years ago

Looking forward to the feedback! Yeah, we should be able to get to the bottom of this if it isn't already fixed.

Thanks for your very helpful support!

scottgrobinson commented 2 years ago

Happy to help get to the bottom of it!

So far so good - I'll report back in a few more days once it's had time to bed in. It's been running 26 hours now and stats look happy/almost the same as the "custom" container... (Red line = Changeover to 2.0.4-preview container)

image

scottgrobinson commented 2 years ago

Hey @toots - I think this is resolved now and can probably be closed. I've not seen RAM increase at all since boot. I'm probably going to play around and see if I can remove the OCaml GC additions you suggested, to bring the CPU usage down again, and see what kind of impact that has, but I can leave them in anyway if they don't change much.

Thanks for your help resolving this - Much appreciated.

toots commented 2 years ago

Very happy to hear this. Let's release 2.0.4 soon now! I might have a couple more optimizations up my sleeve :-)