Closed: scottgrobinson closed this issue 2 years ago
Thanks for this report. I am starting to wonder if there could be something related to the virtualization at play here.
Could you try to add the following to your script:
thread.run(every=5., runtime.gc.full_major)
This is not a fix, but an aggressive use of the memory-cleaning function. It should tell us whether your memory usage is the result of the GC not collecting memory fast enough.
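For reference, a minimal sketch of where that call could sit in a script; the playlist path and the dummy output here are placeholders for illustration, not part of the original report:
# Force a full major GC cycle every 5 seconds (aggressive, for diagnosis only).
thread.run(every=5., runtime.gc.full_major)
# Placeholder source and output, just to make the sketch self-contained.
s = mksafe(playlist("/root/music"))
output.dummy(s)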
Thanks!
So it doesn't "appear" to have helped, but I'll get you some better stats in the next couple of days. I also noticed you offer Prometheus integration, which I'm just setting up for some other aspects; it seems to include some OCaml data, so we'll see if there's anything useful in there....
Ok, thanks. I have started a v2.0.4-preview branch with more memory allocation optimizations. Testing it, the OCaml memory allocations are pretty flat. There should be Docker images for it here: https://hub.docker.com/repository/docker/savonet/liquidsoap-ci-build
Hey @toots - Awesome, thank you very much. So far, so good! It's only been running for a short amount of time, but I can see that the GC stats compared to the previous version are much improved!
I've still got the following included in the script. Would you suggest leaving these in for now, or taking them out before I let it bed in, so we get some good stats?
# Force a full major collection every 5 seconds.
thread.run(every=5., runtime.gc.full_major)
# Tune the OCaml GC: a lower space_overhead makes the major GC work harder
# (the default is 120), and allocation_policy = 2 selects the best-fit allocator.
runtime.gc.set(runtime.gc.get().{
  space_overhead = 20,
  allocation_policy = 2
})
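As an aside, if it helps when experimenting with these values, the parameters actually in effect can be printed at startup. This is only a sketch; the field names are taken from the record update above:
# Print the GC parameters currently in effect, useful for comparing runs
# with and without the overrides above.
gc_params = runtime.gc.get()
print("GC: space_overhead=#{gc_params.space_overhead}, allocation_policy=#{gc_params.allocation_policy}")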
@toots - Some improvements, but still seeing issues. I will attempt to run this outside of a Docker container in the next few days if I have a chance, but happy to hear any other suggestions you have...
2022/03/07 16:29:25 >>> LOG START
2022/03/07 16:29:23 [main:3] Liquidsoap 2.0.4
2022/03/07 16:29:23 [main:3] Using: bytes=[distributed with OCaml 4.02 or above] posix-time2=2.0.0 pcre=7.5.0 sedlex=2.5 menhirLib=20211128 curl=0.9.2 memtrace=v0.2.1.2 mem_usage=0.0.1 dtools=0.4.4 duppy=0.9.2 cry=0.6.6 mm=0.7.5 xmlplaylist=0.1.5 lastfm=0.3.3 ogg=0.7.1 ogg.decoder=0.7.1 vorbis=0.8.1 vorbis.decoder=0.8.1 opus=0.2.2 opus.decoder=0.2.2 speex=0.4.0 speex.decoder=0.4.0 mad=0.5.0 flac=0.3.0 flac.ogg=0.3.0 flac.decoder=0.3.0 dynlink=[distributed with Ocaml] lame=0.3.5 shine=0.2.2 frei0r=0.1.2 fdkaac=0.3.2 theora=0.4.0 theora.decoder=0.4.0 ffmpeg=1.1.1 bjack=0.1.6 alsa=0.3.0 ao=0.2.3 samplerate=0.1.6 taglib=0.3.9 ssl=0.5.9 magic=0.7.3 camomile=1.0.2 inotify=2.3 yojson=1.7.0 faad=0.5.0 soundtouch=0.1.9 portaudio=0.2.3 pulseaudio=0.1.4 ladspa=0.2.0 dssi=0.1.3 tsdl=v0.9.8 tsdl-ttf=0.3.2 tsdl-image=0.3.2 camlimages=4.2.6 cohttp-lwt-unix=5.0.0 prometheus-app=1.1 srt.constants=0.2.2 srt.types=0.2.2 srt.stubs=0.2.2 srt.stubs.locked=0.2.2 srt=0.2.2 lo=0.2.0 gd=1.0a5
2022/03/07 16:29:23 [clock:3] Using native (high-precision) implementation for latency control
2022/03/07 16:29:25 [frame:3] Using 44100Hz audio, 25Hz video, 44100Hz main.
2022/03/07 16:29:25 [frame:3] Video frame size set to: 1280x720
2022/03/07 16:29:25 [frame:3] Frame size must be a multiple of 1764 ticks = 1764 audio samples = 1 video samples.
This appears to be limited to Docker from my limited tests so far. I've spun up a brand-new Azure instance running Ubuntu 20.04 and installed Liquidsoap via OCaml, and running the same script, memory hasn't increased at all since it's been running on 2.0.3....
I will leave it to bed in for a few more days, then put the Docker container onto that instance without the rest of the fluff around streaming (Icecast etc.), and confirm that it's definitely in the Docker domain.
Thanks, that's very helpful to know. I was suspecting something like this. I wonder what's going on here; I'll try to reproduce with a test program in a Docker environment.
Interestingly, it doesn't seem to be increasing much in a manually created Docker container... (same Liquidsoap script as above, but with the addition of set("init.allow_root", true) because of my super quick, hacky container). It's been running for a few hours with pretty much no RAM increase, the same as on the native OS. I'll report back stats in another 24 hours just to be sure....
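(For completeness, a rough sketch of how that one-line change sits at the top of the script; everything after it is the original script, unchanged:)
# Allow Liquidsoap to run as root inside the quick throwaway container.
set("init.allow_root", true)
# ... rest of the original script, unchanged ...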
Container build script
# Throwaway test image: Ubuntu 20.04 with Liquidsoap installed via opam.
FROM ubuntu:focal
RUN apt update
# Avoid interactive prompts (tzdata) during package installation.
ENV DEBIAN_FRONTEND=noninteractive
RUN apt -y install opam tzdata
# opam's sandboxing (bubblewrap) doesn't work in an unprivileged container.
RUN opam init --disable-sandboxing
RUN opam switch create 4.08.0
# Install system dependencies first, then the OCaml packages themselves.
RUN opam --yes depext taglib mad lame vorbis cry samplerate ocurl liquidsoap prometheus-liquidsoap ffmpeg opus
RUN opam install -y taglib mad lame vorbis cry samplerate ocurl liquidsoap prometheus-liquidsoap ffmpeg opus
ADD ./run.sh /root/run.sh
RUN chmod +x /root/run.sh
RUN mkdir /root/hls
CMD ["/root/run.sh"]
run.sh
#!/bin/bash
# Load the opam environment so the liquidsoap binary is on PATH, then run the script.
eval $(opam env)
liquidsoap /root/script.liq
My "custom" container above (which would be running 2.0.3):
The standard container (running the 2.0.4 preview):
Thanks, this is very interesting. I'm gonna have a closer look very soon.
I've been running your script with the latest v2.0.4-preview_arm64 and wasn't able to reproduce. We've pushed a lot of optimizations lately; any chance you could try again with the latest one as well? Thanks!
Testing now, and I'll get back to you! Sorry for the delay. FYI, the graphs below from running "my build" show memory staying pretty stable, so hopefully we'll see something similar :)
(NB - The daily drop at 4am is expected due to a backup on the monitoring system)
Looking forward to the feedback! Yeah, we should be able to get to the bottom of this if it isn't already fixed.
Thanks for your very helpful support!
Happy to help get to the bottom of it!
So far so good. I'll report back in a few more days once it's had time to bed in. It's been running for 26 hours now, and the stats look happy/almost the same as the "custom" container... (red line = changeover to the 2.0.4-preview container)
Hey @toots - I'm going to say I think this is resolved now, and this can probably be closed. I've not seen RAM increase at all since boot. I'm probably going to play around and see if I can remove the OCaml additions you suggested for GC collection, to bring the CPU down again, and see what kind of impact it has, but I can leave them in anyway if they don't change much.
Thanks for your help resolving this - Much appreciated.
Very happy to hear this. Let's release 2.0.4 soon now! Might have a couple more optims up my sleeve :-)
Describe the bug
Liquidsoap 2.0.3 under Docker dies with OOM issues even with the runtime.gc.set "patch" applied. The container is currently limited to 2GB for troubleshooting this issue; however, higher limits also seem to hit the same issue.
The graph here shows analysis of the container over the past 24 hours (the same issue appears when going back further than that). You can see the drops in memory usage when the OOM killer comes into play and restarts the container.
To Reproduce
Run the below script on the latest Liquidsoap Docker container with a 2GB memory limit applied.
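The reporter's actual script is not reproduced in this excerpt. Purely as an illustration of the general shape (a playlist played out to Icecast), a hypothetical stand-in might look like the sketch below; every path, host, and credential here is made up:
# Hypothetical stand-in, not the script from the report.
set("init.allow_root", true)
s = mksafe(playlist("/root/music"))
output.icecast(%mp3, host="localhost", port=8000, password="hackme", mount="stream.mp3", s)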
Expected behavior
Liquidsoap doesn't fall over with OOM issues.
Version details
Install method
Docker