Increased CPU usage over time leading to high latency, catchup, and resetting sources

dmccombs commented 2 days ago

Description

Over time releases of Liquidsoap 2.x steadily use more and more CPU until "Too much latency" and catchup warnings start filling the logs, leading to resetting active sources repeatedly. This happens after a couple of weeks of instances running, whereas it didn't happen on Liquidsoap 1.x in largely the same configuration running for years.

Logs are hundreds of MB in size after a month, but some sample log lines are:

2024/10/23 00:00:39 [clock.generic:2] We must catchup 45.31 seconds!
2024/10/23 00:00:40 [clock.generic:2] We must catchup 45.88 seconds!
2024/10/23 00:00:41 [clock.generic:2] We must catchup 46.53 seconds!
2024/10/23 00:00:41 [clock.generic.2:2] We must catchup 47.27 seconds!
2024/10/23 00:00:42 [clock.generic:2] We must catchup 47.22 seconds!
2024/10/23 00:00:42 [clock.generic.2:2] We must catchup 48.30 seconds!
2024/10/23 00:00:43 [clock.generic:2] We must catchup 47.95 seconds!
2024/10/23 00:00:43 [clock.generic.2:2] We must catchup 49.22 seconds!
2024/10/23 00:00:44 [clock.generic:2] We must catchup 48.48 seconds!
2024/10/23 00:00:44 [clock.generic.2:2] We must catchup 50.05 seconds!

2024/10/22 23:46:44 [clock.generic.2:2] Too much latency! Resetting active sources...
2024/10/22 23:49:01 [clock.generic.2:2] Too much latency! Resetting active sources...
2024/10/22 23:51:02 [clock.generic.2:2] Too much latency! Resetting active sources...
2024/10/22 23:53:34 [clock.generic.2:2] Too much latency! Resetting active sources...
2024/10/22 23:55:32 [clock.generic.2:2] Too much latency! Resetting active sources...
2024/10/22 23:56:46 [clock.generic.2:2] Too much latency! Resetting active sources...

Restarting the instances clears the issue for another couple of weeks, and resets CPU usage down to ~10%, before it climbs back to 100% over time.

These are metrics showing the state the system was in after a month of running, then the reset when liquidsoap was restarted: shot-2024-10-22_20-10-41

This has happened both when running 1 instance of Liquidsoap with 3 playlists in it, and also when running 3 separate instances of Liquidsoap with 1 playlist in each.

Here's one of the 3 scripts, the problem happens with or without the buffer() around the playlist in the output:

#!/usr/bin/liquidsoap
set("log.file.path", "/var/log/liquidsoap/echoesofbluemars.log")

set("server.socket", true)
set("server.socket.path", "/sockets/echoes_socket-bluemars")
set("server.socket.permissions", 432)

# Enable replaygain use to normalize volume
enable_replaygain_metadata ()

# Music
bmplaylist = playlist(reload_mode='watch', mode='randomize', prefetch=9, "/music/bm-playlist.m3u")

# Bluemars
output.icecast(%mp3,
        host = "localhost",
        port = 8000,
        password = "<REDACTED>",
        mount = "bluemars",
        fallible = true,
        name = "Echoes of BLUEMARS",
        description = "Music For the Space Traveler",
        genre = "Ambient",
        url = "http://www.echoesofbluemars.org",
        buffer(max=60.0, crossfade(replaygain(bmplaylist))))

Steps to reproduce

Run the above script for weeks and observe CPU usage increasing over time. The playlist used has 372 tracks on it, all with replaygain already computed and tagged in the files.

Expected behavior

Liquidsoap should use a steady amount of CPU over time, and not need restarts to continue to run normally.

Liquidsoap version

Liquidsoap 2.3.0+git@24f373365
Copyright (c) 2003-2024 Savonet team
Liquidsoap is open-source software, released under GNU General Public License.
See <http://liquidsoap.info> for more information.

Liquidsoap build config

* Liquidsoap version  : 2.3.0+git@24f373365

 * Compilation options
   - Release build       : false
   - Git SHA             : 24f373365
   - OCaml version       : 4.14.2
   - OS type             : Unix
   - Libs versions       : alsa=0.3.0 angstrom=0.16.0 ao=0.2.4 asetmap=0.8.1 asn1-combinators=0.2.6 astring=0.8.5 base=v0.16.3 base.base_internalhash_types=v0.16.3 base.caml=v0.16.3 base.shadow_stdlib=v0.16.3 base64=3.5.1 bigarray=[distributed with Ocaml] bigarray-compat=1.1.0 bigstringaf=0.9.1 bjack=0.1.6 bos=0.2.1 bytes=[distributed with OCaml 4.02 or above] ca-certs=v0.2.3 camlp-streams camomile.lib=2.0 cohttp=5.3.1 cohttp-lwt=5.3.0 cohttp-lwt-unix=5.3.0 conduit=6.2.3 conduit-lwt=6.2.3 conduit-lwt-unix=6.2.3 cry=1.0.3 cstruct=6.2.0 ctypes=0.22.0 ctypes-foreign=0.22.0 ctypes.stubs=0.22.0 curl=0.9.2 domain-name=0.4.0 domain_shims dssi=0.1.5 dtools=0.4.5 dune-build-info=3.16.0 dune-private-libs.dune-section=3.16.0 dune-site=3.16.0 dune-site.private=3.16.0 duppy=0.9.4 eqaf=0.9 eqaf.bigstring=0.9 eqaf.cstruct=0.9 faad=0.5.2 fdkaac=0.3.3 ffmpeg-av=1.2.0 ffmpeg-avcodec=1.2.0 ffmpeg-avdevice=1.2.0 ffmpeg-avfilter=1.2.0 ffmpeg-avutil=1.2.0 ffmpeg-swresample=1.2.0 ffmpeg-swscale=1.2.0 fileutils=0.6.4 flac=0.5.1 flac.decoder=0.5.1 flac.ogg=0.5.1 fmt=0.9.0 fpath=0.7.3 frei0r=0.1.2 gd=1.1 gen=1.1 gmap=0.3.0 hkdf=1.0.4 inotify=2.0-62-g5e58536 integers ipaddr=5.6.0 ipaddr-sexp=5.6.0 ipaddr.unix=5.6.0 irc-client irc-client-unix ladspa=0.2.2 lame=0.3.7 lastfm=0.3.4 lilv=0.2.0 liquidsoap-lang=2.3.0 liquidsoap-lang.console=2.3.0 liquidsoap_alsa=rolling-release-v2.3.x-1-g24f3733 liquidsoap_ao=rolling-release-v2.3.x-1-g24f3733 liquidsoap_bjack=rolling-release-v2.3.x-1-g24f3733 liquidsoap_builtins=rolling-release-v2.3.x-1-g24f3733 liquidsoap_core=rolling-release-v2.3.x-1-g24f3733 liquidsoap_dssi=rolling-release-v2.3.x-1-g24f3733 liquidsoap_faad=rolling-release-v2.3.x-1-g24f3733 liquidsoap_fdkaac=rolling-release-v2.3.x-1-g24f3733 liquidsoap_ffmpeg=rolling-release-v2.3.x-1-g24f3733 liquidsoap_flac=rolling-release-v2.3.x-1-g24f3733 liquidsoap_frei0r=rolling-release-v2.3.x-1-g24f3733 liquidsoap_gd=rolling-release-v2.3.x-1-g24f3733 liquidsoap_irc=rolling-release-v2.3.x-1-g24f3733 liquidsoap_ladspa=rolling-release-v2.3.x-1-g24f3733 liquidsoap_lame=rolling-release-v2.3.x-1-g24f3733 liquidsoap_lastfm=rolling-release-v2.3.x-1-g24f3733 liquidsoap_lilv=rolling-release-v2.3.x-1-g24f3733 liquidsoap_lo=rolling-release-v2.3.x-1-g24f3733 liquidsoap_mad=rolling-release-v2.3.x-1-g24f3733 liquidsoap_ogg=rolling-release-v2.3.x-1-g24f3733 liquidsoap_ogg_flac=rolling-release-v2.3.x-1-g24f3733 liquidsoap_optionals=rolling-release-v2.3.x-1-g24f3733 liquidsoap_opus=rolling-release-v2.3.x-1-g24f3733 liquidsoap_osc=rolling-release-v2.3.x-1-g24f3733 liquidsoap_oss=rolling-release-v2.3.x-1-g24f3733 liquidsoap_portaudio=rolling-release-v2.3.x-1-g24f3733 liquidsoap_posix_time=rolling-release-v2.3.x-1-g24f3733 liquidsoap_prometheus=rolling-release-v2.3.x-1-g24f3733 liquidsoap_pulseaudio=rolling-release-v2.3.x-1-g24f3733 liquidsoap_runtime=rolling-release-v2.3.x-1-g24f3733 liquidsoap_samplerate=rolling-release-v2.3.x-1-g24f3733 liquidsoap_sdl=rolling-release-v2.3.x-1-g24f3733 liquidsoap_sdl_log_level=rolling-release-v2.3.x-1-g24f3733 liquidsoap_shine=rolling-release-v2.3.x-1-g24f3733 liquidsoap_soundtouch=rolling-release-v2.3.x-1-g24f3733 liquidsoap_speex=rolling-release-v2.3.x-1-g24f3733 liquidsoap_sqlite=rolling-release-v2.3.x-1-g24f3733 liquidsoap_srt=rolling-release-v2.3.x-1-g24f3733 liquidsoap_ssl=rolling-release-v2.3.x-1-g24f3733 liquidsoap_stdlib=rolling-release-v2.3.x-1-g24f3733 liquidsoap_stereotool=rolling-release-v2.3.x-1-g24f3733 liquidsoap_theora=rolling-release-v2.3.x-1-g24f3733 liquidsoap_tls=rolling-release-v2.3.x-1-g24f3733 liquidsoap_vorbis=rolling-release-v2.3.x-1-g24f3733 liquidsoap_xmlplaylist=rolling-release-v2.3.x-1-g24f3733 liquidsoap_yaml=rolling-release-v2.3.x-1-g24f3733 lo=0.2.0 logs=0.7.0 logs.fmt=0.7.0 logs.lwt=0.7.0 lwt=5.7.0 lwt.unix=5.7.0 macaddr=5.6.0 mad=0.5.3 magic-mime=1.3.1 mem_usage=0.1.1 memtrace=0.2.3 menhirLib=20231231 metadata=0.3.0 mirage-crypto=0.11.3 mirage-crypto-ec=0.11.3 mirage-crypto-pk=0.11.3 mirage-crypto-rng=0.11.3 mirage-crypto-rng.unix=0.11.3 mm=0.8.5 mm.audio=0.8.5 mm.base=0.8.5 mm.image=0.8.5 mm.midi=0.8.5 mm.video=0.8.5 ocplib-endian ocplib-endian.bigstring ogg=0.7.4 ogg.decoder=0.7.4 opus=0.2.3 opus.decoder=0.2.3 osc osc-unix parsexp=v0.16.0 pbkdf portaudio=0.2.3 posix-base=5a7f328 posix-socket=5a7f328 posix-socket.constants=5a7f328 posix-socket.stubs=5a7f328 posix-socket.types=5a7f328 posix-time2=5a7f328 posix-time2.constants=5a7f328 posix-time2.stubs=5a7f328 posix-time2.types=5a7f328 posix-types=5a7f328 posix-types.constants=5a7f328 ppx_compare.runtime-lib=v0.16.0 ppx_hash.runtime-lib=v0.16.0 ppx_sexp_conv.runtime-lib=v0.16.0 prometheus=1.2 prometheus-app=1.2 ptime=1.1.0 ptime.clock.os=1.1.0 pulseaudio=0.1.6 re=1.11.0 result=1.5 rresult=0.7.0 samplerate=0.1.7 saturn_lockfree=0.4.1 sedlex=3.2 seq=[distributed with OCaml 4.07 or above] sexplib=v0.16.0 sexplib0=v0.16.0 shine=0.2.3 soundtouch=0.1.9 speex=0.4.2 speex.decoder=0.4.2 sqlite3=5.1.0 srt=0.3.1 srt.constants=0.3.1 srt.stubs=0.3.1 srt.stubs.locked=0.3.1 srt.types=0.3.1 ssl=0.7.0 stdlib-shims=0.3.0 stereotool=rolling-release-v2.3.x-1-g24f3733 str=[distributed with Ocaml] stringext=1.6.0 theora=0.4.1 theora.decoder=0.4.1 threads=[distributed with Ocaml] threads.posix=[internal] tls=0.17.4 tsdl=v1.0.0 tsdl-image=0.5 tsdl-ttf=0.6 unix=[distributed with Ocaml] unix-errno=52c6ecb unix-errno.errno_bindings=52c6ecb unix-errno.errno_types=52c6ecb unix-errno.errno_types_detected=52c6ecb unix-errno.unix=52c6ecb uri=4.4.0 uri-sexp=4.4.0 uri.services=4.4.0 vorbis=0.8.1 vorbis.decoder=0.8.1 x509=0.16.5 xmlm=1.4.0 xmlplaylist=0.1.5 yaml=3.2.0 yaml.bindings=3.2.0 yaml.bindings.types=3.2.0 yaml.c=3.2.0 yaml.ffi=3.2.0 yaml.types=3.2.0 zarith=1.13
   - architecture        : amd64
   - host                : x86_64-pc-linux-gnu
   - target              : x86_64-pc-linux-gnu
   - system              : linux
   - ocamlopt_cflags     : -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC
   - native_c_compiler   : gcc -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -D_FILE_OFFSET_BITS=64
   - native_c_libraries  : -lm

 * Configured paths
   - mode              : posix
   - standard library  : /usr/share/liquidsoap/libs
   - scripted binaries : /usr/share/liquidsoap/bin
   - rundir            : /var/run/liquidsoap
   - logdir            : /var/log/liquidsoap
   - user cache        : $HOME/.cache/liquidsoap (override with $LIQ_CACHE_USER_DIR)
   - system cache      : /var/cache/liquidsoap (override with $LIQ_CACHE_SYSTEM_DIR)
   - camomile files    : /usr/share/liquidsoap/camomile

 * Supported input formats
   - MP3               : yes
   - AAC               : yes
   - Ffmpeg            : yes
   - Flac (native)     : yes
   - Flac (ogg)        : yes
   - Opus              : yes
   - Speex             : yes
   - Theora            : yes
   - Vorbis            : yes
   - WAV/AIFF          : yes (native)

 * Supported output formats
   - FDK-AAC           : yes
   - FFmpeg            : yes
   - MP3               : yes
   - MP3 (fixed-point) : yes
   - Flac (native)     : yes
   - Flac (ogg)        : yes
   - Opus              : yes
   - Speex             : yes
   - Theora            : yes
   - Vorbis            : yes
   - WAV/AIFF          : yes (native)

 * Tags
   - AAC               : yes
   - FFmpeg            : yes
   - FLAC (native)     : yes
   - Flac (ogg)        : yes
   - Native decoder    : yes
   - Vorbis            : yes

 * Input / output
   - ALSA              : yes
   - AO                : yes
   - FFmpeg            : yes
   - JACK              : yes
   - OSS               : yes
   - Portaudio         : yes
   - Pulseaudio        : yes
   - SRT               : yes

 * Audio manipulation
   - FFmpeg            : yes
   - LADSPA            : yes
   - Lilv              : yes
   - Samplerate        : yes
   - SoundTouch        : yes
   - StereoTool        : yes

 * Video manipulation
   - camlimages        : no (requires camlimages)
   - FFmpeg            : yes
   - frei0r            : yes
   - ImageLib          : no (requires imagelib)
   - SDL               : yes

 * MIDI manipulation
   - DSSI              : yes

 * Visualization
   - GD                : yes
   - Graphics          : no (requires graphics)
   - SDL               : yes

 * Additional libraries
   - FFmpeg filters    : yes
   - FFmpeg devices    : yes
   - inotify           : yes
   - irc               : yes
   - jemalloc          : no (requires jemalloc)
   - lastfm            : yes
   - lo                : yes
   - memtrace          : no (requires memtrace)
   - osc               : yes
   - ssl               : yes
   - sqlite3           : yes
   - tls               : yes
   - posix-time2       : yes
   - windows service   : no (requires winsvc)
   - YAML support      : yes
   - XML playlists     : yes

 * Monitoring
   - Prometheus        : yes

Installation method

From official packages in the release artifacts

Additional Info

Happy to provide any additional info that might help track this down, as it's prevented me from upgrading my production instances to Liquidsoap 2.x for over a year.

toots commented 2 days ago

Thanks for this report! I will prioritize fixing it.

I have an ongoing PR fixing some serious performance issues around source.drop, which is the operator used to compute data on the files to get replaygain information.

I was wondering:

Do you know if computation happens on the sources related to replaygain data? Typically you would see logs of the form:
```
2024/10/22 16:44:04 [source.drop:4] Start dropping source (ratio: 50.00x)
```
If that is the case then I would recommend trying again once https://github.com/savonet/liquidsoap/pull/4184 has landed.

I'll make sure to ping y'all again by them!

dmccombs commented 1 day ago

Thanks @toots!

Grepping the ~700MB of logs I have from this last run with this issue there's no instances of source.drop or dropping.

When I was first dealing with this problem I thought it might be due to replaygain being calculated at playtime for each track, so I ran the loudness tag command on all my mp3 files to calculate and tag them with the replaygain once, and I've no longer seen any messages about it being calculated in my logs, but that didn't help the problem unfortunately.

toots commented 1 day ago

Hi @dmccombs ok thanks and, damn!

I just merged the PR in question and pushed a new build with it to the rolling-release-v2.3.x branch.

I'm gonna try and see if I can reproduce your issue. How long does it take for you to start seeing the CPU increase?

dmccombs commented 1 day ago

@toots it's varied between days, weeks, or even a couple of months before I started seeing the log errors since I've been dealing with it, but that's across different 2.x versions and I hadn't been looking at CPU metrics directly to see when they started to climb before errors started showing up.

Last night was the first time I've restarted the 2.3.0 build I'm on since having the problem. I'll keep a closer eye on CPU use and let you know when I start seeing it climb again.

savonet / liquidsoap