savonet / liquidsoap

Liquidsoap is a statically typed, general-purpose scripting language with dedicated operators and backends for all things media: streaming, file generation, automation, HTTP services and more.
http://liquidsoap.info
GNU General Public License v2.0

FR: Much needed COOL feature: Make crossfading work, add a dB trigger level to auto-find trigger point. Required for "real" crossfading and radio-segue programming. #3701

Open Moonbase59 opened 9 months ago

Moonbase59 commented 9 months ago

Note: This is a Work-in-Progress. This issue will be fleshed-out with more data, proposed solutions, etc. So please don’t yet comment (too much)!

@toots asked me to add this here, so it won’t get forgotten.

Is your feature request related to a problem? Please describe. We cannot currently do "real" crossfading or radio segues. Dumb time-based crossfading doesn’t work out in many cases, and "smart" isn’t so smart, really. We need some additional options like, for instance, SAM Broadcaster has (see Wiki). To eliminate gaps we need a "gap killer" (we already have blank.skip, blank.eat), and at least an additional dB level trigger to find the exact point where to start the next song, independent of whether the previous song has a soft or hard ending. In addition to that, the fade-in and fade-out curves and times need to be settable. (This should be possible already.)
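For illustration, here is a minimal sketch of what "settable fade curves and times" amounts to. The function names and the 60 dB logarithmic ramp are my own assumptions for the example, not Liquidsoap's actual API or option values:

```python
import math

def fade_gain(t: float, duration: float, curve: str = "lin") -> float:
    """Fade-in gain in [0, 1] at t seconds into a fade of the given duration.

    "lin" is a straight ramp; "log" rises quickly and then flattens, which
    is closer to how radio-style fades are usually perceived. (Curve names
    are illustrative only.)
    """
    x = min(max(t / duration, 0.0), 1.0)
    if curve == "lin":
        return x
    if curve == "log":
        # Map [0, 1] onto a 60 dB logarithmic ramp.
        return 0.0 if x == 0.0 else 10.0 ** (60.0 * (x - 1.0) / 20.0)
    raise ValueError(f"unknown curve: {curve}")

def fade_out_gain(t: float, duration: float, curve: str = "lin") -> float:
    """Fade-out is simply the time-mirrored fade-in."""
    return fade_gain(duration - t, duration, curve)
```

A crossfader would evaluate `fade_out_gain` on the old source and `fade_gain` on the new one over the same window; the dB trigger discussed below decides *when* that window starts.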

For drops (jingle played over the crossfading transition), we must be able to trim silence at start and end of the drop, and position it within the crossfade window. Also, these "drops" should feature an annotation like "is overlay" or something, so these would never be faded.

Describe the solution you'd like Some SAM screenshots so you get the idea:

[Screenshot: example of SAM’s crossfade settings screen]

[Screenshot: SAM’s "gap killer" (like our blank.skip) with a few more settings, which runs before the crossfade is invoked]

Describe alternatives you've considered Use other software, which we don’t want, since Liquidsoap is great, both "above" and "under the hood".

Additional context

As yet unsorted snippets of my conversation with toots:

The coarse theory is you set up crossfading curves (we have that), a duration (we have that), fade-in and fade-out durations (we have that), and a dB level. So when the fading-out "old" source falls below the trigger level, the new song gets started. The only thing I can’t remember anymore is whether they actually start the new song at this point, or if the new song’s fade-in level must also have reached that dB level.

EDIT: See below. They actually use it on the "old" song (from the end), to find the point where to start the new song. Timings can still apply.

That would allow doing "real" crossfades and having the system find the transition point, without us having to rely on timing (unreliable) or pseudo power levels as in "smart crossfade". It would also help automating soft (long fade-out, silent) and hard (loud, say ending with percussion) song endings.

[…]

The above fading curve example isn’t really great for explaining the "level" thing, but that’s what we all strive for, I reckon, when I see all the crossfading issues raised by the more serious radio people. I guess you see what I mean, right? The only thing to find out is if the dB level should find the transition point in both directions, or just from the fading-out song. The blank.skip is probably good enough for now. It’s extremely rare that someone would want different skip levels for the start and the end of a song; in radio, we’d almost always start the new song at full volume (or use a 0.1 s quick fade-in).

[…]

Last remark: About 20 years ago, I spoke to one of the SAM programmers, and he told me they actually apply the "gap killer" (blank.skip) first, then check for the dB level point of the fading-out song backwards (from the trimmed end), since the last portion of the song might have an unexpected "bump" that could trigger the next song fade-in prematurely (on long fade-outs).
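The order of operations described here (gap killer first, then a backwards dB scan from the trimmed end) can be sketched roughly like this. This is a minimal sketch assuming mono float samples in [-1, 1]; the function and parameter names are illustrative, not an existing Liquidsoap API:

```python
import math

def rms_db(window) -> float:
    """RMS level of a sample window in dBFS (samples in [-1, 1])."""
    if not window:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")

def find_next_start(samples, rate, gap_db=-60.0, trigger_db=-20.0, win=0.05):
    """Return the time (seconds) at which the next track should start.

    Mirrors the SAM order of operations described above:
    1. "Gap killer": trim trailing windows quieter than gap_db.
    2. Walk BACKWARDS from the trimmed end until the level rises above
       trigger_db, so a late "bump" in a long fade-out cannot fire the
       transition prematurely.
    """
    n = max(1, int(win * rate))
    windows = [samples[i:i + n] for i in range(0, len(samples), n)]
    while windows and rms_db(windows[-1]) < gap_db:
        windows.pop()
    for i in range(len(windows) - 1, -1, -1):
        if rms_db(windows[i]) >= trigger_db:
            return (i + 1) * n / rate
    return 0.0
```

For example, a track that is loud for 0.5 s, then fades to a quiet tail and ends in digital silence yields the end of the loud part as the start point for the next song.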

toots commented 9 months ago

Cool. I think you mean blank.eat.

As I said in the conversation, this kind of feature is better suited to offline treatment, i.e. something like an autofade: protocol that would process the request and detect and add crossfade points.

One thing that I am noticing is that everyone has their own opinion on how these should work. We probably are close to having all the tools we need. We can process a track frame-by-frame and compute its RMS (equivalent to dB level) and, for files, we can seek as close as we want to any points. These are pretty much all the tools we'd need.

Now what we need is to write down exactly what we want to implement so that:

  1. It is clear to everyone what the thing is computing
  2. We can test the implementation to make sure it actually does what we're thinking it does

So, back to our case. Let's say track A is ending and is followed by track B. We need to:

Warblefly commented 9 months ago

I wonder if I could write some of my code in liquidsoap? On the other hand, I do my analysis at the time a track is introduced to my library, not in real time. It does, however, catch tracks with long but quiet ends such as "Bohemian Rhapsody". Cross-fading isn't employed: I use the natural level of each track (adjusted to a standard loudness according to R.128) to facilitate musical overlaps.

It chops silence from the front of a track, then works backwards from the end to find the final fade-out or end point.

https://github.com/Warblefly/TrackBoundaries

Moonbase59 commented 9 months ago

@Warblefly: We value your scripts, and many of us probably know them. They’re great for preprocessing, but I guess what we need here is more along the lines of "on-the-fly", like SAM and some others can do. Most users of AzuraCast and Liquidsoap aren’t radio pros and abhor pre-processing. They just want to "drop some songs in" and expect good results. We could reach that if we automatically (per-frame dB/power level) find the spot where the next track should start. Or the exact crossing point.

But thanks for the pointer, many might be interested.

"backwards from the end" is actually important: On long fadeouts, you can have "bumps", and if we’d simply scan forward to find the transition (or next song start) point, we might start prematurely.

Moonbase59 commented 9 months ago

@toots Be aware that for follow-up RMS calculations we cannot simply take what’s in the frame but need to respect what has been set by previous amplify calculations (like ReplayGain). Most of us will probably use untouched FLAC data, with just added replaygain tags, and thus the RMS within frames can vary greatly.
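As a minimal illustration of that point (the helper names are my own, not an existing API): the raw frame RMS has to be offset by whatever gain is applied upstream, e.g. a ReplayGain track tag, before it is compared against any dB trigger, otherwise quiet-mastered and loud-mastered files mis-fire at different points.

```python
def parse_replaygain(tag: str) -> float:
    """Parse a ReplayGain tag value like '-7.50 dB' into a float gain in dB."""
    return float(tag.strip().split()[0])

def adjusted_level_db(frame_rms_db: float, gain_db: float) -> float:
    """Level actually heard after upstream amplification.

    frame_rms_db is the raw RMS of the decoded frame; gain_db is the total
    gain applied earlier in the chain. dB values simply add, so a -20 dB
    frame played through -7.5 dB of ReplayGain sits at -27.5 dB.
    """
    return frame_rms_db + gain_db
```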

Same applies to blank.eat, blank.skip and the like, and the proposed dB "kick-in" level.

toots commented 9 months ago

As I said earlier, computing the right crossfade parameters should be done as pre-processing, for instance during the request resolution phase via a custom protocol.

Once computed, though, this PR adding an initial fade-in/out delay should finally give the full flexibility to create whatever crossfade is required: https://github.com/savonet/liquidsoap/pull/3703

toots commented 9 months ago

[Image attachment]

toots commented 9 months ago

> I wonder if I could write some of my code in liquidsoap? On the other hand, I do my analysis at the time a track is introduced to my library, not in real time. It does, however, catch tracks with long but quiet ends such as "Bohemian Rhapsody". Cross-fading isn't employed: I use the natural level of each track (adjusted to a standard loudness according to R.128) to facilitate musical overlaps.
>
> It chops silence from the front of a track, then works backwards from the end to find the final fade-out or end point.
>
> https://github.com/Warblefly/TrackBoundaries

That would be awesome!

MaPePeR commented 9 months ago

I wrote a Python script to be used as a protocol to pretty much achieve this exact thing.

It uses ffmpeg to

  • first calculate the replaygain
  • Then apply replaygain and use silencedetect filter with dB limits to find

    • cue_in
    • cue_out
    • cross_duration
  • Parse silencedetect filter output. First detected silence starting at 0 for cue_in. Last detected silences without an end for cue_out and cross_duration.
  • Store found values with filepath and file modification timestamp in sqlite database
  • Annotate file with metadata

This is somewhat similar to the SAM gap killer (though it doesn't detect blips; that could be made possible by further parsing the output instead of just using the first detected silence).

To achieve the described SAM crossfading you probably need to apply some additional min/max calculation based on the cross_duration metadata, though.

The script is still a bit rough around the edges, but it's a step in that direction, I think: https://gist.github.com/MaPePeR/51a19510b3849c631e82f15d5d130a0e
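The ffmpeg step described above can be sketched roughly like this. A minimal sketch only: the threshold values and helper names are illustrative, and the actual gist differs; see the link for the real script.

```python
import re
import subprocess

def parse_silencedetect(stderr_text):
    """Turn ffmpeg silencedetect log output into (start, end) pairs.

    A final silence_start with no matching silence_end means the silence
    runs to the end of the file; it is returned with end=None. That is
    the cue_out / cross_duration candidate mentioned above.
    """
    starts = [float(m) for m in
              re.findall(r"silence_start:\s*(-?[\d.]+)", stderr_text)]
    ends = [float(m) for m in
            re.findall(r"silence_end:\s*(-?[\d.]+)", stderr_text)]
    ends += [None] * (len(starts) - len(ends))
    return list(zip(starts, ends))

def detect_silences(path, noise_db=-50.0, min_dur=0.5):
    """Run ffmpeg's silencedetect filter on a file and parse its stderr."""
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
           "-af", f"silencedetect=noise={noise_db}dB:d={min_dur}",
           "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_silencedetect(proc.stderr)
```

The first pair starting at 0 gives cue_in's end; the trailing open-ended pair gives cue_out.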

toots commented 9 months ago

> I wrote a Python script to be used as a protocol to pretty much achieve this exact thing.
>
> It uses ffmpeg to
>
>   • first calculate the replaygain
>   • Then apply replaygain and use silencedetect filter with dB limits to find
>
>     • cue_in
>     • cue_out
>     • cross_duration
>   • Parse silencedetect filter output. First detected silence starting at 0 for cue_in. Last detected silences without an end for cue_out and cross_duration.
>   • Store found values with filepath and file modification timestamp in sqlite database
>   • Annotate file with metadata
>
> This is somewhat similar to the SAM gap killer (though it doesn't detect blips, but that could also be possible by further parsing the output instead of just using the first detected silence).
>
> To achieve the described SAM crossfading you probably need to apply some additional min/max calculation based on the cross_duration metadata, though.
>
> The script is still a bit rough around the edges, but it's a step in that direction, I think: https://gist.github.com/MaPePeR/51a19510b3849c631e82f15d5d130a0e

Thank you for this!

I am personally agnostic about the right way to implement crossfade detection. However, if there is a consensus among users, I'd be happy to facilitate incorporating any solution into the system!

Warblefly commented 8 months ago

This is undoubtedly a positive move by all concerned!

I have updated my own script (cue_playlist.py), merely to keep the "long tail" (unless I have mistakenly misunderstood the new addition to liquidsoap), and to use my code's corrected extraction of EBU R.128 metadata, which relies on FFmpeg's own filter to print frame metadata. My preference is for pre-processing, to avoid unexpected CPU usage on my old streaming machine, though I see the new code can control this (though I haven't quite understood how).

(The update also allows for the new crossfading mechanism — initial tests show it works perfectly for me.)

Indeed, very recent versions of FFmpeg have multi-threading code that now splurges stderr with error messages that do not start on new lines and confuse my old code. Only correct use of frame metadata avoids this.
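The frame-metadata route can be sketched roughly like this. This is my own sketch, not Warblefly's actual code, and the exact filter option spelling is an assumption: ebur128 with metadata enabled injects per-frame loudness keys, and the ametadata filter prints them line by line to stdout, which stays parseable even when a multi-threaded ffmpeg interleaves its stderr.

```python
import re
import subprocess

def parse_r128_integrated(metadata_text):
    """Extract all lavfi.r128.I values (LUFS) from ametadata's printed output."""
    return [float(v) for v in
            re.findall(r"lavfi\.r128\.I=(-?[\d.]+)", metadata_text)]

def integrated_loudness(path):
    """Read EBU R 128 integrated loudness via frame metadata, not log scraping.

    Returns the last reported integrated loudness in LUFS, or None if the
    filter produced no metadata.
    """
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
           "-af", "ebur128=metadata=1,ametadata=mode=print:file=-",
           "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    values = parse_r128_integrated(proc.stdout)
    return values[-1] if values else None
```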

Pelican104 commented 8 months ago

I think there is a more fundamental problem with cross fading that might well be relevant to this discussion, and that is the seeming inability to play three items at once. I've been trying for some time to find a way to have a simple radio transition like this work.

[Image: Liquidsoap Crossfade]

This represents probably 80% of transitions that we do in music radio in the UK but seems impossible to achieve with liquidsoap.

The crossfade operator does successfully start the station ID when SONG1 reaches its liq_start_next marker, but SONG2 will not start until SONG1 has completely finished.

[Image: Liquidsoap Crossfade 2]

The liq_start_next marker on the ID is not respected.

Maybe I've missed something fundamental here?

I think if the above issue could be solved it would make for much better-sounding transitions. It's the one thing that currently stops us from putting liquidsoap into production for any music radio service. We have used liquidsoap for many years for several speech stations, where three-item overlaps aren't a requirement, but have never been able to get decent crossfades working for music radio because of the above issue.