Audio transcoding and replaygain

J4gQBqqR commented 1 year ago

I am not very familiar with ffmpeg, so correct me if I am wrong.

https://github.com/sentriz/gonic/blob/6c37966c2aa8d5bc9ceff1d6d20b7a0d5fe8dee4/transcode/transcode.go#L51 Does this line need to be updated to aresample=128000?

Opus and OpusRG have the same configuration. Opus128 and Opus128RG have the same configuration. Is this intentional? If I am understanding it correctly, the current behavior of Opus, OpusRG, Opus128, and Opus128RG is: normalize the audio level on the server side and also ~~pass in~~ erase all the original replaygain tags.

I was expecting Opus and Opus128 to pass the original replaygain idv3 metadata tags through to my client without renormalizing the audio level on the server side, so that I can do replaygain on the client side. OpusRG and Opus128RG, on the other hand, would apply replaygain and normalize on the server side, and erase all replaygain-related idv3 metadata tags before sending the audio stream to my client.

sentriz commented 1 year ago

it definitely looks like we should have aresample=128000 for the 128Kbps profiles, so I just updated that

@spijet do you have any info on the replaygain settings? i'm not sure how the RG stuff works :D

maybe related #106 @tordenflesk

J4gQBqqR commented 1 year ago

The thing is, I stream Opus128RG to my DSub with ReplayGain turned on. It is obvious to me that some tracks are louder/quieter than others. I had Beets scan my library and tag all my tracks with replaygain metadata, so my guess was that FFmpeg or something else is preserving the replaygain tags that I have in my tracks. The normalized opus stream then gets replay gain applied again by DSub, and that defeats the purpose of normalization.

I have the replaygain setting turned on in DSub because I have multiple Gonic servers. Not all servers use transcoding. On servers that do not use transcoding, I rely on the replaygain metadata tagged by Beets.

Edit: I took a look at the transcoded file's metadata on my PC. The flac container's opus file does not have replaygain tag in it. The tags are stripped. We are good here. The loud/quiet issue might be a problem with DSub.

However, we still have an issue where the command of Opus vs OpusRG and Opus128 vs Opus128RG are identical. If we have identical options, why not delete one of them to make the transcode dropdown menu more concise?

spijet commented 1 year ago

> it definitely looks like we should have aresample=128000 for the 128Kbps profiles, so I just updated that

IIRC aresample sets a sampling rate and not the bitrate. Since Opus audio is always 48kHz, there's no reason to set it to something other than 48000 or an exact multiple of it (which helps to make ReplayGain adjustments more accurate). In this case you've told ffmpeg to resample the source audio to 128kHz (which is not a multiple of either 44.1 or 48 kHz), apply the RG adjustments to it and then resample it back to 48kHz. :D

In my case, recent Gonic releases work as expected and I think I see no double-processing of RG tags in clients I use. Will re-check the tags when I have some free time. :)
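As a small illustration of the "exact multiple" point above, here is a minimal Go sketch (not taken from the gonic source; the helper name `isExactMultiple` is hypothetical) that checks whether a candidate `aresample` rate is a whole multiple of a base rate. It only restates the arithmetic from the comment: 96000 is 2 × 48000, while 128000 is a multiple of neither 44100 nor 48000.

```go
package main

import "fmt"

// opusRate is the 48 kHz rate that Opus audio uses, as stated in the comment above.
const opusRate = 48000

// isExactMultiple reports whether rate is a whole multiple of base.
// Hypothetical helper for illustration only.
func isExactMultiple(rate, base int) bool {
	return base > 0 && rate > 0 && rate%base == 0
}

func main() {
	for _, rate := range []int{48000, 96000, 128000} {
		fmt.Printf("aresample=%d: exact multiple of %d Hz: %v\n",
			rate, opusRate, isExactMultiple(rate, opusRate))
	}
}
```

A rate that passes this check lets the final resample back to 48 kHz be a simple integer-ratio step, which is the reason given above for preferring 96000 over 128000.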

J4gQBqqR commented 1 year ago

> > it definitely looks like we should have aresample=128000 for the 128Kbps profiles, so I just updated that
>
> IIRC aresample sets a sampling rate and not the bitrate. Since Opus audio is always 48kHz, there's no reason to set it to something other than 48000 or an exact multiple of it (which helps to make ReplayGain adjustments more accurate). In this case you've told ffmpeg to resample the source audio to 128kHz (which is not a multiple of either 44.1 or 48 kHz), apply the RG adjustments to it and then resample it back to 48kHz. :D
>
> In my case, recent Gonic releases work as expected and I think I see no double-processing of RG tags in clients I use. Will re-check the tags when I have some free time. :)

I have rechecked and edited my comments above. There is no need to recheck on your side. However, we still have an issue where the command of Opus vs OpusRG and Opus128 vs Opus128RG are identical. If we have identical options, why not delete one of them to make the transcode dropdown menu more concise?

spijet commented 1 year ago

> I took a look at the transcoded file's metadata on my PC. The flac container's opus file does not have replaygain tag in it. The tags are stripped. We are good here. The loud/quiet issue might be a problem with DSub.

Oh, good to know. Does DSub have any RG adjustments, like "Apply this gain to files that lack RG info"? I keep forgetting that I have this set to -11dB in foobar2000, so RG-adjusted files sound ultra-quiet, which surprises me every time. :)

Also, you may want to set a different RG target volume for your specific needs. For example, Opus128Car profile uses a volume target that is 15dB louder than the standard, to keep the final volume in-line with other usual audio sources in car multimedia.

spijet commented 1 year ago

Actually, @sentriz, non-RG Opus profiles are clearly misconfigured — there should be no RG-related options in them at all. See the original transcoding PR for a reference.

Well, assuming that we want to use the "non-RG" label to describe profiles that don't do anything about RG (i.e. not applying/discarding it), and the "RG" label for ones that do RG processing before sending the audio to the client. :)

J4gQBqqR commented 1 year ago

> Oh, good to know. Does DSub have any RG adjustments, like "Apply this gain to files that lack RG info"? I keep forgetting that I have this set to -11dB in foobar2000, so RG-adjusted files sound ultra-quiet, which surprises me every time. :)

No, DSub does not do active replaygain, it only does it passively, meaning, reading replaygain idv3 tags.

> Also, you may want to set a different RG target volume for your specific needs. For example, Opus128Car profile uses a volume target that is 15dB louder than the standard, to keep the final volume in-line with other usual audio sources in car multimedia.

I have the same issue. The Opus<128>RG profile sounds much quieter than my Beets-tagged output. Usually, my Beets replaygain tag output sounds good at volume 15 in my car. With Gonic's normalized RG transcoding, I have to go to volume 35 to get the same sound level. This makes my Google Maps navigation very loud.

The thing is, why does OpusCar have resampling while OpusRG does not? The FFmpeg transcoding commands do not seem logically consistent.

J4gQBqqR commented 1 year ago

> Actually, @sentriz, non-RG Opus profiles are clearly misconfigured: there should be no RG-related options in them at all. See the original transcoding PR for a reference.
>
> Well, assuming that we want to use the "non-RG" label to describe profiles that don't do anything about RG (i.e. not applying/discarding it), and the "RG" label for ones that do RG processing before sending the audio to the client. :)

Non-RG should do no renormalization and should preserve any existing replaygain idv3 tag.

sentriz commented 1 year ago

you're right! how does this look?

var (
    MP3   = NewProfile("audio/mpeg", "mp3", 128, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libmp3lame -f mp3 -`)
    MP3RG = NewProfile("audio/mpeg", "mp3", 128, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libmp3lame -af "volume=replaygain=track:replaygain_preamp=6dB:replaygain_noclip=0, alimiter=level=disabled, asidedata=mode=delete:type=REPLAYGAIN" -metadata replaygain_album_gain= -metadata replaygain_album_peak= -metadata replaygain_track_gain= -metadata replaygain_track_peak= -metadata r128_album_gain= -metadata r128_track_gain= -f mp3 -`)

    OpusCar = NewProfile("audio/ogg", "opus", 96, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libopus -vbr on -af "aresample=96000:resampler=soxr, volume=replaygain=track:replaygain_preamp=15dB:replaygain_noclip=0, alimiter=level=disabled, asidedata=mode=delete:type=REPLAYGAIN" -metadata replaygain_album_gain= -metadata replaygain_album_peak= -metadata replaygain_track_gain= -metadata replaygain_track_peak= -metadata r128_album_gain= -metadata r128_track_gain= -f opus -`)
    Opus    = NewProfile("audio/ogg", "opus", 96, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libopus -vbr on -f opus -`)
    OpusRG  = NewProfile("audio/ogg", "opus", 96, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libopus -vbr on -af "volume=replaygain=track:replaygain_preamp=6dB:replaygain_noclip=0, alimiter=level=disabled, asidedata=mode=delete:type=REPLAYGAIN" -metadata replaygain_album_gain= -metadata replaygain_album_peak= -metadata replaygain_track_gain= -metadata replaygain_track_peak= -metadata r128_album_gain= -metadata r128_track_gain= -f opus -`)

    Opus128Car = NewProfile("audio/ogg", "opus", 128, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libopus -vbr on -af "aresample=128000:resampler=soxr, volume=replaygain=track:replaygain_preamp=15dB:replaygain_noclip=0, alimiter=level=disabled, asidedata=mode=delete:type=REPLAYGAIN" -metadata replaygain_album_gain= -metadata replaygain_album_peak= -metadata replaygain_track_gain= -metadata replaygain_track_peak= -metadata r128_album_gain= -metadata r128_track_gain= -f opus -`)
    Opus128    = NewProfile("audio/ogg", "opus", 128, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libopus -vbr on -f opus -`)
    Opus128RG  = NewProfile("audio/ogg", "opus", 128, `ffmpeg -v 0 -i <file> -ss <seek> -map 0:a:0 -vn -b:a <bitrate> -c:a libopus -vbr on -af "volume=replaygain=track:replaygain_preamp=6dB:replaygain_noclip=0, alimiter=level=disabled, asidedata=mode=delete:type=REPLAYGAIN" -metadata replaygain_album_gain= -metadata replaygain_album_peak= -metadata replaygain_track_gain= -metadata replaygain_track_peak= -metadata r128_album_gain= -metadata r128_track_gain= -f opus -`)

    PCM16le = NewProfile("audio/wav", "wav", 0, `ffmpeg -v 0 -i <file> -ss <seek> -c:a pcm_s16le -ac 2 -f s16le -`)
)

spijet commented 1 year ago

> No, DSub does not do active replaygain, it only does it passively, meaning, reading replaygain idv3 tags.

But it does apply it, right? I.e. if you feed it an Opus file with, say, "-6.0dB track gain" found in tags — it will apply this -6.0dB gain to the output?

> Usually, my Beets replaygain tag output sounds good at volume 15 in my car.

I'm not sure about the RG loudness target Beets uses when adding RG tags, but I can say with some confidence that foobar2000 uses loudness target of -18dB LUFS (Loudness Unit, Full [Digital] Scale), and iTunes/Apple Music uses a target of -16dB LUFS. I used foobar2000 to fill up and sort my media library, so my whole library is tagged to meet the -18dB target. With this target, audio processed with OpusCar profile will have the peak volume at -3dB LUFS, which is really close to the digital limit of 0dB LUFS (see also: "brickwalling" and "Loudness War" 😁) and should sound almost as loud as any other audio source found in a car.

> Non-RG should do no renormalization and should preserve any existing replaygain idv3 tag.

Yep, that was the original idea behind these profiles. Non-RG for all the usual cases (where you either don't want to do RG or your client does it for you) and RG for this one case where you do want to have RG when using a non-RG-capable client.

sentriz commented 1 year ago

what if we update the profiles like above comment, and add this to wiki (still need to explain car)

| name | format | default bitrate (kilobits/s) | description |
| --- | --- | --- | --- |
| `mp3` | mp3 | 128 kb/s | transcode to mp3 at 128 kb/s |
| `mp3_rg` | mp3 | 128 kb/s | transcode to mp3 at 128 kb/s, force replay gain, strip replay gain tags |
| `opus` | opus | 96 kb/s | transcode to opus at 96 kb/s (see recommended bitrates) |
| `opus_rg` | opus | 96 kb/s | transcode to opus at 96 kb/s, force replay gain, strip replay gain metadata |
| `opus_car` | opus | 96 kb/s | same as above, but with a higher target baseline gain. good for listening in loud environments like a car |
| `opus_128` | opus | 128 kb/s | transcode to opus at 128 kb/s (see recommended bitrates) |
| `opus_128_rg` | opus | 128 kb/s | transcode to opus at 128 kb/s, force replay gain, strip replay gain metadata |
| `opus_128_car` | opus | 128 kb/s | same as above, but with a higher target baseline gain. good for listening in loud environments like a car |

spijet commented 1 year ago

@sentriz looks much better now! But you still need to revert that aresample=128000 in Opus128Car profile back to aresample=96000 to get rid of yet another non-exact resample step.

Also, what do you think about renaming the *Car profiles to something like RGCar or maybe RGLoud to emphasise that these profiles use RG processing too?

Also[2]: aresample=96000 is not "transcode at 96kb/s", it's "transcode with 2X oversampling". :) It's used to make replaygain adjustments more accurate.

J4gQBqqR commented 1 year ago

Also, why does the Car profile need resampling while the RG profile does not?

@sentriz In your table, the description section, 128kbps should be described as 128 instead of 96.

spijet commented 1 year ago

> Also, why does the Car profile need resampling while the RG profile does not?

See the "Also[2]" in my previous comment. :)

sentriz commented 1 year ago

brilliant thanks guys. just updated the table and the profiles https://github.com/sentriz/gonic/blob/master/transcode/transcode.go#L33

how does it look now?

spijet commented 1 year ago

Maybe "target" would be a better term instead of "baseline" here, but otherwise it's perfect! 👍

J4gQBqqR commented 1 year ago

> But it does apply it, right? I.e. if you feed it an Opus file with, say, "-6.0dB track gain" found in tags, it will apply this -6.0dB gain to the output?

Yes, it applies replaygain correctly when there is one in the idv3 tag.

> I'm not sure about the RG loudness target Beets uses when adding RG tags, but I can say with some confidence that foobar2000 uses loudness target of -18dB LUFS (Loudness Unit, Full [Digital] Scale), and iTunes/Apple Music uses a target of -16dB LUFS. I used foobar2000 to fill up and sort my media library, so my whole library is tagged to meet the -18dB target. With this target, audio processed with OpusCar profile will have the peak volume at -3dB LUFS, which is really close to the digital limit of 0dB LUFS (see also: "brickwalling" and "Loudness War" 😁) and should sound almost as loud as any other audio source found in a car.

targetlevel: A number of decibels for the target loudness level for files using REPLAYGAIN_ tags. Default: 89.

r128targetlevel: The target loudness level in decibels (i.e. loudness in LUFS + 107) for files using R128 tags. Default: 84 (Use 83 for ATSC A/85, 84 for EBU R128 or 89 for ReplayGain 2.0.)

89DB is -14 LUFS if I understand it correctly.

spijet commented 1 year ago

> Default: 89

These levels are probably specified in dB SPL (Sound Pressure Level) scale. If we convert the levels I posted to SPL, we get 89 for foobar2000/RG2.0 and 91 for iTunes/Apple Music.

This also means that Car/LoudRG profiles have a target volume of 104 dB SPL, which is surely a lot. :) Still not as loud as -0.0dB LUFS RMS of the worst examples of the Loudness War (looking at you, "Death Magnetic"!) :D
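The conversions in this comment follow from the "+ 107" offset in the Beets setting quoted earlier. A minimal Go sketch of that arithmetic, assuming the offset applies uniformly (the function name `lufsToTargetLevel` is made up for illustration):

```go
package main

import "fmt"

// lufsOffset is the "+ 107" offset from the Beets setting quoted above.
const lufsOffset = 107.0

// lufsToTargetLevel maps a loudness in LUFS onto the 89-style target-level scale.
// Hypothetical helper; it only adds the fixed offset.
func lufsToTargetLevel(lufs float64) float64 {
	return lufs + lufsOffset
}

func main() {
	targets := []struct {
		name string
		lufs float64
	}{
		{"foobar2000 / RG2.0", -18},
		{"iTunes / Apple Music", -16},
		{"Car / RGLoud profiles", -3},
	}
	for _, t := range targets {
		fmt.Printf("%-22s %6.1f LUFS -> %5.1f\n", t.name, t.lufs, lufsToTargetLevel(t.lufs))
	}
}
```

With this offset, -18 maps to 89, -16 to 91 and -3 to 104, matching the figures given in the comment.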

J4gQBqqR commented 1 year ago

Quote here: RG1 is calibrated to a pink noise reference signal with a RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS-1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are in Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

In order to maintain backwards compatibility with RG1, RG2 uses a -18 LUFS reference, which based on lots of music, can give similar loudness compared to RG1.

J4gQBqqR commented 1 year ago

According to the specification, RG1.0 recommends -14 DB and RG2.0 recommends -18 LUFS DB. 6DB is somewhat quiet? I think we can call Opus128Car as Opus128Normal and Opus128RG as Opus128Quiet in a sense.

spijet commented 1 year ago

> I think we can call Opus128Car as Opus128Normal and Opus128RG as Opus128Quiet in a sense.

Not really. The RG profiles apply the track gain, which results in final audio being about as loud as the RG2.0 target level of -18 dB LUFS or 89 dB SPL, just as dictated by the RG2.0 standard.

Car/RGLoud profiles boost the volume by +15dB, resulting in target loudness being around -3dB LUFS or 104dB SPL, which is A LOT louder than the standard (to put it simply, a +6dB gain is the same thing as multiplying every sample value by 2).

The real reason RG profiles may sound so quiet in a car is that everything else (all other audio sources, I mean) is compressed. Not volume-normalized or peak-limited, but really aggressively compressed to bring the average volume and/or the RMS value as close to -0.0dB FS as possible. The reasoning behind this approach is pretty simple — heavily compressed audio will sound louder when played back on a cheap and/or low-power system, and many people perceive "loud" as "better quality".

In my case (iOS, iSub client and Opus128RG profile) I usually play my music in car at volume 12-14, and to get a similar loudness level from the usual radio I have to turn the volume down to 5-8. I don't use the Car profile on this client because I also use it to listen to the same music via headphones, so I don't have to either listen to volume-boosted tracks or cache everything I listen to twice.

As for loudness comparisons between RG* profiles and applying ReplayGain in the client, the RG profile should be exactly as loud as a non-RG profile on an RG-capable client, so in case of DSub they should sound the same. Car/RGLoud profile would obviously be much louder than either of them.

J4gQBqqR commented 1 year ago

> Not really. The RG profiles apply the track gain, which results in final audio being about as loud as the RG2.0 target level of -18 dB LUFS or 89 dB SPL, just as dictated by the RG2.0 standard.

Say I have a track without any replaygain idv3 tag in it. Will the RG profiles still apply normalization by having FFmpeg scan that track and calculate the gain?

spijet commented 1 year ago

Why would you have such tracks in your library? :) Judging from the FFmpeg docs, it will only apply the gain adjustment if the selected gain tag exists. Not sure if the same can be said about the preamp gain though.

Also, I completely forgot that the "ordinary" RG profiles have a +6dB preamp gain too. Actually, I don't remember exactly why I did this. :D

J4gQBqqR commented 1 year ago

> Why would you have such tracks in your library? :) Judging from the FFmpeg docs, it will only apply the gain adjustment if the selected gain tag exists. Not sure if the same can be said about the preamp gain though.
>
> Also, I completely forgot that the "ordinary" RG profiles have a +6dB preamp gain too. Actually, I don't remember exactly why I did this. :D

I have a nicely organized library that has replaygain tags in all tracks. Tagging idv3 takes a lot of time and energy.

I have several not-well-organized libraries that are not tagged neatly. Some are children's songs, some are holiday songs, you know, I never bother to tag them.

Then in DSub, I still can hear loud/quiet songs even with the RG profile. I guess this is exactly the reason. FFmpeg cannot do active normalization, it only does passive normalization, just like DSub. If this statement is true, I will never get correct gain on those tracks which are not tagged, neither on the gonic server side nor the DSub client side.

This is so sad. I once thought this RG profile was a silver bullet.

If this is the case, hi @sentriz, could you also mention in the documentation that RG profile will not work on tracks without the idv3 tags? This will definitely avoid confusion like "why is the RG profile not working".

spijet commented 1 year ago

> FFmpeg cannot do active normalization, it only does passive normalization, just like DSub.

It can do active normalization, but you'll need to use a different audio filter instead of volume. dynaudionorm, loudnorm or some others might work for you. Keep in mind that these will likely produce less-accurate results than ReplayGain, but that might actually be what you're looking for.

> If this is the case, hi @sentriz, could you also mention in the documentation that RG profile will not work on tracks without the idv3 tags? This will definitely avoid confusion like "why is the RG profile not working".

That would be redundant, as use of ReplayGain (the technology itself) actually requires scanning the music to set the volume adjustment tags ahead-of-time, so that they could be used for playback later.

J4gQBqqR commented 1 year ago

I guess I am in need of #244 to do dynaudionorm or loudnorm.

spijet commented 1 year ago

You can always add custom profiles in the source code and build your own image. :)

J4gQBqqR commented 1 year ago

Thank you guys for the help. I will close the issue as completed. Will probably wait for #244.

sentriz commented 1 year ago

@J4gQBqqR @spijet if you guys would be willing to cook me up a profile that uses dynaudionorm filter, I'd be happy to add it to the official image

spijet commented 1 year ago

@sentriz I'll try to fiddle around with both filters and make a profile tomorrow. For now it's time to get some sleep. :D

sentriz commented 1 year ago

cool! thanks very much for your help :)

J4gQBqqR commented 1 year ago

ffmpeg -i input.flac -map 0:a:0 -vn -b:a 128k -c:a libopus -vbr on -af "loudnorm" -metadata replaygain_album_gain= -metadata replaygain_album_peak= -metadata replaygain_track_gain= -metadata replaygain_track_peak= -metadata r128_album_gain= -metadata r128_track_gain= -f opus output.ogg

Fields to change: input.flac, 128k, output.ogg

Comment: I found that there is no need to use the aresample filter option; the loudnorm filter will automatically resample to 192kHz and opus will automatically resample back to 48kHz. @sentriz @spijet

spijet commented 1 year ago

Thank you @J4gQBqqR! I'll try to test it out this week and let you guys know if it normalizes the audio properly. Sorry I didn't do it on the weekend, the COVID vaccine has hit me hard. 🤒

spijet commented 1 year ago

I did some quick tests before work, and found that loudnorm without any parameters produces a quieter result than ReplayGain. That's because this filter normalizes audio to integrated loudness target of -24dB and to peak levels of -2dB.

I played around with some options and tested them on a couple of tracks. Below I'll add waveform pics and dynamic range meter results for the original track and 3 sets of loudnorm options.

Original unmodified track (RG track gain -4.0dB)

![waveform_original](https://user-images.githubusercontent.com/14221126/200493268-02b86d14-a853-473c-aae6-589e1248172a.png) ![dr_original](https://user-images.githubusercontent.com/14221126/200493285-aeba17dc-7830-4028-b0b6-86af95ada074.png)

loudnorm without explicit options (so, I=-24.0, TP=-2.0)

![waveform_loudnorm_simple](https://user-images.githubusercontent.com/14221126/200493565-7f22b91b-aac6-45bf-a7f3-bd8ed3c75cd5.png) ![dr_loudnorm_simple](https://user-images.githubusercontent.com/14221126/200493576-a406e0af-d9e5-470a-a970-46d772fdb6a4.png)

loudnorm (I=-5.0, TP=-2.0)

![waveform_loudnorm_i_5](https://user-images.githubusercontent.com/14221126/200493795-b96906b4-069b-4719-ab7b-b896e848dd1d.png) ![dr_loudnorm_i_5](https://user-images.githubusercontent.com/14221126/200493816-8d7d192b-9fe8-4b95-9de7-73746be6da08.png)

loudnorm (I=-5.0, TP=-1.0)

![waveform_loudnorm_i_5_tp_1](https://user-images.githubusercontent.com/14221126/200493930-b83350d3-c455-46bc-97e9-8ba8b5733cb7.png) ![dr_loudnorm_i_5_tp_1](https://user-images.githubusercontent.com/14221126/200493939-d5df0e58-d299-48ce-bf62-2f20bc8a116e.png)

As you can see, setting I to -5.0dB makes the track louder and introduces some significant changes to the dynamic range, and raising TP to -1.0dB makes it even more severe. To get the most severe behaviour, you need to use -af "loudnorm=I=-5:TP=-1.0".
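For anyone who wants to experiment with these options from Go, here is a minimal sketch of a helper that builds the `-af` value for `loudnorm`. The function `loudnormFilter` is hypothetical and not part of gonic. The range checks (I between -70 and -5, TP between -9 and 0) reflect my reading of the FFmpeg filter documentation and should be treated as an assumption to verify against your ffmpeg version.

```go
package main

import "fmt"

// loudnormFilter builds an ffmpeg audio filter string for loudnorm.
// i is the integrated loudness target, tp is the maximum true peak.
// The accepted ranges below are assumed from the FFmpeg docs, not verified here.
func loudnormFilter(i, tp float64) (string, error) {
	if i < -70 || i > -5 {
		return "", fmt.Errorf("I=%g is outside the assumed range [-70, -5]", i)
	}
	if tp < -9 || tp > 0 {
		return "", fmt.Errorf("TP=%g is outside the assumed range [-9, 0]", tp)
	}
	return fmt.Sprintf("loudnorm=I=%g:TP=%g", i, tp), nil
}

func main() {
	// The three option sets tested in the comment above.
	for _, opts := range [][2]float64{{-24, -2}, {-5, -2}, {-5, -1}} {
		filter, err := loudnormFilter(opts[0], opts[1])
		if err != nil {
			fmt.Println("invalid options:", err)
			continue
		}
		fmt.Printf("-af %q\n", filter)
	}
}
```

If those ranges hold, I=-5 is already the loudest integrated target the filter accepts, so the "most severe" setting above sits at the edge of what loudnorm allows.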

Bonus points if you can guess the track from the waveform! :)

sentriz commented 1 year ago

very nice! what's the target loudness of the normal _rg profiles? can we adjust these I and R values to match? such that we get similar loudness for the _rg and new _rg_auto or whatever profiles. maybe there's a better name too

spijet commented 1 year ago

Honestly I'd rather not, and it won't be similar anyway, as ReplayGain changes the audio volume statically, i.e. it applies a single gain adjustment to the whole track and preserves the dynamic range (the volume difference between loud and quiet parts of the audio). Loudnorm, on the other hand, is fiddling with gain dynamically roughly every 20ms, making the whole track louder and (partially) killing the dynamic range in the process.

In our case, RG profiles apply +6dB preamp gain, which results in target volume of ~-12dB, and RG_Loud ones apply +15dB preamp gain, which results in target volume of ~-3dB. I assume that loudnorm=I=-5 should be close to RG_Loud profiles in terms of overall loudness, but I'll have to test and compare.

J4gQBqqR commented 1 year ago

If this is the case, is it possible to have ffmpeg tag metadata with calculated RG and send in original wave shape unmodified (but transcoded)? Let the player do the loudness adjustment work.

spijet commented 1 year ago

@J4gQBqqR IIRC ffmpeg has some kind of an "RG-scanner" audio filter, but I'm not sure if it adds the resulting gain to metadata. Moreover, to scan a track with ReplayGain you'll have to go through the whole track first, which can hurt the streaming experience (for uncached tracks at least).

sentriz commented 1 year ago

> some kind of an "RG-scanner" audio filter

ah! that's exactly what i thought the audionorm filter was doing. i was wondering why there was no delay / scan step :grin:

in that case i'm not even sure the loudnorm filter even makes sense - do people really want to change the dynamic range of their music?

edit: does this look better? https://trac.ffmpeg.org/wiki/AudioVolume#PeakandRMSNormalization

J4gQBqqR commented 1 year ago

That is what I thought too. I thought only the dynaudionorm filter would change the dynamic range by normalizing in time chunks, while loudnorm would go through the whole track first. I cannot speak for everyone, but I would not like to change the dynamic range of my tracks.

spijet commented 1 year ago

@sentriz some people like their music to be LOUD, that's why there is so much dynamic compression / volume boost stuff available in the wild. :)

@J4gQBqqR, then the only way (besides RG) would be peak normalization, which may not yield the results you're looking for, especially if applied "on the fly".

I found the filter I mentioned earlier, it's called "replaygain" (simple enough). The problem is that it seems to report the peak and gain values to standard output?

J4gQBqqR commented 1 year ago

Maybe one can use tools like awk to retrieve the output and use | to pipe it over to another transcode command to tag it. This will definitely add overhead on the server. Fortunately, players like DSub can stream a configured number of tracks in advance, so on the player side gapless playback is not harmed.

spijet commented 1 year ago

This will require Gonic to start ffmpeg twice — once to scan the track and capture the resulting gain, and then again to actually transcode. To use pipes we'd have to use a shell.
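As a rough sketch of what the first of those two runs would have to do, the Go code below extracts a track gain from captured scanner output without needing a shell. Everything here is an assumption for illustration: the helper `parseTrackGain` is hypothetical, the sample text is made up, and the `track_gain = ... dB` log format is my recollection of what ffmpeg's `replaygain` filter prints, which should be checked against real output before relying on it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// trackGainRe matches a log line such as "track_gain = -4.00 dB".
// The exact format of the filter's output is an assumption.
var trackGainRe = regexp.MustCompile(`track_gain\s*=\s*([+-]?\d+(?:\.\d+)?)\s*dB`)

// parseTrackGain pulls the track gain (in dB) out of captured scanner output.
func parseTrackGain(log string) (float64, error) {
	m := trackGainRe.FindStringSubmatch(log)
	if m == nil {
		return 0, fmt.Errorf("no track_gain line found")
	}
	return strconv.ParseFloat(m[1], 64)
}

func main() {
	// Made-up text standing in for the output captured from the scan pass.
	sample := "[Parsed_replaygain_0 @ 0x0] track_gain = -4.00 dB\n" +
		"[Parsed_replaygain_0 @ 0x0] track_peak = 0.988\n"

	gain, err := parseTrackGain(sample)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	// The second ffmpeg run could then receive the value as a metadata option.
	fmt.Printf("-metadata replaygain_track_gain=%.2f dB\n", gain)
}
```

Capturing the first process's output in Go and passing the parsed value as an argument to the second process would avoid the shell pipe, but the track still has to be decoded twice.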

sentriz commented 1 year ago

yep, you see that kind of stuff, e.g. with ffmpeg -pass 1 and ffmpeg -pass 2

i suppose we're back to https://github.com/sentriz/gonic/issues/244

spijet commented 1 year ago

So, I tested the same track with RG and "Loud RG" setups we already use and, as I expected, the "Loud RG" preset yields a result that is almost exactly as loud as loudnorm=I=-5, while maintaining the proper dynamic range. The usual RG preset is quieter, as it should be. :)

Also note that "Loud RG" yields a smaller dynamic range, as reported by foo_dr meter, because in case of this particular track there are two almost full-scale peaks (see the original waveform here, the beginning of the second half of the track). These peaks get overblown after applying RG gain with the preamp gain we use. Again, this is expected, and that's why we use aresample and alimiter filters here: aresample helps us detect "true peaks" (AKA inter-sample peaks), so we can process them properly, and alimiter compresses these peaks softly, so we don't get any audible clipping.

spijet commented 1 year ago

Aaaand here are the waveforms and DR reports:

Ordinary RG profile (RG track gain -4.0dB, profile's preamp gain +6dB)

![waveform_rg](https://user-images.githubusercontent.com/14221126/201459803-eef47157-a0d0-4dea-880f-7341f8cc93f3.png) ![dr_rg](https://user-images.githubusercontent.com/14221126/201459807-1f39c41d-d622-4795-8017-a765a8205abc.png)

"Loud RG" preset (RG track gain -4.0dB, profile's preamp gain +15.0dB)

![waveform_rg_loud](https://user-images.githubusercontent.com/14221126/201459857-591e3fc6-74ef-48ad-a221-9da3ef0013c1.png) ![dr_rg_loud](https://user-images.githubusercontent.com/14221126/201459864-14892753-ea40-4e80-8480-70bca902a881.png)

I can upload the original track and all Opus-encoded results from my tests to my website, if you're interested, but that is illegal, you know. 😁

Etran-H commented 1 year ago

I think we should use ebur128 to get input LUFS.

spijet commented 1 year ago

I believe that live loudness scanning is out of scope of Gonic, just like ReplayGain scanning of all files in the music library (which would actually produce more accurate results) is.

(comment edited to remove garbage markup from GitHub emails)

sentriz / gonic

Audio transcoding and replaygain #250