umputun / feed-master

Pulls multiple podcast feeds (RSS) and republishes as a common feed, properly sorted and podcast-client friendly.
https://feed-master.umputun.dev
MIT License

Enhancement: Normalize audio after downloading. #119

Open aplsms opened 1 year ago

aplsms commented 1 year ago

Dear Umputun, could you please add an "audio normalization" option to the yt-dlp post-processing step?

Different YouTube channels have different audio levels, and it is inconvenient to adjust the volume for each podcast.

Unfortunately, adding --postprocessor-args "-filter:a loudnorm" does not work; it fails with ERROR: Postprocessing: Filtering and streamcopy cannot be used together.

Thank you, Andrii
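
The error quoted above is raised by ffmpeg when an audio filter is requested while the stream is being copied rather than re-encoded. One possible workaround, sketched here, is to run the filter as a separate step with an explicit encoder. The helper name normalize_audio, the DRY_RUN switch, the loudnorm targets (I=-16, TP=-1.5, LRA=11), and the 128k bitrate are all assumptions for illustration, not feed-master or yt-dlp settings. The sketch also assumes the local ffmpeg build includes libmp3lame.

```shell
#!/bin/sh
# Sketch only: normalize_audio is a hypothetical helper, not part of
# feed-master or yt-dlp. Filters cannot be combined with stream copy,
# so the audio is re-encoded with an explicit codec.

# Illustrative loudness targets (assumption, tune as needed).
LOUDNORM_FILTER="loudnorm=I=-16:TP=-1.5:LRA=11"

normalize_audio() {
    in_file=$1
    out_file=$2
    # -ar pins the output sample rate, since loudnorm may change it.
    set -- ffmpeg -nostdin -y -i "$in_file" -vn -af "$LOUDNORM_FILTER" \
        -c:a libmp3lame -b:a 128k -ar 44100 "$out_file"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling the function with DRY_RUN=1 prints the ffmpeg command without running it, which is handy for checking the arguments before re-encoding a long episode.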

umputun commented 1 year ago

This is an interesting idea; however, are you sure this is about normalization? As far as I know, YouTube normalizes all videos that are quieter than the recommended -13 to -15 LUFS.

I have checked a few audio files and none of them had any max peak issues, for example:

ffmpeg -i ~/Downloads/6ef3c20cac37d55ff5d4c5e437bce27e26d7d8eb.mp3 -af "volumedetect" -vn -sn -dn -f null /dev/null

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/6.0 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Input #0, mp3, from '/Users/umputun/Downloads/6ef3c20cac37d55ff5d4c5e437bce27e26d7d8eb.mp3':
  Metadata:
    artist          : Yulia Latynina
    album           : Юлия Латынина
    genre           : podcast
    date            : 20230827T162439
    title           : Вл. Осечкин  @MrGulagunet. Волан-де-борт. Раздел империи Пригожина: поделится ли Путин с генералами?
  Duration: 01:15:50.11, start: 0.025057, bitrate: 155 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 155 kb/s
    Metadata:
      encoder         : Lavc59.37
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to '/dev/null':
  Metadata:
    artist          : Yulia Latynina
    album           : Юлия Латынина
    genre           : podcast
    date            : 20230827T162439
    title           : Вл. Осечкин  @MrGulagunet. Волан-де-борт. Раздел империи Пригожина: поделится ли Путин с генералами?
    encoder         : Lavf60.3.100
  Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc60.3.100 pcm_s16le
size=N/A time=01:15:50.06 bitrate=N/A speed= 667x     0x
video:0kB audio:783820kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_volumedetect_0 @ 0x600001e3c160] n_samples: 401315840
[Parsed_volumedetect_0 @ 0x600001e3c160] mean_volume: -24.9 dB
[Parsed_volumedetect_0 @ 0x600001e3c160] max_volume: -0.4 dB
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_0db: 46
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_1db: 244
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_2db: 1637
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_3db: 8987
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_4db: 53549
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_5db: 126076
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_6db: 170275
[Parsed_volumedetect_0 @ 0x600001e3c160] histogram_7db: 245923

In this case max_volume is -0.4 dB, so peak normalization won't do anything here. However, a mean_volume of -24.9 dB means the audio is poorly compressed, if at all. By compression I mean audio compression, aka dynamic range compression: the process of reducing the difference between loud and quiet parts.
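
The reasoning above can be reproduced from the volumedetect log itself. The following is a minimal sketch, assuming the log is piped in on stdin; volumedetect_summary is a hypothetical helper name, and the gap between max and mean volume is only a rough indicator of dynamic range, not a formal loudness measurement.

```shell
#!/bin/sh
# Sketch only: volumedetect_summary is a hypothetical helper.
# Reads ffmpeg volumedetect log lines from stdin and prints two numbers:
#   <peak headroom in dB> <gap between max_volume and mean_volume in dB>

volumedetect_summary() {
    awk '
        # Pick the value that follows the "mean_volume:" / "max_volume:" token.
        /mean_volume:/ { for (i = 1; i < NF; i++) if ($i == "mean_volume:") mean = $(i + 1) }
        /max_volume:/  { for (i = 1; i < NF; i++) if ($i == "max_volume:")  peak = $(i + 1) }
        END {
            # Headroom: gain that peak normalization could still add before 0 dB.
            # Gap: larger values suggest less dynamic range compression.
            printf "%.1f %.1f\n", 0 - peak, peak - mean
        }
    '
}
```

For the file analyzed above, the headroom works out to 0.4 dB and the gap to 24.5 dB, which matches the conclusion that raising the peak level would change almost nothing while the average level stays low.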

If this is the case, I doubt this process is easy to do, and even if we can figure out how to do it, it will most likely take a lot of time and resources. I have found this article describing how to achieve it with ffmpeg. If anyone wants to add an optional post-processing filter with all this, I could consider merging the change if the implementation is not too complicated.
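
As a rough idea of what such an optional filter could look like, the sketch below builds an ffmpeg -af value from a mode switch. It is not taken from the article mentioned above. The function name build_audio_filter, the mode names, and every numeric value are assumptions for illustration; acompressor and loudnorm are standard ffmpeg audio filters, but the settings would need tuning by ear.

```shell
#!/bin/sh
# Sketch only: build_audio_filter and its modes are hypothetical names.
# Prints the value for ffmpeg's -af option, or nothing when disabled.

# Dynamic range compression: threshold is linear (0.1 is about -20 dB),
# attack and release are in milliseconds. Values are illustrative.
COMPRESSOR="acompressor=threshold=0.1:ratio=4:attack=20:release=250"
# Loudness normalization to a common target. Values are illustrative.
LOUDNORM="loudnorm=I=-16:TP=-1.5:LRA=11"

build_audio_filter() {
    case "${1:-off}" in
        off)      ;;                                   # no post-processing
        loudnorm) printf '%s\n' "$LOUDNORM" ;;         # loudness only
        compress) printf '%s\n' "$COMPRESSOR,$LOUDNORM" ;;  # compress, then normalize
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

Keeping the default at off means no extra CPU time is spent unless the option is enabled. When a filter string is returned, it has to be passed to ffmpeg together with an explicit encoder, because filtering always requires re-encoding.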