Audio normalization - Githubissues

zeyus commented 1 year ago

Is your feature request related to a problem? Please describe.

Not all podcasts are recorded with the same volume / headroom, meaning when going from one podcast to the next, sometimes sudden volume adjustments are required, including when it might be inconvenient to physically handle the phone.

Describe the solution you'd like

Include a feature to normalize track audio, and potentially other fx/filters

Describe alternatives you've considered

Using a system-wide audio fx driver (e.g. Viper4Android, JamesDSP). Unfortunately, support for newer Android versions is flaky, and these solutions often require root or are extremely difficult for non-technical users. I had some limited success with V4A, but it only works with Bluetooth earbuds if I first start playing through the speaker and then connect to Bluetooth, which is pretty janky.

Additional context

I've forked this repo, and planned on working on this feature myself (I still want to) and I have started looking at the Android implementation, but there's a problem with the underlying player (react-native-track-player -> KotlinAudio) not providing access to the AudioSessionID.

This probably won't even be an option until this PR for AndroidAuto is merged into KotlinAudio https://github.com/doublesymmetry/KotlinAudio/pull/88 which provides a MediaSession callback. I wouldn't know where to start for the iOS implementation.

Once the upstream KotlinAudio (and hopefully, RNTP) provide access to the session ID, it should be somewhat easy to implement audiofx.DynamicsProcessing on the audio :)

lovegaoshi commented 1 year ago

dynamicsProcessing seems interesting:O I did mine with ffmpeg btw. IMO this should be a podcast server feature to provide a gain value (paid feature opportunity if DNE) and then TrackPlayer.setVolume, similar to how youtube does it, and is implemented in innertune as such.
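A minimal TypeScript sketch of the single-gain-value idea above, assuming the server supplies one integrated loudness figure per episode in LUFS. The name `loudnessToVolume` and the -16 LUFS target are illustrative assumptions, not part of Podverse or RNTP. Because a player volume is assumed to be capped at 1.0, this approach can only turn loud episodes down; it cannot boost quiet ones.

```typescript
// Hypothetical target loudness; pick it at or below the quietest episodes
// you expect, since a capped volume can only attenuate.
const TARGET_LUFS = -16;

// Convert a per-episode loudness measurement (LUFS) into a linear volume.
// The result is clamped to [0, 1] so it could be handed to a player volume
// setter such as TrackPlayer.setVolume (assumed here to accept 0 to 1).
function loudnessToVolume(
  episodeLufs: number,
  targetLufs: number = TARGET_LUFS
): number {
  // Positive gain means the episode is quieter than the target.
  const gainDb = targetLufs - episodeLufs;
  // Decibels to linear amplitude: 20 dB per factor of 10.
  const linear = Math.pow(10, gainDb / 20);
  return Math.min(1, Math.max(0, linear));
}
```

An episode measured at -10 LUFS would be played at roughly half amplitude, while one measured at -22 LUFS would stay at full volume because the boost it needs is clamped away.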

zeyus commented 1 year ago

> dynamicsProcessing seems interesting:O I did mine with ffmpeg btw. IMO this should be a podcast server feature to provide a gain value (paid feature opportunity if DNE) and then TrackPlayer.setVolume, similar to how youtube does it, and is implemented in innertune as such.

Yeah I think it could be quite nice, as long as the end of the chain is a limiter then it won't distort.
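To illustrate the "limiter at the end of the chain" point, here is a rough TypeScript sketch of a peak limiter applied after a fixed gain. It is a toy under stated assumptions (instant attack, exponential release, no lookahead), and the names `applyGainWithLimiter`, `ceiling` and `release` are made up for this example; it is not how Android's DynamicsProcessing is implemented.

```typescript
// Apply a fixed gain, then keep every output sample within +/- ceiling.
// Instead of hard-clipping, the limiter lowers its own gain instantly when
// a peak would exceed the ceiling and lets it recover slowly afterwards.
function applyGainWithLimiter(
  samples: number[],
  gain: number,
  ceiling: number = 1.0,
  release: number = 0.999 // closer to 1 means slower recovery
): number[] {
  let reduction = 1.0; // current limiter gain, 1.0 means no reduction
  const out: number[] = [];
  for (const s of samples) {
    const boosted = s * gain;
    const peak = Math.abs(boosted);
    // Gain required to bring this sample down to the ceiling, if needed.
    const needed = peak > ceiling ? ceiling / peak : 1.0;
    // Release step: move the previous reduction a little back toward 1.0.
    const released = 1.0 - (1.0 - reduction) * release;
    // Attack is instant: never use more gain than this sample allows.
    reduction = Math.min(needed, released);
    out.push(boosted * reduction);
  }
  return out;
}
```

With a gain of 2, a sample of 0.9 would reach 1.8 unprocessed; the limiter scales it back to the ceiling, and the samples that follow stay slightly attenuated while the reduction recovers.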

If it's done on the server, it would require either live re-encoding or pre-encoding all the tracks at the normalization level that people select. The advantage is that you could do real normalization across the whole track rather than dynamic processing. The disadvantage is that changing a setting would require the user to rebuffer a new stream, rather than allowing realtime effect changes such as toggling Bass Boost. Also, transcoding the streams or caching every file is going to require a fair bit of server processing and storage.

Either way, now that I've thought about this a little more, it might make the most sense to add effect processing in RNTP (or at least a wrapper to allow access). I say this because:

- there already is an (iOS only) pitch changing effect in RNTP
- it would be possible to implement similar effects into all the supported platforms
- any code using RNTP can provide (albeit basic) audio processing for their users if they wish (if not there's no penalty)

Btw @lovegaoshi I made a PR into your PR 😅

lovegaoshi commented 1 year ago

Ah, I see you meant something like ffmpeg's dynaudnorm (https://ffmpeg.org/ffmpeg-all.html#dynaudnorm), whereas I was talking about merely a single r128gain loudness normalization value based on perceived loudness or peak. It does sound like a job for an on-the-fly audio processor, but ffmpeg still seems like the only tool for this?

For a simpler single r128gain value, I recently built a cloud r128gain feature that has users upload their processed r128gain values to a MongoDB backend I put together with Vercel (https://github.com/lovegaoshi/azusa-player-mobile/pull/155/files). Since Podverse actually caches the episodes, r128gain normalization specifically is trivial if Podverse doesn't mind bundling ffmpeg. Alternatively, a free oci or gci processing r128gain will do as well, as the only egress data is just a bunch of numbers and data ingress is free.
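As a rough sketch of how a server could obtain that single number with ffmpeg alone, the TypeScript below parses the JSON report that ffmpeg's `loudnorm` filter prints to stderr when run with `print_format=json`. Note that this swaps in `loudnorm` for the r128gain tool mentioned above, and the function name `parseIntegratedLoudness` is hypothetical. How ffmpeg is invoked and how the value is stored are left out.

```typescript
// Extract the integrated loudness (LUFS) from the stderr of a run such as:
//   ffmpeg -i episode.mp3 -af loudnorm=print_format=json -f null -
// The filter prints a flat JSON object whose "input_i" field holds the
// measured integrated loudness as a string. Returns null if it is missing.
function parseIntegratedLoudness(ffmpegStderr: string): number | null {
  // The report is the last {...} block in the output.
  const start = ffmpegStderr.lastIndexOf("{");
  const end = ffmpegStderr.lastIndexOf("}");
  if (start === -1 || end < start) return null;
  try {
    const report = JSON.parse(ffmpegStderr.slice(start, end + 1)) as {
      input_i?: string;
    };
    const lufs = Number(report.input_i);
    return Number.isFinite(lufs) ? lufs : null;
  } catch {
    // Not valid JSON, so treat it as "no measurement available".
    return null;
  }
}
```

The returned figure is just one number per episode, which fits the point above that only "a bunch of numbers" would need to leave the processing host.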

mitchdowney commented 1 year ago

@lovegaoshi I'm not really familiar with the concepts y'all are discussing, but I want to add that ideally any libraries we use are FOSS, or else F-Droid Store will reject (or flag) Podverse.

If a FOSS option isn't possible, we can remove it from the F-Droid build, but that's a last resort obviously.

podverse / podverse-rn

Audio normalization #1902