[Solution] Fade in/out effect for MP3s merged into an M4B

devnoname120 commented 1 year ago

It took me quite a bit of time and many attempts to figure out how to add a fade in/out effect between MP3s merged into an M4B, so I'm sharing my solution here for future visitors. Note that this solution could easily be integrated natively into m4b-tool, but my schedule is very busy and unfortunately I don't have the bandwidth to do a pull request.

My requirements:

- No extra re-encoding. The fade in/out effect should be applied on the fly between the decoding step and the encoding step of the MP3-to-AAC conversion, to avoid degrading the quality with an additional lossy pass.
- Preserve all metadata. This means that I can't use FFmpeg to re-encode and glue the files together at the same time. The final lossless merge operation (ffmpeg -f concat -c copy) run by m4b-tool is required to preserve them.

My solution:

find . -iname '*.mp3' -print0 | xargs -0 -I{} -P 8 ffmpeg -i {} -f lavfi -i anullsrc -max_muxing_queue_size 9999 -map_metadata 0 -strict experimental -movflags +faststart -vn -y -ab 196k -ar 44100 -ac 2 -acodec libfdk_aac -filter_complex '[0]afade=t=in:d=1:curve=tri[a]; [1]atrim=0:0.7[t]; [a][t]acrossfade=d=0.7:o=1:c1=tri:c2=nofade' -f mp4 {}.m4b

m4b-tool merge -vvv --debug --no-conversion --include-extensions=m4b --output-file="merged.m4b" .
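
For readability, here is a sketch of the same conversion step with the fade durations pulled into variables. The names FADE_IN, FADE_OUT, build_fade_filter and convert_one are only illustrative; the FFmpeg options are the ones from the one-liner above. Building the filtergraph from a single fade-out value keeps the atrim length and the acrossfade duration identical.

```shell
#!/bin/sh
# Sketch only: variable and function names are illustrative.
FADE_IN=1      # fade-in duration in seconds
FADE_OUT=0.7   # fade-out duration in seconds

# Print the filtergraph. $1 = fade-in duration, $2 = fade-out duration.
# The atrim length and the acrossfade duration both come from $2,
# so they always match.
build_fade_filter() {
    printf '[0]afade=t=in:d=%s:curve=tri[a]; [1]atrim=0:%s[t]; [a][t]acrossfade=d=%s:o=1:c1=tri:c2=nofade' "$1" "$2" "$2"
}

# Convert a single MP3 ($1) with the same options as the one-liner.
convert_one() {
    ffmpeg -i "$1" -f lavfi -i anullsrc \
        -max_muxing_queue_size 9999 -map_metadata 0 \
        -strict experimental -movflags +faststart \
        -vn -y -ab 196k -ar 44100 -ac 2 -acodec libfdk_aac \
        -filter_complex "$(build_fade_filter "$FADE_IN" "$FADE_OUT")" \
        -f mp4 "$1.m4b"
}
```

Calling convert_one once per MP3 (for example from a for loop over ./*.mp3) would replace the find/xargs invocation, at the cost of losing the -P 8 parallelism.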

Note: For the conversion step I directly use an FFmpeg command (ffmpeg -i {} -f lavfi -i […]) instead of m4b-tool for two reasons:

1) m4b-tool silently ignores --ffmpeg-param for the Fraunhofer FDK AAC (libfdk_aac) codec (!) because m4b-tool directly runs ffmpeg instead of using the Ffmpeg.php executable abstraction. Note however that --ffmpeg-param is properly applied when using the native FFmpeg AAC encoder (aac). I use the Fraunhofer FDK AAC codec as it has a better encoding quality for a given bitrate than the native aac encoder.

2) The --ffmpeg-param option of m4b-tool applies indiscriminately to both the conversion step (when using the native FFmpeg AAC encoder) and the merge step (no matter what), because both use the Ffmpeg.php executable abstraction. We don't want that, as it would apply the fade filter twice! Additionally, the FFmpeg parameters -f concat -c copy used by m4b-tool for the merge aren't compatible with FFmpeg filters. Removing these options would both force a re-encoding (which degrades the sound quality) and drop the individual metadata of each converted file (they are preserved thanks to the -f concat -c copy options).
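
To illustrate why filters can't be combined with the merge: with -c copy FFmpeg copies the encoded packets as they are and never decodes them, so there is nothing for a filter to operate on. Below is a rough sketch of a stream-copy merge with plain FFmpeg's concat demuxer. It is not m4b-tool's actual code, and write_concat_list and merge_copy are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: not m4b-tool's implementation; helper names are illustrative.

# Read file names on stdin (one per line) and print a concat-demuxer list.
# Single quotes inside a name are escaped as '\'' before wrapping the name.
write_concat_list() {
    sed "s/'/'\\\\''/g; s/^/file '/; s/\$/'/"
}

# Stream-copy every file listed in $1 into the output $2 without re-encoding.
merge_copy() {
    ffmpeg -f concat -safe 0 -i "$1" -c copy "$2"
}
```

A list could be produced with something like ls ./*.m4b | write_concat_list > list.txt and then passed to merge_copy list.txt merged.m4b. In practice I still let m4b-tool do the merge.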

Explanations: The interesting parts are the following options in the first line. They add a fade-in + fade-out effect losslessly without an extra re-encoding step thanks to a filtergraph:

-f lavfi -i anullsrc

-filter_complex '[0]afade=t=in:d=1:curve=tri[a]; [1]atrim=0:0.7[t]; [a][t]acrossfade=d=0.7:o=1:c1=tri:c2=nofade'

Detailed breakdown for the curious:

ffmpeg is provided with two stream inputs:
- The mp3 file: -i {}.
- A libavfilter input virtual device (-f lavfi) that just inputs silent audio (-i anullsrc). Check Step 3 to see why we need it.
Step 1: [0]afade=t=in:d=1:curve=tri[a] adds a fade-in effect at the start of the decoded file.
- [0] is used as the input of the afade filter command. It corresponds to the first input passed to FFmpeg. Note that we can't use this filter for the fade-out at the end of the input stream as we would need to provide an absolute time offset in the stream, which we can't calculate within the filter pipeline. (Filter streams are non-rewindable and the afade filter command doesn't support relative time offsets to the end of the stream).
- t=in for a fade-in effect. Since no start time st is specified, the effect applies at the beginning of the file.
- d=1 means that the fade-in effect has a total duration of 1 second.
- curve=tri to select a triangular linear fade-in transition function.
- [a] to direct the output of this step to a named stream a.
Step 2: [1]atrim=0:0.7[t] trims the anullsrc virtual silent stream so that it lasts 0.7 seconds. It needs to have the same duration as the one specified by the d parameter of acrossfade in the next filter.
- [1] is the input of the atrim filter command. This corresponds to the second input passed to FFmpeg, here anullsrc.
- 0:0.7 is the trim window. Here the trim only keeps the first 0.7 seconds of the silent anullsrc stream.
Step 3: [a][t]acrossfade=d=0.7:o=1:c1=tri:c2=nofade adds a cross fade effect at the end of the decoded stream [a] + start of the second stream [t]. I use a trick (detailed below) to make it only add a fade-out effect at the end of [a] without changing its duration.
- [a] is used as the first input of acrossfade. It corresponds to the output of the first step i.e. the decoded file stream with a fade-in effect at the start.
- [t] is used as the second input of acrossfade. It corresponds to the output of the trim in the second step i.e. a silent stream with a duration of 0.7 seconds.
- d=0.7 is the duration of the fade-out effect. It's important for it to be equal to the atrim length of Step 2.
  - If it's longer than the atrim step then the effect won't be applied at all (the second input stream needs to have a duration that is at least as long as the crossfade effect).
  - If it's shorter than the atrim step, then silence is added to the end of the output stream. We don't want to increase the duration of the stream with a silent section at the end; we only want a fade-out effect.
- o=1 means that the two streams should overlap during the cross-fade (fade out the first stream and fade in the second stream at the same time). This is the main trick of this filter_complex pipeline.
  - During the last 0.7 seconds, [a] fades out while [t] fades in at the same time.
  - By the time the 0.7s [a] fade-out is done, the [t] silent stream fade-in is also over (because we trim it to 0.7s which is also the cross-fade duration).
  - Overall only a fade-out effect is applied as the second stream [t] is silent so the fade-in of [t] doesn't affect the output stream (a no-op).
- c1=tri to select a triangular linear fade-out transition for the first stream.
- c2=nofade to select an identity curve for the fade-in transition of the second stream. The choice of this curve shouldn't matter as the [t] stream is silent anyway.
- The output of acrossfade is the output of the whole filter pipeline.
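
For comparison, here is a sketch of the alternative that the pipeline above avoids: computing the absolute fade-out offset outside of FFmpeg with ffprobe and then using afade for both fades. This is a different technique, not what my command does. The function names are illustrative, and the duration reported by ffprobe may only be an estimate for some MP3 files, which is one more reason I prefer the acrossfade trick.

```shell
#!/bin/sh
# Alternative sketch (not the approach used above); names are illustrative.

# Print the fade-out start offset: total duration ($1) minus fade length ($2),
# clamped at 0 for files shorter than the fade.
fade_out_start() {
    awk -v total="$1" -v fade="$2" \
        'BEGIN { s = total - fade; if (s < 0) s = 0; printf "%.3f", s }'
}

# Print the container duration of file $1 in seconds, as reported by ffprobe.
probe_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Print a single-input filter chain for file $1 with a 1 s fade-in
# and a fade-out lasting $2 seconds.
build_afade_filter() {
    start=$(fade_out_start "$(probe_duration "$1")" "$2")
    printf 'afade=t=in:d=1:curve=tri,afade=t=out:st=%s:d=%s:curve=tri' "$start" "$2"
}
```

The output of build_afade_filter would be passed with -af instead of -filter_complex, and the anullsrc input would no longer be needed. The downside is one extra ffprobe call per file.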

sandreas commented 1 year ago

Phew, thank you for this huge and detailed investigation.

I (personally) do not have ANY use case for this - fading in does indeed modify the audio in a way I never would like to have it. Furthermore I don't think this is really an issue... more like a detailed guide to achieve something.

The --ffmpeg-param thing was a quick and dirty approach to provide some extended feature, but it was a really, REALLY bad idea. It causes more issues than it solves in my opinion.

What I should have done instead was to provide a small plugin API to modify commands before they get executed. Example:

// my-plugin.php
m4btool_register_command_plugin(function(array $command, CommandContext $context) {
    // leave commands that do not involve ffmpeg untouched
    if(!in_array("ffmpeg", $command, true)) {
        return $command;
    }
    // modify command as you wish
    // ....
    // then return it
    return $command;
});

And then running

m4b-tool merge --command-plugin="my-plugin.php" ....

What do you think? Would this be better for your use case?

devnoname120 commented 1 year ago

Hmm, I think that a plugin API would still have a learning curve and wouldn't be very convenient for one-off solutions. Just like with --ffmpeg-param, you would need to understand which commands m4b-tool runs and in which order. You'd additionally have to figure out how to patch the array, making sure that you only apply the changes at the right steps of the process.

A plugin API could definitely be useful if you plan on welcoming plugin contributions. But then it would require substantial effort to maintain these plugins, considering that they would patch the command (not necessarily in clean and future-proof ways).

In my case the hardest part was figuring out where/how the ffmpeg commands were built, realizing that --ffmpeg-param didn't behave the way I assumed it would, and finally deciding that it would just be less effort to add an echo right before the ffmpeg commands get executed so that I could grab the commands and modify them manually.

I think that a great starting point would be to print the ffmpeg commands that m4b-tool runs (maybe by default to make them easier to discover). People who want custom behaviors could just use --dry-run, modify the ffmpeg commands, and manually run them. If they want to contribute the feature back to m4b-tool they can add a new option and do a PR.

What do you think?

sandreas commented 1 year ago

> I think that a great starting point would be to print the ffmpeg commands that m4b-tool runs (maybe by default to make them easier to discover). People who want custom behaviors could just use --dry-run, modify the ffmpeg commands, and manually run them. If they want to contribute the feature back to m4b-tool they can add a new option and do a PR.

Oh that is easy. Just use --debug. Maybe it would be nice to have ONLY the commands printed, so an option with --command-logfile or something may be the solution for this.

devnoname120 commented 1 year ago

@sandreas Does it also work with FDK AAC? iirc the command was built differently but I'm not sure if the debug log works anyway or not.

sandreas / m4b-tool

[Solution] Fade in/out effect for MP3s merged into an M4B #225