y-brehm / waveAlign

Python repo for audio loudness matching in order to end the loudness war between DJs.

Process .mp3 files without length modification #33

Closed maosi100 closed 3 weeks ago

maosi100 commented 2 months ago

Processing of .mp3 files leads to an output file with modified length

SimonZimmer commented 1 month ago

I looked a bit into this and we might have a serious problem here.

When an mp3 file is encoded, the encoder adds zero-padding at the start and at the end of the audio stream. This is done to meet the requirements of the psychoacoustic processing used for data reduction (frame-size boundary alignment, analysis buffer zones). The amount of padding may differ between mp3 encoders and their settings. What makes it worse is that a decoder can read a file that was encoded with a different library, as long as both follow the general mp3 standard, so the decoder does not necessarily know how much padding the encoder added.
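To make the frame-alignment part of this concrete, here is a minimal sketch of the padding arithmetic. It assumes MPEG-1 Layer III frames of 1152 samples; `padded_length` and `encoder_delay` are hypothetical names, and the delay is left as a free parameter because the actual value depends on the encoder and its settings.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III: two granules of 576 samples


def padded_length(n_samples: int, encoder_delay: int) -> tuple[int, int]:
    """Return (total_samples, end_padding) after frame alignment.

    encoder_delay is an assumed, encoder-specific number of zero samples
    placed in front of the audio. Real encoders may add more than this.
    """
    n_frames = math.ceil((encoder_delay + n_samples) / SAMPLES_PER_FRAME)
    total = n_frames * SAMPLES_PER_FRAME
    end_padding = total - encoder_delay - n_samples
    return total, end_padding
```

Two encoders that pick a different `encoder_delay` produce streams of different total length from the same input, which is the kind of mismatch described above.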

It means that mp3 files in general do not guarantee timing accuracy when re-encoding.

I validated this using an mp3 file that was encoded and re-encoded with ffmpeg using the same encoder settings. When decoding the files in Reaper, they are perfectly aligned. In Rekordbox, they are not. Since we cannot look inside Rekordbox's decoder, there is no way of knowing which decoder it uses.
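For anyone who wants to repeat this kind of check outside of a DAW, the following is a rough sketch of how the offset between two already decoded signals could be estimated with a brute-force cross-correlation. `estimate_offset` is a hypothetical helper, not part of waveAlign, and decoding the mp3s into sample lists is assumed to happen elsewhere.

```python
def estimate_offset(reference: list[float], candidate: list[float], max_lag: int) -> int:
    """Return the lag (in samples) at which candidate best matches reference.

    A positive result means candidate is delayed, i.e. it has extra
    samples in front compared to reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                score += ref_value * candidate[j]
        # Keep the first lag that reaches the highest correlation score.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

This is quadratic in the search range, so for full tracks it only makes sense on a short excerpt; an FFT-based correlation would be the faster choice.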

A wild hypothesis: The de facto standard mp3 encoder seems to be libmp3lame (LAME, https://lame.sourceforge.io/), which uses a granule_length of 576 samples (an MPEG-1 Layer III frame consists of two granules). Curiously, the offset that Rekordbox's decoder introduces is exactly 1152 samples (2*granule_length), i.e. one full frame. So it might be that Rekordbox uses a libmp3lame decoder with 2 granules as a buffer zone.

All in all, there are 2 solutions to this IMO: A) Recognize that the mp3 format doesn't allow for absolute timing accuracy and thus drop support for it. B) Employ a patch in which we prepend 1152 samples of almost-0 values to the beginning of a file.
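Option B) could look roughly like the sketch below. This is not the code from the branch; `prepend_padding` is a hypothetical helper, and the value of `epsilon` is an assumption, since the only requirement stated here is "almost-0".

```python
def prepend_padding(samples: list[float], n_pad: int = 1152, epsilon: float = 1e-9) -> list[float]:
    """Return a new signal with n_pad near-zero samples in front.

    n_pad defaults to the 1152-sample offset observed in Rekordbox;
    epsilon is an assumed placeholder for an "almost-0" value.
    """
    return [epsilon] * n_pad + list(samples)
```

The padding is applied to the decoded signal before it is encoded again, so it only helps if the decoder on the other side really shifts every file by the same fixed amount.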

I implemented B) in this branch and tested it successfully with about 20 mp3s. But I'm very sceptical about using it, since we can't know for sure whether my hypothesis holds. Even if it does, Rekordbox updates might just break this.

If anybody has some time, maybe you can try and use this project: mp3gain to process some mp3s and see if this retains the mp3 timing in rekordbox. If so we might be able to use it somehow.

y-brehm commented 1 month ago

Yesterday evening I did some testing with 39 mp3s from various sources. Unfortunately the results are not good:

[Screenshot 2024-06-12 at 16 53 07]

What I did for testing:

Additionally: Even for the tracks that I marked green in the screenshot, there seems to be a very small offset between the Memory-Cue location and the beat grid. This is no surprise to me, since @SimonZimmer found the padding amount by trial and error.

@SimonZimmer You suggested testing this by comparing the waveAligned tracks to the original ones in Rekordbox without placing Cues. If I do so, all tracks are re-analyzed, which places the grid for all tracks at the same points again (at least our processing does not seem to interfere with Rekordbox's beat grid analysis). However, in a real-world scenario this does not help, since those Cue-Points are saved at an absolute position.
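To put the size of such a shift into perspective, here is a small sketch that converts a sample offset into milliseconds and into a fraction of a beat. The function names are hypothetical, and the 44100 Hz sample rate and 128 BPM tempo are assumptions chosen for illustration only.

```python
def offset_ms(offset_samples: int, sample_rate: int = 44100) -> float:
    """Convert an offset in samples to milliseconds."""
    return 1000.0 * offset_samples / sample_rate


def offset_in_beats(offset_samples: int, bpm: float, sample_rate: int = 44100) -> float:
    """Convert an offset in samples to a fraction of one beat at the given tempo."""
    seconds = offset_samples / sample_rate
    return seconds * bpm / 60.0


# Assumed example values: 1152 samples at 44.1 kHz, track at 128 BPM.
print(round(offset_ms(1152), 2))             # about 26.12 ms
print(round(offset_in_beats(1152, 128), 3))  # about 0.056 beats
```

A cue point stored at an absolute time keeps its position while the audio underneath moves by this amount, which is why the shift shows up against the beat grid.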

SimonZimmer commented 1 month ago

@y-brehm

> @SimonZimmer You suggested to test this by comparing the waveAligned tracks to the original ones in Rekordbox without placing Cues. If I do so, all tracks are re-analyzed, which places the grid for all tracks at the same points again

Why are all tracks re-analyzed? If I import tracks, then wavealign them and import the newly outputted files into rekordbox again, only the newly imported files are re-analyzed, which is how it should be IMO.

Your result doesn't really surprise me, as mentioned in my initial comment here. However, it does surprise me that you reached a different result than @maosi100 and me. Is there a timing-difference between memory-cues and hot-cues? @maosi100 How was your testing procedure?

Anyway, I'm not sure if this matters, since your test is also a real-world scenario. So, unfortunately, this proves to me that Rekordbox does not have a constant decoding offset.

Next step:

If anybody has some time, maybe you can try and use this project: mp3gain to process some mp3s and see if this retains the mp3 timing in rekordbox. If so we might be able to use it somehow.

y-brehm commented 1 month ago

> Why are all tracks re-analyzed? If I import tracks, then wavealign them and import the newly outputted files into rekordbox again, only the newly imported files are re-analyzed, which is how it should be IMO.

Sorry, this was phrased a bit unclearly. Your description is exactly what I meant. But simply checking whether the analysis of the aligned files matches the analysis of the unaligned files does not suffice as a real-world test, because of the absolute timings of cue points mentioned above. The only things you can see with this test are whether the beginning of a song matches and whether waveAlign interferes with the general beat-detection algorithm.

I'm surprised, however, that neither of you noticed any timing difference. Should I send you the mp3s for additional testing? My Rekordbox version is 6.7.1. What did you guys use for testing?

maosi100 commented 1 month ago

Rekordbox version: 6.6.10
Branch: bug/mp3_issues
Python version: 3.9.16 using pyenv
Dependencies: pip install -r requirements.txt

SimonZimmer commented 1 month ago

Small update: I tested all of the other implementations listed in https://github.com/forart/HyMPS/blob/main/Audio/Treatments.md#normalizing-

namely:
https://github.com/dharmendrasha/ffmpeg-normalize#readme
https://github.com/jxmked/Audio-Normalizer
https://github.com/ConstruKction/DivaNormalize#readme
https://github.com/yogeshsherawat/audio-normalizer#readme
https://github.com/Type-Delta/FFnorm?tab=readme-ov-file#scan-files-loudness

They all use ffmpeg, so the purpose of testing was to sanity-test my own implementation. As expected, all of these have similar time-alignment issues.

mp3gain (one of the few references that don't actually re-encode the mp3 files using ffmpeg) also has the issue according to @maosi100

Because of the encoder/decoder chaos, and because timing accuracy is simply not something the mp3 format considers at all, our options are running out here.

Let's see if there is some insight to be gained from Platinum Notes 10 as a last resort.

maosi100 commented 1 month ago

State of current testing affairs:

waveAlign current main branch:

waveAlign current mp3 bugfix branch:

mp3normalizer

mp3gain express

Platinum Notes 10

SimonZimmer commented 1 month ago

@maosi100 Ok nice, mp3gain express seems to be the new hope-bearer then 🙏