iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
https://iver56.github.io/audiomentations/
MIT License

Mp3Compression artifacts when abs(audio).max() > 1.0 #272

Closed by btamm12 1 year ago

btamm12 commented 1 year ago

When an audio segment's amplitude exceeds +/- 1, Mp3Compression produces audible artifacts. This is unexpected behavior because audiomentations uses pydub under the hood, and when I run pydub directly there are no artifacts. I have provided some samples in the attached zip file: audio_samples.zip

In my opinion, there are two fixes for this:

  1. handle this case in apply_pydub() before converting to int16 [link], or in the function convert_float_samples_to_int16() [link] itself, e.g.,
    import numpy as np

    def convert_float_samples_to_int16(y):
        """Convert floating-point numpy array of audio samples to int16."""
        if not issubclass(y.dtype.type, np.floating):
            raise ValueError("input samples not floating-point")
        eps = 1e-7
        maxval = np.abs(y).max()
        if maxval + eps > 1.0:
            # Peak-normalize into a new array so the caller's input is not modified in place
            y = y / (maxval + eps)
        return (y * np.iinfo(np.int16).max).astype(np.int16)
  2. warn the user in the documentation of Mp3Compression that float values must be in [-1,1] to avoid compression artifacts.
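
To illustrate why the int16 conversion step matters here, the following is a minimal sketch, not code from audiomentations. It counts how many float samples land outside the representable int16 range once they are scaled by 32767. The helper name `count_out_of_range` is hypothetical. What an out-of-range float-to-int16 cast actually produces depends on the platform and NumPy version, so the sketch only detects the condition and does not perform the cast.

```python
import numpy as np

INT16_MAX = np.iinfo(np.int16).max  # 32767


def count_out_of_range(y):
    """Count samples that would fall outside the int16 range after scaling.

    Hypothetical helper for illustration only; not part of audiomentations.
    """
    scaled = np.asarray(y, dtype=np.float64) * INT16_MAX
    return int(np.count_nonzero(np.abs(scaled) > INT16_MAX))


in_range = np.array([0.0, 0.5, -1.0], dtype=np.float32)
too_loud = np.array([0.0, 1.5, -2.0], dtype=np.float32)

print(count_out_of_range(in_range))  # 0
print(count_out_of_range(too_loud))  # 2
```

Any nonzero count means the float signal cannot be represented faithfully as int16 without first being scaled down or clipped.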

Edit: eps is probably not necessary in fix (1).

Thanks in advance!

iver56 commented 1 year ago

Thanks for reporting this! That is indeed not ideal, so there's room for improvement.

Until this gets fixed, please use pydub directly or make sure that your float values stay within [-1, 1]. See the relevant bullet point in the known limitations section of the docs: https://iver56.github.io/audiomentations/#known-limitations

iver56 commented 1 year ago

A possible solution could be to have the Mp3Compression transform check the most extreme sample. If it exceeds 1, peak-normalize the audio before encoding to mp3, and then post-gain-compensate the decoded audio.
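
The idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that shipped in audiomentations: `apply_with_peak_guard` is a hypothetical wrapper, and `codec` is a hypothetical callable standing in for the MP3 encode/decode round trip. The `fake_codec` used here just clips and quantizes, so the sketch runs without pydub or an MP3 encoder.

```python
import numpy as np


def apply_with_peak_guard(samples, codec):
    """Run `codec` on audio scaled into [-1, 1], then restore the original level.

    `codec` is a hypothetical callable (float array -> float array) standing in
    for the MP3 encode/decode round trip.
    """
    peak = float(np.abs(samples).max()) if samples.size else 0.0
    if peak <= 1.0:
        # Already within range: no normalization needed
        return codec(samples)
    normalized = samples / peak  # new array; the caller's input is untouched
    decoded = codec(normalized)
    return decoded * peak  # post-gain compensation


def fake_codec(x):
    """Stand-in for a lossy codec: clips to [-1, 1] and quantizes to int16 steps."""
    clipped = np.clip(x, -1.0, 1.0)
    return np.round(clipped * 32767) / 32767


loud = np.array([0.0, 1.5, -3.0, 0.75], dtype=np.float32)
restored = apply_with_peak_guard(loud, fake_codec)

# Without the guard the stand-in codec clips the peaks; with it, the level survives.
print(np.abs(fake_codec(loud)).max() <= 1.0)
print(np.allclose(restored, loud, atol=1e-3))
```

Dividing by the peak and multiplying by it afterwards keeps the output at the input's loudness, so only the codec's own loss remains.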

iver56 commented 1 year ago

The fix for this is included in the v0.30.0 release

https://pypi.org/project/audiomentations/0.30.0/