We want to add support for writing data-compressed audio formats. However, there are some obstacles to reliably writing compressed audio from Python.

Avoid MP3s

I would like to avoid MP3s because MP3 encoding introduces enough delay to put files out of sync. That makes it unsuitable for multitrack audio or audio that needs to loop seamlessly. We are working with multitracks (producer model), so this is a non-starter.

Viable alternatives:

.ogg (vorbis or opus)
.opus (opus)

We need a Python library that can reliably write either format in memory. Tragically, librosa and torchaudio are not good at this. Using the disk is too slow for the scale that we need.

Option 1. Python `soundfile`

The Python soundfile package can write .ogg files with the Vorbis codec, but:

You have no control over the quality/bitrate
You have to watch out for this bug (which is carefully handled in our current numpy_to_ogg implementation)
I believe there is Opus support in the pipeline (but I need to double-check this)

Option 2. Python `pydub`

Depends on ffmpeg
You have to use a version of ffmpeg that is compiled with support for the needed codecs
The ffmpeg version installed with conda in our klay-beam docker containers is not compiled with the needed codecs, necessitating an apt install or compiling ffmpeg from scratch in the docker image.

Testing

This PR adds support for testing that files encoded with lossy formats were written correctly. Once this is finalized, we are set up to reliably pursue one of the options above.

The current implementation uses a few different mechanisms for testing audio file encoding. These mechanisms are ready for review.

I'm wondering where you got the various hard-coded tolerance / divergence values from that you are using, especially in test_wav_file, test_mp3_file, and test_ogg_file

All the hard-coded values are based on my empirical observations of values that reflect a known-working condition.

For wav files, we could manually calculate the expected delta between floating point audio and discrete 16/24 bit PCM audio. That seems like overkill to me, especially when it's the compressed file format writers that come with the most risk. Can you think of a better way to do this? If not, I think the current implementation is a reasonable balance of safety/effort...but I welcome suggestions.
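For reference, the expected delta mentioned above is cheap to estimate. This is a minimal sketch, assuming the writer scales floats by 2^(bits-1) and rounds to the nearest integer; libsndfile's actual scaling and clipping may differ, so treat the numbers as an estimate rather than a guarantee. The names `max_quantization_error` and `pcm_round_trip` are hypothetical and not part of klay-beam.

```python
def max_quantization_error(bits: int) -> float:
    """Worst-case rounding error for signed PCM, in float units.

    Assumes samples are scaled by 2**(bits - 1) and rounded to the
    nearest integer, so the error is at most half of one step.
    """
    step = 1.0 / (2 ** (bits - 1))
    return step / 2.0


def pcm_round_trip(sample: float, bits: int) -> float:
    """Quantize one float sample to signed PCM and convert it back."""
    scale = 2 ** (bits - 1)
    # Clamp to the representable integer range. A sample of exactly +1.0
    # is clipped here, so its error can exceed the half-step bound.
    quantized = max(-scale, min(scale - 1, round(sample * scale)))
    return quantized / scale


if __name__ == "__main__":
    for bits in (16, 24):
        print(bits, max_quantization_error(bits))
```

Under these assumptions the bound is 2^-16 for 16-bit and 2^-24 for 24-bit PCM, which could replace or sanity-check an empirical WAV tolerance. It says nothing about the lossy formats, where empirical tolerances are still needed.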

The main danger with numpy_to_mp3 and numpy_to_wav has to do with environment dependencies such as ffmpeg and libsndfile. If we (for example) build a docker container that has an older version of libsndfile, then numpy_to_ogg will fail silently and just write digital black to our output audio file. I want to be able to run our tests in ALL our docker images before publishing them. That gives us some protection against nasty surprises when writing millions of audio files.
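One way to catch the silent-failure case described above is to assert that decoded output is not digital black. This is a minimal sketch, assuming the decoded audio is available as a flat sequence of float samples; `is_digital_black` and its `threshold` parameter are hypothetical and not part of klay-beam.

```python
from typing import Iterable


def is_digital_black(samples: Iterable[float], threshold: float = 0.0) -> bool:
    """Return True if no sample has a magnitude above `threshold`.

    An empty input also counts as digital black, since nothing was decoded.
    """
    peak = max((abs(float(s)) for s in samples), default=0.0)
    return peak <= threshold
```

A check like this only makes sense when the test input itself is known to be non-silent, for example a sine tone written and read back inside each docker image.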

With that in mind, the next steps I'm imagining after confirming these tests are satisfactory:

Choose a method for writing data-compressed audio (see top of this PR)
Run all these tests during the docker build process before publishing to DockerHub

klay-music / klay-beam

Audio file creation tests. #72
