klay-music / klay-beam

Our Apache Beam Transforms and Pipelines

Audio file creation tests. #72

Closed CharlesHolbrow closed 10 months ago

CharlesHolbrow commented 10 months ago

We want to add support for writing data-compressed audio formats. However, there are some obstacles to reliably writing compressed audio from Python.

Avoid MP3s

I would like to avoid MP3s because MP3 compression introduces enough delay to put files out of sync. It is not suitable for multitrack audio or audio that needs to loop seamlessly. We are working with multitracks (producer model), so this is a non-starter.

Viable alternatives:

We need a Python library that can write either format reliably in-memory. Tragically, librosa and torchaudio are not good at this. Writing to disk is too slow for the scale that we need.

Option 1. Python soundfile

The python soundfile package can write .ogg files with the vorbis codec, but

Option 2. Python pydub

Testing

This PR adds support for testing that files written in lossy formats were encoded correctly. Once this is finalized, we are set up to reliably pursue one of the options above.

The current implementation uses a few different mechanisms for testing audio file encoding. These mechanisms are ready for review.

CharlesHolbrow commented 10 months ago

I'm wondering where you got the various hard-coded tolerance / divergence values from that you are using, especially in test_wav_file, test_mp3_file, and test_ogg_file

All the hard-coded values are based on my empirical observations of values that reflected a known-working condition.

For wav files, we could manually calculate the expected delta between floating point audio and discrete 16/24 bit PCM audio. That seems like overkill to me, especially when it's the compressed file format writers that come with the most risk. Can you think of a better way to do this? If not, I think the current implementation is a reasonable balance of safety/effort...but I welcome suggestions.
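For reference, here is a rough sketch of what that manual calculation could look like. It assumes the writer scales float samples by 2**(bit_depth - 1) and rounds to the nearest integer, which may not match what numpy_to_wav or libsndfile actually do. The helper names `pcm_roundtrip_bound` and `quantize_roundtrip` are hypothetical and not part of klay-beam.

```python
import numpy as np


def pcm_roundtrip_bound(bit_depth: int) -> float:
    """Worst-case absolute error when rounding a float sample to signed PCM.

    Assumption: samples are scaled by 2**(bit_depth - 1) and rounded to the
    nearest integer, so the error is at most half of one quantization step.
    """
    step = 1.0 / (2 ** (bit_depth - 1))
    return step / 2.0


def quantize_roundtrip(audio: np.ndarray, bit_depth: int) -> np.ndarray:
    """Simulate float -> signed PCM -> float (hypothetical stand-in for a writer)."""
    scale = 2 ** (bit_depth - 1)
    # Round to the nearest integer code, then clip to the representable range.
    codes = np.clip(np.round(audio * scale), -scale, scale - 1)
    return codes / scale


# One second of noise that stays away from full scale, so clipping never triggers.
rng = np.random.default_rng(0)
audio = rng.uniform(-0.9, 0.9, size=44100)

errors = {}
for bits in (16, 24):
    errors[bits] = float(np.max(np.abs(quantize_roundtrip(audio, bits) - audio)))
    # The measured error should never exceed the half-step bound.
    assert errors[bits] <= pcm_roundtrip_bound(bits)
```

Under these assumptions the bound is 2**-16 for 16-bit and 2**-24 for 24-bit PCM. A real writer may dither or scale differently, so an empirically chosen tolerance with some headroom can still be the more practical choice.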

The main danger with numpy_to_mp3 and numpy_to_wav has to do with environment dependencies such as ffmpeg and libsndfile. If we (for example) build a Docker container that has an older version of libsndfile, then numpy_to_ogg will fail silently and just write digital black to our output audio file. I want to be able to run our tests in ALL our Docker images before publishing them. That gives us some protection against nasty surprises when writing millions of audio files.
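To illustrate the kind of guard this implies, here is a minimal sketch of a check that catches an all-zero ("digital black") result by comparing the RMS level of the decoded audio against the reference. The function name `assert_not_silent` and the 0.5 ratio are illustrative assumptions, not the checks used in this PR, and the decoded array is simulated here rather than produced by a real encoder.

```python
import numpy as np


def rms(audio: np.ndarray) -> float:
    """Root-mean-square level of an audio array."""
    return float(np.sqrt(np.mean(np.square(audio))))


def assert_not_silent(decoded: np.ndarray, reference: np.ndarray,
                      min_rms_ratio: float = 0.5) -> None:
    """Raise AssertionError if decoded audio lost most of the reference energy.

    min_rms_ratio is a hypothetical threshold; a real test would tune it
    per codec from known-working output.
    """
    ref_level = rms(reference)
    if ref_level == 0.0:
        raise ValueError("reference audio is silent; the check is meaningless")
    dec_level = rms(decoded)
    if dec_level < min_rms_ratio * ref_level:
        raise AssertionError(
            f"decoded RMS {dec_level:.6f} is below "
            f"{min_rms_ratio} x reference RMS {ref_level:.6f}"
        )


# Reference signal: one second of a 440 Hz sine at 44.1 kHz.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
reference = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# A slightly attenuated copy stands in for a healthy lossy round trip.
assert_not_silent(0.98 * reference, reference)

# An all-zero array stands in for a writer that failed silently.
try:
    assert_not_silent(np.zeros_like(reference), reference)
except AssertionError as exc:
    print("caught silent output:", exc)
```

A level check like this is deliberately coarse: it would not catch subtle encoding damage, but it is cheap enough to run in every Docker image before publishing.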

With that in mind, the next steps I'm imagining after confirming these tests are satisfactory: