slhck / ffmpeg-normalize

Audio Normalization for Python/ffmpeg

Get outputs as arrays rather than have them saved as files? #153

Closed · cheulyop closed this issue 3 years ago

cheulyop commented 3 years ago

Is it possible to return outputs as arrays in memory rather than saving them as files on disk?

I've been using ffmpeg-normalize by importing it as a package in another script (with a slightly modified __main__.py) instead of using it on the command line. For that, I would like the normalized audio returned as arrays rather than written as files on disk, as that would allow applying further enhancement steps (e.g., noise suppression) after normalization.

However, although I've looked through the codebase, I couldn't quite figure out at exactly which line the raw normalized audio signal is written to the output path, as I would like to pull it out and use it in my script on the fly.

It is possible I'm missing what's there already or just not understanding how FFmpeg works.

For now, I'm getting around this by having the output written to a temporary file and deleting it later, but this makes everything slower as I need to read the normalized audio file back into memory.

Here are some code snippets of what I'm doing. In __main__.py I've modified the main function as below to bypass the CLI.

from ffmpeg_normalize import FFmpegNormalize

def main(sr, input_file, output_file):
    # normalize input_file and write the result to output_file
    ffmpeg_normalize = FFmpegNormalize(sample_rate=sr)
    ffmpeg_normalize.add_media_file(input_file, output_file)
    ffmpeg_normalize.run_normalization()

This modified main function is called in another script where I clean and enhance some speech audio.

import os
import tempfile

import soundfile as sf

def clean_speech(example):
    speech, sr = example['speech'], example['sampling_rate']

    # create a temporary file to store the speech data
    with tempfile.NamedTemporaryFile() as tmp:
        # write speech to the temp file as WAV
        tin, tout = tmp.name, tmp.name + '.wav'
        sf.write(tin, speech, sr, format='WAV', subtype='PCM_16')

        # apply loudness normalization and noise suppression
        normalize(sr, tin, tout)
        suppressed = suppress(tout, save_suppressed=False)
        print(suppressed)
        os.remove(tout)  # delete the temporary output file

    ...
    # return example with modified speech audio array
    return example
slhck commented 3 years ago

"at exactly which line raw normalized audio signal is written to an output path"

This isn't the case, actually. ffmpeg will do the entire processing, basically amounting to:

ffmpeg -i input.wav -filter:a loudnorm output.wav

My program assumes you want to write to a file, which is necessary to cover all possible file formats. If by "array" you mean receiving the raw audio samples as they are being encoded, you need to change a few things.

You have to force ffmpeg to generate a known output sample format (e.g. little-endian signed 16-bit PCM audio, via -c:a pcm_s16le), and you have to change the output file format accordingly (e.g. to s16le).

So the command would be:

ffmpeg -i input.wav -filter:a loudnorm -c:a pcm_s16le -f s16le -

Now ffmpeg passes the encoded stream to stdout, which you can read via Python's subprocess module.
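
For example, here's a minimal sketch of how you could capture those samples directly (assuming numpy is installed; the input path, sample rate, and function name are just placeholders):

import subprocess
import numpy as np

def normalized_samples(input_file, sample_rate=44100):
    # run ffmpeg, piping raw little-endian signed 16-bit PCM to stdout
    cmd = [
        "ffmpeg", "-hide_banner", "-i", input_file,
        "-filter:a", "loudnorm",
        "-ar", str(sample_rate),
        "-c:a", "pcm_s16le", "-f", "s16le", "-",
    ]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL, check=True)
    # interpret the raw byte stream as 16-bit integer samples
    # (for multi-channel audio the samples are interleaved)
    return np.frombuffer(proc.stdout, dtype=np.int16)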

The problem with this is that it'd require:

  1. Rewriting the entire code for running the commands so that no intermediate files are written in the two-pass normalization case (see the sketch below).
  2. Changing the command-line utilities to ingest stdout and provide that to an outside calling program.

I see this as a nice use case which I don't want to implement (PRs that retain existing functionality are welcome though), but I hope this gives you an idea of what to do.
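
For what it's worth, the two-pass case from point 1 could in principle also be done without an intermediate file, since loudnorm accepts the measurements from a first analysis pass as filter options. A rough sketch of that idea (this is not how ffmpeg-normalize works internally, and error handling is omitted):

import json
import subprocess

def two_pass_loudnorm(input_file):
    # first pass: analysis only; loudnorm prints its stats as JSON on stderr
    measure = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", input_file,
         "-filter:a", "loudnorm=print_format=json", "-f", "null", "-"],
        stderr=subprocess.PIPE, text=True,
    )
    stats = json.loads(measure.stderr[measure.stderr.rfind("{"):])

    # second pass: feed the measured values back in and stream raw PCM to stdout
    loudnorm = (
        "loudnorm=linear=true"
        f":measured_I={stats['input_i']}"
        f":measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_thresh={stats['input_thresh']}"
    )
    encode = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", input_file,
         "-filter:a", loudnorm, "-c:a", "pcm_s16le", "-f", "s16le", "-"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True,
    )
    return encode.stdout  # raw s16le bytes, no intermediate file on disk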

slhck commented 3 years ago

Also, I am not sure what the issue is with using temporary files. You won't gain much efficiency unless you are severely IO-limited, in which case reading/writing a file takes up more time than the actual encoding.

cheulyop commented 3 years ago

@slhck This helps a lot, thank you. I wanted to simply be able to read/write audio files one less time, as I have several GiB of files to process.

However, given your explanation, it seems the cost of rewriting the whole code would be too high compared to the benefit of spending several minutes less in the preprocessing step.

Although it isn't an issue for me right now, I suppose the overhead of using temporary files could become a problem (like you said) when normalizing audio in resource-constrained environments like mobile devices or tiny cloud instances. Nonetheless, there'd probably be a workaround.

Anyhow, thank you for this amazing work 👍

slhck commented 3 years ago

Yes, I understand. It's certainly doable, but it'd require some work and API redesign. If you have any further questions don't hesitate to ask!