Closed cheulyop closed 3 years ago
> at exactly which line raw normalized audio signal is written to an output path
This isn't the case, actually. ffmpeg does all of the processing, which basically amounts to:
ffmpeg -i input.wav -filter:a loudnorm output.wav
My program assumes you want to write to a file, which is necessary to cover all possible file formats. If by "array" you mean receiving the raw audio samples as they are being encoded, you need to change a few things.
You have to force ffmpeg to generate a known output sample format (e.g. little-endian signed 16-bit PCM audio, via -c:a pcm_s16le), and you have to change the output file format accordingly (e.g. to s16le).
So the command would be:
ffmpeg -i input.wav -filter:a loudnorm -c:a pcm_s16le -f s16le -
Now ffmpeg passes the encoded stream to stdout, which you can read via subprocess.
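A minimal Python sketch of reading that stream (assuming ffmpeg is on the PATH; `normalize_to_memory` and `decode_s16le` are illustrative names, not part of ffmpeg-normalize's API):

```python
import struct
import subprocess


def decode_s16le(raw: bytes) -> list:
    """Decode little-endian signed 16-bit PCM bytes into floats in [-1, 1)."""
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    return [s / 32768.0 for s in samples]


def normalize_to_memory(input_path: str) -> list:
    """Run ffmpeg's loudnorm filter and capture raw PCM samples from stdout."""
    proc = subprocess.run(
        ["ffmpeg", "-i", input_path, "-filter:a", "loudnorm",
         "-c:a", "pcm_s16le", "-f", "s16le", "-"],
        stdout=subprocess.PIPE,
        check=True,
    )
    return decode_s16le(proc.stdout)
```

Note that raw s16le output carries no header, so you must track the sample rate and channel count yourself (or fix them with -ar and -ac).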
The problem with this is that it'd require:
I see this as a nice use case which I don't want to implement (PRs that retain existing functionality are welcome though), but I hope this gives you an idea on what to do.
Also, I am not sure what the issue is with using temporary files. You won't gain much in terms of efficiency unless you are severely limited in terms of I/O, in which case reading/writing a file takes up more time than the actual encoding.
@slhck This helps a lot, thank you. I simply wanted to avoid one extra read/write of each audio file, as I have several GiB of files to process.
However, given your explanation, it seems the cost of rewriting the whole code would be too high compared to the benefit of saving several minutes in the preprocessing step.
Although it isn't a problem for me right now, I suppose the overhead of using temporary files could become one (like you said) if we were to normalize audio in resource-constrained environments like mobile devices or tiny cloud instances. Nonetheless, there'd probably be a workaround.
Anyhow, thank you for this amazing work 👍
Yes, I understand. It's certainly doable, but it'd require some work and API redesign. If you have any further questions don't hesitate to ask!
Is it possible to return outputs as in-memory arrays rather than saving them as files on disk?
I've been using ffmpeg-normalize by importing it as a package in another script, modifying __main__.py a little instead of using it on the command line. For that, I would like to have normalized audio returned as arrays rather than written as files on disk, as that would allow applying further enhancement steps (e.g., noise suppression) after the normalization. However, although I've looked through the codebase, I couldn't quite figure out at exactly which line the raw normalized audio signal is written to an output path, as I would like to pull it out and use it in my script on the fly.
It is possible I'm missing what's there already or just not understanding how FFmpeg works.
For now, I'm getting around this by having the output written to a temporary file and deleting it later, but this makes everything slower, as I need to read the normalized audio file back into memory.
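For reference, that temporary-file workaround can be kept tidy with the standard library's `tempfile` and `wave` modules. In the sketch below, `run_normalizer` is a hypothetical stand-in for however the modified ffmpeg-normalize entry point is invoked; it is not the package's actual API:

```python
import os
import tempfile
import wave


def normalize_via_tempfile(input_path, run_normalizer):
    """Normalize into a temp WAV, read the samples back, then clean up.

    run_normalizer(input_path, output_path) is a caller-supplied callable
    standing in for the modified ffmpeg-normalize entry point.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # we only need the path; the normalizer writes the file
    try:
        run_normalizer(input_path, tmp_path)
        with wave.open(tmp_path, "rb") as wf:
            params = wf.getparams()
            frames = wf.readframes(wf.getnframes())
        return params, frames
    finally:
        os.remove(tmp_path)  # the temp file never outlives the call
```

This keeps the cleanup in one place, though it still pays the extra disk round trip described above.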
Here are some code snippets of what I'm doing. In __main__.py, I've modified the main function as below to bypass the CLI. This modified main function is called in another script where I clean and enhance some speech audio.