A somewhat lightweight wrapper around Encodec that enables dynamic streamed reading, seeking, metadata, GPU support, streaming from URL, and arbitrary bitrates.
By nature, the ECDC format was originally intended to be encoded and decoded in a single step. This makes it efficient, but makes streaming incredibly difficult. This program attempts to mitigate the issues associated with the format:
Example GPU usage trend during decoding (note the linearly increasing gap between each window):
Python must first be installed.
Encodec must be installed (pip install git+https://github.com/facebookresearch/encodec).
FFmpeg or a similar PCM-handling framework should be installed for most additional features (https://ffmpeg.org).
Usage (arguments in parentheses are optional):
Get ECDC info: ecdc_stream.py -i <file-or-url>
Decode ECDC->PCM: ecdc_stream.py (-ss <seek-start> -to <seek-end> -b <buffer-size> -g <cuda-device>) -d <file-or-url>
Encode PCM->ECDC: ecdc_stream.py (-b <bitrate> -n <song-name> -s <source-url> -g <cuda-device>) -e <file>
The program takes streamed inputs and outputs as PCM (s16le 48000r 2c) via stdin and stdout respectively, making it easy to integrate as a subprocess.
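As a quick illustration of what the s16le 48000r 2c format implies for buffer sizes, the following sketch works out the byte rate of the PCM stream. The constant and function names are made up for this example and are not part of ecdc_stream.py.

```python
# Raw PCM layout used on stdin/stdout: s16le, 48000 Hz, 2 channels.
SAMPLE_RATE = 48_000      # samples per second per channel ("48000r")
CHANNELS = 2              # stereo ("2c")
BYTES_PER_SAMPLE = 2      # signed 16-bit little-endian ("s16le")

# One frame holds one sample for every channel.
BYTES_PER_FRAME = CHANNELS * BYTES_PER_SAMPLE        # 4 bytes
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_FRAME     # 192000 bytes


def pcm_duration_seconds(num_bytes: int) -> float:
    """Duration, in seconds, of a raw PCM buffer of the given size."""
    return num_bytes / BYTES_PER_SECOND


def pcm_bytes_for(seconds: float) -> int:
    """Number of bytes (whole frames only) needed for `seconds` of audio."""
    frames = int(seconds * SAMPLE_RATE)
    return frames * BYTES_PER_FRAME
```

Reading or writing in multiples of BYTES_PER_FRAME keeps the two channels aligned when the stream is chunked.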
This is similar to, and intentionally designed to be compatible with, ffplay -f s16le -ac 2 -ar 48k (-i) - and similar programs.
A simple use case, playing a .ecdc file without needing to process all data first: py ecdc_stream.py -d <ecdc_file> | ffplay -f s16le -ac 2 -ar 48k -i -
Encoding any song to .ecdc can be done via: ffmpeg -i <song> -f s16le -ac 2 -ar 48k - | py ecdc_stream.py -b <bitrate> -e <file>
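The same decode-and-play pipeline can be driven from Python as a pair of subprocesses. This is a minimal sketch, assuming ecdc_stream.py sits in the working directory and ffplay is on the PATH. The helper names are hypothetical, and seek or buffer values are passed through unchanged because their expected format is not specified here.

```python
import subprocess
import sys

# PCM format shared by both ends of the pipe (s16le, stereo, 48 kHz).
PCM_ARGS = ["-f", "s16le", "-ac", "2", "-ar", "48k"]
SCRIPT = "ecdc_stream.py"  # assumption: script is in the working directory


def decode_command(source, seek_start=None, seek_end=None, bufsize=None):
    """Build the argument list that decodes ECDC to PCM on stdout."""
    cmd = [sys.executable, SCRIPT]
    if seek_start is not None:
        cmd += ["-ss", str(seek_start)]
    if seek_end is not None:
        cmd += ["-to", str(seek_end)]
    if bufsize is not None:
        cmd += ["-b", str(bufsize)]
    return cmd + ["-d", source]


def ffplay_command():
    """Build the ffplay argument list that plays raw PCM from stdin."""
    return ["ffplay", *PCM_ARGS, "-i", "-"]


def play(source, **options):
    """Pipe the decoder's stdout into ffplay and wait for both to finish."""
    decoder = subprocess.Popen(decode_command(source, **options),
                               stdout=subprocess.PIPE)
    player = subprocess.Popen(ffplay_command(), stdin=decoder.stdout)
    # Drop our copy of the pipe so the decoder sees a closed pipe
    # if ffplay exits first.
    decoder.stdout.close()
    player.wait()
    decoder.wait()
    return player.returncode
```

Calling play("song.ecdc", bufsize=2) would be roughly equivalent to the shell pipeline above with -b 2 added to the decoder.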
The input file may be a URL, which is automatically assumed to be a raw file stream. Extraction from HTML-based websites (such as YouTube) is not supported.
If cuda-device is not specified, a random GPU is selected when one is available; otherwise the program falls back to CPU inference.
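The fallback behaviour can be pictured with a small sketch. It only illustrates the described policy and is not the program's own code: choose_device is a hypothetical helper, and the GPU count is passed in as a plain integer (in practice it could come from torch.cuda.device_count()).

```python
import random
from typing import Optional


def choose_device(gpu_count: int,
                  rng: Optional[random.Random] = None) -> Optional[int]:
    """Return a random GPU index, or None to signal CPU inference."""
    if gpu_count <= 0:
        # No usable GPU: fall back to the CPU.
        return None
    rng = rng or random.Random()
    return rng.randrange(gpu_count)
```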
The -i ("info") mode of the program outputs a YAML-style list as follows (example):
When encoding (-e), the quality may be controlled by the bitrate (-b) parameter. This defaults to 24k, but accepts any float above 0. Although the officially supported bitrates are 1.5k, 3k, 6k, 12k, and 24k, the wrapper will automatically resample the audio to make use of the next matching bitrate. This allows use of unsupported bitrates such as 0.1k, 3.5k, 8k, 28k, and 128k. Keep in mind that extremely low and extremely high bitrates come with diminishing returns; for any quality of 48k and above, it is recommended to simply use Opus instead.
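The wrapper's actual selection and resampling logic is not spelled out here, so the following is only one plausible interpretation. It assumes "next matching bitrate" means the smallest supported bitrate at or above the request (or the largest one if the request exceeds it), and that the audio is resampled by the ratio between the two. All names are hypothetical.

```python
# Bitrates officially supported by Encodec at 48 kHz, in kbps.
SUPPORTED_KBPS = (1.5, 3.0, 6.0, 12.0, 24.0)
BASE_SAMPLE_RATE = 48_000


def plan_bitrate(target_kbps: float):
    """Return (native_kbps, resampled_rate_hz) for a requested bitrate.

    Assumption: encode at the next supported bitrate and scale the sample
    rate so the effective bitrate of the original audio equals the target.
    """
    if target_kbps <= 0:
        raise ValueError("bitrate must be above 0")
    # Smallest supported bitrate that is >= target, else the largest.
    native = next((b for b in SUPPORTED_KBPS if b >= target_kbps),
                  SUPPORTED_KBPS[-1])
    scale = target_kbps / native
    return native, round(BASE_SAMPLE_RATE * scale)
```

Under these assumptions, a request for 8k would be encoded with the 12k model on audio resampled to 32000 Hz, while 24k maps to itself at the full 48000 Hz.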
When decoding (-d), the initial window size may be increased by specifying the bufsize (-b) parameter. This defaults to 1, which starts with a window size of 1 (lowest latency, Õ(2n) time complexity), increasing by 1 each time (amortised constant latency, Õ(n + 2sqrt(n)) time complexity). A value of 2 would start with a window size of 2 (slightly higher latency, Õ(3n/2) time complexity), increasing by 2 each time (Õ(n + sqrt(n)) time complexity), and so on.
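To make the growing-window schedule concrete, the sketch below lists the window sizes used to cover a stream of a given length. It is an illustrative model only: window_schedule is a hypothetical name, and the unit of a "window" is left abstract because it is not defined here.

```python
def window_schedule(total_units: int, bufsize: int = 1):
    """List the window sizes used to cover `total_units` of audio.

    The first window has size `bufsize`, and each later window grows by
    `bufsize`; the final window is truncated to whatever remains.
    """
    if bufsize < 1:
        raise ValueError("bufsize must be at least 1 for a growing window")
    sizes = []
    remaining = total_units
    size = bufsize
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size += bufsize
    return sizes
```

With bufsize 1, ten units are covered by windows of 1, 2, 3 and 4; with bufsize 2, by windows of 2, 4 and 4. Because window sizes grow linearly, the number of windows grows roughly with the square root of the stream length.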
Õ(n) immediately, but has the drawback of much higher latency, particularly on weaker hardware or longer files. This option is mostly intended to function as a slightly more efficient way to directly decode and convert without needing to stream.
O(sqrt n), meaning memory consumption is not typically a concern with encoding/decoding through Encodec. While running, however, the PyTorch libraries may use up to 1GB, hence the conservative estimate.