ElijahHamilton opened this issue 6 months ago
Most speech codecs downsample audio to 16 kHz or 8 kHz. You can do the same. At 8 kHz mono, QOA needs about 25 kbit/s (8000 samples/s × 3 bits, plus some overhead for the frame headers). That's as low as it goes with the "official" version.
There's an `experimental_1bit` branch of QOA that uses just 1 bit per sample. Quality is... quite bad, but this would get you to ~8 kbit/s (assuming 8 kHz). With 2 bits per sample you'd get acceptable quality at ~16 kbit/s. Whether that's worth it depends entirely on your use case.
If you need better quality and have enough compute, go with Opus. If you need even lower bitrates, try Codec2 (it goes as low as 0.7 kbit/s).
I ran some more experiments. Results:
As you can hear, it gets pretty noisy. The effect is worsened by the low sample rate (e.g. 1 bit at 44 kHz sounds way better than at 8 kHz), but it's still good enough to be intelligible.
Thanks! I've been looking for an open-source codec that is this customizable.
Maybe adjusting the predictor length would result in less noise?
Sure, you can try. In my tests (at least with 3 bits per sample) I found 4 coefficients to be the sweet spot. Longer predictors tend to become unstable quickly; shorter ones don't predict much at all.
You can also experiment with how fast the predictor adapts to the signal (the `int delta = residual >> 5;` and `prediction >> 14;` lines). For the 1- and 2-bit variants I chose a slower adaptation (i.e. higher shift values), mostly because I've been testing with 8 kHz audio: at lower sample rates the difference between consecutive samples is larger, so a slower adaptation worked better.
Just try a bunch of things and check the reported PSNR when encoding with qoaconv :]
Is it possible for QOA to achieve lower bitrates for speech, like 8 kbit/s or 16 kbit/s?