phoboslab / qoa

The “Quite OK Audio Format” for fast, lossy audio compression

MIT License

lower bit-rates? #42

Open ElijahHamilton opened 6 months ago

ElijahHamilton commented 6 months ago

Is it possible for QOA to achieve lower bitrates for speech? like 8kbit/s or 16kbits?

phoboslab commented 6 months ago

Most speech codecs downsample audio to 16 kHz or 8 kHz. You can do the same. At 8 kHz mono, QOA needs about 25 kbit/s (8000 samples/s * 3 bits, plus some overhead for the frame headers). That's as low as it goes with the "official" version.
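
The "about 25 kbit/s" figure can be checked with a little arithmetic. The sketch below is not from this thread: it assumes the frame layout given in the QOA file format specification (8-byte frame header, 16 bytes of LMS state per channel, 256 slices of 8 bytes per channel, 20 samples per slice). The function name is made up for illustration, and the 8-byte file header and a shorter final frame are ignored.

```c
#include <stdint.h>

/* Assumed QOA frame layout (taken from the format spec, not from this thread). */
#define FRAME_HEADER_BYTES 8   /* channels, samplerate, frame samples, frame size */
#define LMS_STATE_BYTES    16  /* per channel: 4 history + 4 weights, 16 bits each */
#define SLICES_PER_FRAME   256 /* per channel */
#define SLICE_BYTES        8   /* 4-bit scale factor + 20 residuals of 3 bits */
#define SAMPLES_PER_SLICE  20

/* Approximate bitrate in bits per second for a stream made of full frames. */
static uint64_t qoa_bitrate_bps(uint32_t samplerate, uint32_t channels) {
    uint64_t frame_bytes = FRAME_HEADER_BYTES
        + (uint64_t)channels * (LMS_STATE_BYTES + SLICES_PER_FRAME * SLICE_BYTES);
    uint64_t frame_samples = SLICES_PER_FRAME * SAMPLES_PER_SLICE; /* per channel */
    return frame_bytes * 8 * samplerate / frame_samples;
}
```

Under these assumptions, qoa_bitrate_bps(8000, 1) comes out at 25900, i.e. roughly 3.24 bits per sample once the scale factors and frame headers are counted.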

There's an experimental_1bit branch of QOA that uses just 1 bit per sample. Quality is... quite bad, but this would get you to ~8 kbit/s (assuming 8 kHz). With 2 bits per sample you'd get acceptable quality at ~16 kbit/s. Whether that's something you want to do depends entirely on your use case.

If you need better quality and have enough compute, go with Opus. If you need even lower bitrates, try Codec2 (this goes as low as 0.7 kbit/s).

phoboslab commented 6 months ago

Made some more experiments:

Results:

wav, 44 kHz, 16 bits per sample (704 kbit/s): https://phoboslab.org/files/temp/male_speech_44khz_16bit.wav
wav, 8 kHz, 16 bits per sample (128 kbit/s): https://phoboslab.org/files/temp/male_speech_8khz_16bit.wav
qoa, 8 kHz, 3 bits per sample (24 kbit/s): https://phoboslab.org/files/temp/male_speech_8khz_3bit.wav
qoa, 8 kHz, 2 bits per sample (16 kbit/s): https://phoboslab.org/files/temp/male_speech_8khz_2bit.wav
qoa, 8 kHz, 1 bit per sample (8 kbit/s): https://phoboslab.org/files/temp/male_speech_8khz_1bit.wav

As you can hear, it gets pretty noisy. The effect is worsened by the low sample rate (i.e. 1 bit at 44 kHz sounds way better than at 8 kHz), but it's still good enough to be intelligible.

ElijahHamilton commented 6 months ago

Thanks! I've been looking for an open-source codec that is this customizable.

ElijahHamilton commented 6 months ago

Maybe adjusting the predictor length would result in less noise?

phoboslab commented 6 months ago

Sure, you can try. In my tests (at least with 3 bits per sample) I found 4 coefficients to be the sweet spot. Longer predictors tend to become unstable quickly; shorter ones don't predict much at all.

You can also experiment with how fast the predictor adapts to the signal (the shifts in int delta = residual >> 5; and prediction >> 14;). For the 1 and 2 bit variants I have chosen a slower adaptation (i.e. higher shift values), but mostly because I've been testing with 8 kHz audio. At lower sample rates the difference between consecutive samples is larger, so a slower adaptation worked better.
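
To make the two knobs concrete, here is a minimal sketch of a 4-coefficient sign-LMS predictor with both shifts exposed as parameters. It is an illustration only, not a copy of the QOA source: the struct and function names are hypothetical, and quantization of the residual is left out entirely.

```c
#define SKETCH_LMS_LEN 4  /* number of predictor coefficients */

typedef struct {
    int history[SKETCH_LMS_LEN];  /* past samples, most recent last */
    int weights[SKETCH_LMS_LEN];
} sketch_lms_t;

/* Weighted sum of the past samples, scaled back down by predict_shift. */
static int sketch_lms_predict(const sketch_lms_t *lms, int predict_shift) {
    int prediction = 0;
    for (int i = 0; i < SKETCH_LMS_LEN; i++) {
        prediction += lms->weights[i] * lms->history[i];
    }
    return prediction >> predict_shift;
}

/* Nudge each weight by +/-delta according to the sign of its history sample,
   then push the new sample into the history.
   A larger delta_shift gives a smaller delta, i.e. slower adaptation.
   Note: >> on negative ints is implementation-defined in C; common
   compilers perform an arithmetic shift. */
static void sketch_lms_update(sketch_lms_t *lms, int sample, int residual,
                              int delta_shift) {
    int delta = residual >> delta_shift;
    for (int i = 0; i < SKETCH_LMS_LEN; i++) {
        lms->weights[i] += lms->history[i] < 0 ? -delta : delta;
    }
    for (int i = 0; i < SKETCH_LMS_LEN - 1; i++) {
        lms->history[i] = lms->history[i + 1];
    }
    lms->history[SKETCH_LMS_LEN - 1] = sample;
}
```

With delta_shift = 5, a residual of 64 moves every weight by 2; raising the shift to 6 halves that step. Changing SKETCH_LMS_LEN is the equivalent of trying a different predictor length.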

Just try a bunch of things and check the reported PSNR when encoding with qoaconv :]