ikbencasdoei / godot-voip

godot-voip is a Godot Engine addon which makes it very easy to setup a real-time voice-chat system in your Godot game. This addon also includes a demo project.
https://godotengine.org/asset-library/asset/425
MIT License

Use Opus #16

Open troy-lamerton opened 3 years ago

troy-lamerton commented 3 years ago

Opus is the industry standard for VOIP audio.

This is because it is an upgrade over Speex and AAC: decoded Opus sounds better at the same bitrate. Source: https://en.wikipedia.org/wiki/File:Opus_quality_comparison_colorblind_compatible.svg

See Reduz's opinion:

Opus is mostly useful in our context for VOIP as it's a good general purpose replacement for Speex, but Godot will never supply VOIP out of the box given the large and diverse number of ways to do it. https://github.com/godotengine/godot/pull/47114#issuecomment-801426795

If there are concerns about including libopus or some other binary, Android and iOS added an Opus decoder several years ago, and I can make an API to access these decoders. I expect that Windows/Mac/Linux also have an Opus decoder available through a platform API.

Calinou commented 3 years ago

This is a pure GDScript add-on, and integrating Opus support would greatly complicate the add-on due to the need to use GDNative.

This add-on currently doesn't seem to be using any kind of compression, but the high-level multiplayer API itself can be configured to use various kinds of lossless compression including Zstandard.

troy-lamerton commented 3 years ago

I will leave this issue open to track adding compression.

> the high-level multiplayer API itself can be configured to use various kinds of lossless compression including Zstandard.

That sounds great. Someone familiar with Godot's APIs can take over here.

Ping me for any questions about compressing the data. I doubt that Zstandard will give the best result for audio data.

Calinou commented 3 years ago

> That sounds great. Someone familiar with Godot's APIs can take over here.

This is now the default since https://github.com/godotengine/godot/pull/38313 was merged, although something like Zstandard will likely fare better on larger packets.

I agree that Zstandard compression on audio data isn't ideal (it's still lossless), but it's all we can do without access to real-time audio encoding from Godot. I haven't checked which sample rate, precision and sending mode (mono/stereo) this plugin uses, but there are likely optimizations possible here too (at the cost of quality).
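To make the "it's still lossless" point concrete, here is a minimal sketch in Python (for illustration, not GDScript). It uses `zlib` from the standard library as a stand-in for Zstandard, and the 16 kHz rate and the synthetic test signal are assumptions, not values taken from this plugin. It only shows that a lossless codec returns the exact bytes it was given, so any saving has to come from redundancy in the byte stream rather than from discarding inaudible detail.

```python
import math
import random
import struct
import zlib

SAMPLE_RATE = 16_000  # assumed; the thread does not state the plugin's rate


def synth_voice_like(n, seed=0):
    """Hypothetical test signal: a 220 Hz tone plus low-level noise."""
    rng = random.Random(seed)
    return [
        0.5 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)
        + 0.01 * rng.uniform(-1.0, 1.0)
        for i in range(n)
    ]


def pack_float32(samples):
    """Serialize samples as little-endian 32-bit floats."""
    return struct.pack(f"<{len(samples)}f", *samples)


def lossless_round_trip(raw):
    """Compress with zlib (stand-in for Zstandard) and verify exact recovery."""
    packed = zlib.compress(raw, 9)
    return len(packed), zlib.decompress(packed) == raw


if __name__ == "__main__":
    raw = pack_float32(synth_voice_like(SAMPLE_RATE))  # one second of audio
    packed_size, identical = lossless_round_trip(raw)
    print(f"raw: {len(raw)} bytes, compressed: {packed_size} bytes, "
          f"bit-exact: {identical}")
```

How much the compressed size shrinks depends heavily on the signal, so it is worth measuring on real microphone captures before drawing conclusions.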

troy-lamerton commented 3 years ago

Next step would be to make a proposal for what we need. Maybe there is already a proposal in the works for what I'm asking about? It's not practical for me to do this all myself.

Calinou commented 3 years ago

> Next step would be to make a proposal for what we need. Maybe there is already a proposal in the works for what I'm asking about? It's not practical for me to do this all myself.

I did a quick search and couldn't find any proposals about BackBufferCopy.

ikbencasdoei commented 3 years ago

Sorry for the late reply. The plugin converts all audio to mono before sending. For precision, it uses the GDScript PoolRealArray, which stores 32-bit floats (not sure if there's a better alternative). However, sample rate and compression are defined project-wide, so I cannot change them. Also, any audio processing in GDScript is very expensive anyway, so I'm definitely considering converting some parts of the plugin to GDNative.
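One possible alternative to sending 32-bit floats is to quantize each sample to signed 16-bit PCM before it goes on the wire, which halves the payload. The sketch below is in Python for illustration rather than GDScript, and the function names `floats_to_pcm16` and `pcm16_to_floats` are hypothetical, not part of the plugin. It assumes samples are in the range [-1.0, 1.0] and clamps anything outside it.

```python
import struct


def floats_to_pcm16(samples):
    """Clamp to [-1.0, 1.0] and pack as little-endian signed 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))
        ints.append(int(round(s * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_floats(data):
    """Unpack little-endian signed 16-bit PCM back to floats in [-1.0, 1.0]."""
    count = len(data) // 2
    return [v / 32767.0 for v in struct.unpack(f"<{count}h", data)]


if __name__ == "__main__":
    mono = [0.0, 0.25, -0.5, 1.0, -1.0]
    wire = floats_to_pcm16(mono)
    print(len(wire), "bytes on the wire instead of", 4 * len(mono))
    print(pcm16_to_floats(wire))
```

The round trip is lossy, but the quantization error per sample is bounded by half a step (about 1/65534 of full scale), which is generally considered fine for voice.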

unfa commented 2 years ago

Yeah, having an extension that could encode/decode Opus audio streams seems like a necessity to properly implement voice chat.

Transmitting uncompressed audio is incredibly wasteful. For voice communication, 16-bit 16 kHz PCM should be almost perfect if an appropriate low-pass filter is applied at ~7 kHz. With some dynamic range compression, gating and a more aggressive low-pass filter, even 8-bit PCM at 8 kHz would probably work and save a lot of bandwidth. But it would still be wasteful, and it would sound really bad compared to an Opus stream, which would use less bandwidth and sound much better in exchange for a little CPU time.
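The bandwidth figures behind this comparison are simple arithmetic: uncompressed PCM costs sample rate × bits per sample × channels. A short Python sketch follows. The 44.1 kHz rate in the float row is an assumption about the project's mix rate, and the 24 kbit/s Opus figure is only an illustrative voice setting (Opus itself supports roughly 6 to 510 kbit/s), not a measurement.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bitrate in kbit/s, ignoring packet and protocol overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Mono formats discussed in the thread, plus an assumed 44.1 kHz float case.
FORMATS = {
    "32-bit float @ 44.1 kHz (assumed mix rate)": (44_100, 32),
    "16-bit PCM @ 16 kHz": (16_000, 16),
    "8-bit PCM @ 8 kHz": (8_000, 8),
}

ILLUSTRATIVE_OPUS_KBPS = 24  # assumption: a plausible voice setting

if __name__ == "__main__":
    for name, (rate, bits) in FORMATS.items():
        kbps = pcm_bitrate_kbps(rate, bits)
        ratio = kbps / ILLUSTRATIVE_OPUS_KBPS
        print(f"{name}: {kbps} kbit/s ({ratio:.1f}x the Opus example)")
```

So 16-bit 16 kHz mono is 256 kbit/s and 8-bit 8 kHz mono is 64 kbit/s per speaker before any network overhead. The ~7 kHz low-pass filter mentioned above keeps the signal below the 8 kHz Nyquist limit of a 16 kHz sample rate.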