Proper physical treatment of sound waves for improved spatial audio

Krzmbrzl commented 1 month ago

Context

Spatial audio (also called positional audio) in Mumble currently uses a rather simplistic model of how a listener receives sound from different locations.

Description

In order to maximize the realism of positional audio, Mumble should properly take physical effects into account. These are:

- Use of proper head-related transfer functions (HRTFs). These simulate how sound waves are affected by the body of the person hearing the sound, most notably the head. This includes changes to the frequency content (which are themselves frequency-dependent) as well as time, phase and volume shifts of the sound waves. In other words, this also incorporates interaural delay.
- Properly account for Doppler shifts arising from the (relative) movements of the sound source and the sound receiver.
- Take the virtual surroundings into account. This of course requires knowledge of some kind of 3D world map in order to be able to map from positions to physical surroundings. This would allow effects such as
  - Reverb
  - Occlusion
  - Attenuation

HRTF could be achieved by making use of the OpenAL (open audio library) ecosystem, for which an LGPL-licensed implementation called OpenAL Soft exists. Potentially, OpenAL Soft could also be used as a regular cross-platform audio library, which would make our own platform-specific backend implementations obsolete (and greatly reduce the maintenance effort). Whether or not this is a viable direction is not yet clear though.
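To give a feel for the interaural delay mentioned above, here is a minimal sketch using Woodworth's spherical-head approximation. This is only an illustration of the order of magnitude involved; a measured HRTF would replace it entirely. The function name, the head radius of 8.75 cm and the speed of sound of 343 m/s are assumptions made for this example.

```python
import math


def woodworth_itd(azimuth_rad, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds.

    azimuth_rad: source angle relative to straight ahead, in radians.
    head_radius and speed_of_sound are assumed example values (m, m/s).
    """
    # The spherical-head formula is only meant for the frontal half-plane,
    # so clamp the angle to [-pi/2, pi/2].
    theta = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))


if __name__ == "__main__":
    for deg in (0, 30, 60, 90):
        itd_ms = woodworth_itd(math.radians(deg)) * 1000.0
        print(f"{deg:3d} deg: {itd_ms:.3f} ms")
```

Even for a source directly to one side, the delay stays well below one millisecond, which is why it has to be applied with sub-sample care if done by hand.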

The speed of entities required for Doppler shifts can be obtained via "numeric differentiation", i.e. taking two position updates and then checking how much the entity has moved in the given amount of time.
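A minimal sketch of that idea, assuming positions arrive as timestamped samples in meters and seconds. `PositionSample` and `estimate_velocity` are hypothetical names for this example, not existing Mumble types.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PositionSample:
    t: float   # timestamp in seconds (assumed unit)
    pos: Vec3  # position in meters (assumed unit)


def estimate_velocity(prev: PositionSample, curr: PositionSample) -> Vec3:
    """Backward finite difference between two position updates."""
    dt = curr.t - prev.t
    if dt <= 0.0:
        # Duplicate or out-of-order update: no usable time step.
        return (0.0, 0.0, 0.0)
    return tuple((c - p) / dt for p, c in zip(prev.pos, curr.pos))


if __name__ == "__main__":
    a = PositionSample(0.0, (0.0, 0.0, 0.0))
    b = PositionSample(0.5, (1.0, 0.0, -2.0))
    print(estimate_velocity(a, b))
```

Since position updates can be noisy or irregularly spaced, a real implementation would probably want to smooth the result over more than two samples.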

For the environmental effects, data about the physical surroundings of the entities is required. In order to obtain these, the plugin API could be extended to allow plugins to provide this kind of data.

Note that there is also a Qt component for handling spatial audio that might be a viable option for us to use. However, ideally we wouldn't introduce a Qt dependency at the audio-processing level.

Mumble component

Client

OS-specific?

No

Additional information

Online resources:

Related issues:

6532
2324
5934
1933
3234

Hiradur commented 1 month ago

> The speed of entities required for Doppler shifts can be obtained via "numeric differentiation", i.e. taking two position updates and then checking how much the entity has moved in the given amount of time.

Please note that Mumble would have to know the in-game unit in which the movement speed is measured. This could be a parameter that a game plugin could set. But even then I think it could cause some weird artifacts, e.g. if a player teleports from one end of a map to another (huge difference between positions in a single time step). Some upper limit to safeguard against this would make sense.
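One way such a safeguard could look, as a rough sketch: treat any implied speed above a threshold as a teleport and report zero velocity for that step. The name `guarded_velocity` and the default limit of 100 m/s are made up for this example; a plugin would presumably pick its own limit.

```python
import math


def guarded_velocity(prev_pos, curr_pos, dt, max_speed=100.0):
    """Finite-difference velocity that ignores implausibly large jumps.

    max_speed is an assumed, plugin-configurable limit in units per second.
    """
    if dt <= 0.0:
        return (0.0, 0.0, 0.0)
    velocity = tuple((c - p) / dt for p, c in zip(prev_pos, curr_pos))
    speed = math.sqrt(sum(v * v for v in velocity))
    if speed > max_speed:
        # Most likely a teleport or respawn: do not produce a Doppler shift.
        return (0.0, 0.0, 0.0)
    return velocity


if __name__ == "__main__":
    print(guarded_velocity((0, 0, 0), (1, 0, 0), 0.1))     # ordinary movement
    print(guarded_velocity((0, 0, 0), (5000, 0, 0), 0.1))  # teleport
```

Dropping the velocity to zero avoids a loud pitch glitch on teleport. Clamping the speed to the limit instead would be the alternative, at the cost of a short but audible shift.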

> For the environmental effects, data about the physical surroundings of the entities is required. In order to obtain these, the plugin API could be extended to allow plugins to provide this kind of data.

This would be one way to do it. Creative chose another for EAX Voice: back when EAX was popular, it received the environmental data from the game through the sound card driver and applied that to the microphone input stream, so that the processed stream was available to any VoIP software. I don't think that OpenAL Soft supports this at the moment, and it would only work for games using EFX or EAX provided by OpenAL, but it wouldn't require any work on Mumble's side.

Here are some examples of EAX Voice: https://www.youtube.com/watch?v=30fTc5t5QNU https://www.youtube.com/watch?v=wxIYNG4TQ7U

Krzmbrzl commented 1 month ago

> Mumble would have to know the in-game unit in which the movement speed is measured.

I would argue that we already require the positional data to be in meters and, since the respective audio is realtime, it would make sense for the time to be measured in seconds as well.

In order to account for games with very fast movement (e.g. cars or even spaceships) the plugin could set a speed multiplier in order to keep the Doppler effect on a sane level.
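As a sketch of how such a multiplier could enter the calculation, the classical Doppler formula only needs the velocity components along the line between source and listener. Everything named here (`doppler_factor`, `speed_scale`, the 343 m/s default and the clamp at 90% of the speed of sound) is an assumption for illustration, not an existing Mumble or OpenAL interface.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed default


def doppler_factor(src_pos, src_vel, lis_pos, lis_vel,
                   speed_scale=1.0, c=SPEED_OF_SOUND):
    """Pitch ratio heard by the listener (1.0 means no shift).

    speed_scale is the hypothetical plugin-provided multiplier; values
    below 1.0 tone the effect down for games with very fast movement.
    """
    direction = [l - s for s, l in zip(src_pos, lis_pos)]
    dist = math.sqrt(sum(x * x for x in direction))
    if dist == 0.0:
        return 1.0
    unit = [x / dist for x in direction]  # points from source to listener

    # Velocity components along the source-to-listener direction.
    v_src = speed_scale * sum(v * u for v, u in zip(src_vel, unit))
    v_lis = speed_scale * sum(v * u for v, u in zip(lis_vel, unit))

    # Keep both well below the speed of sound so the ratio stays finite.
    limit = 0.9 * c
    v_src = max(-limit, min(limit, v_src))
    v_lis = max(-limit, min(limit, v_lis))

    return (c - v_lis) / (c - v_src)


if __name__ == "__main__":
    # Source approaching a stationary listener at 34.3 m/s.
    print(doppler_factor((0, 0, 0), (34.3, 0, 0), (10, 0, 0), (0, 0, 0)))
    # Same scene with the effect scaled down to a tenth.
    print(doppler_factor((0, 0, 0), (34.3, 0, 0), (10, 0, 0), (0, 0, 0),
                         speed_scale=0.1))
```

A factor above 1.0 means the voice is heard at a higher pitch (source and listener closing in), below 1.0 at a lower pitch.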

> Some upper limit to safeguard against this would make sense.

Absolutely!

> Creative chose another for EAX Voice

Interesting approach. Never heard of it. It sounds very convenient though.

davidebeatrici commented 1 month ago

I was aware of EAX, but not EAX Voice. That feature is/was cool!

I already had a technique like that in mind, but the issue (as usual) is supporting specific games. In theory we could gather data directly from the audio library if a known/documented one is used, but otherwise it's going to be hard unless somebody has already reverse engineered the internals.

QmwJlHuSg9pa commented 1 month ago

Your best bet would probably be to speak to the maintainer of openal-soft directly; kcat has made strides in recent years towards integrating EAX support into the project.

mirh commented 1 month ago

> This would be one way to do it. Creative chose another for EAX Voice

I mean.. that's just a matter of different "places" where the mic effects are implemented/offered. But game-side there is no difference: a "predisposition" is required either way.

And in this sense, while OpenAL integration could certainly smooth things out for the games using it, I'm somewhat worried that the others, which need some degree of generalization, may instead be penalized by going higher level (though OpenAL could still be super useful to implement HRTF and other spatial effects).

> but it wouldn't require any work on Mumble's side.

That sound card driver thing by Creative? Of course not, it works for everybody. But we don't control RTKVHD64 or AtihdWT6. So, either you find a way to implement this in an APO (I'm not even sure it is possible, given that they would still have to poke inside game processes) or openal will have to expose this information to the rest of the system some way.

Such a ~frontend conundrum would then stack with the one I was left with for the backend at https://github.com/kcat/openal-soft/issues/415#issuecomment-2308399677

> kcat has made strides in recent years towards integrating EAX support into the project.

EAX *has* been integrated, nearly 3 years ago in one big PR already.

will-ca commented 2 weeks ago

> I would argue that we already require the positional data to be in meters and, since the respective audio is realtime, it would make sense for the time to be measured in seconds as well.

Would it make sense for distance/position to be unitless, and instead allow specifying a speed of sound parameter? (Default 340s^-1, equivalent to meters at STP.)

Though IDK if that'll affect wavelength-dependent effects.
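To make the unitless idea concrete, here is a small sketch showing that propagation delay and Doppler ratio only depend on the ratio between distances (or speeds) and the speed of sound, so rescaling both by the same factor changes nothing. The function names are hypothetical, and the stationary-listener Doppler formula is used for brevity.

```python
def propagation_delay(distance, speed_of_sound=340.0):
    """Time in seconds for sound to travel `distance` world units."""
    return distance / speed_of_sound


def doppler_ratio(approach_speed, speed_of_sound=340.0):
    """Pitch ratio for a source approaching a stationary listener."""
    return speed_of_sound / (speed_of_sound - approach_speed)


if __name__ == "__main__":
    feet_per_meter = 3.28084  # example of a game that uses feet

    # Same scene expressed in meters and in feet.
    print(propagation_delay(34.0, 340.0))
    print(propagation_delay(34.0 * feet_per_meter, 340.0 * feet_per_meter))

    print(doppler_ratio(20.0, 340.0))
    print(doppler_ratio(20.0 * feet_per_meter, 340.0 * feet_per_meter))
```

Effects that compare a wavelength against a physical size (for example the head radius in an HRTF model) would presumably still need that size expressed in the same world unit.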

Krzmbrzl commented 2 weeks ago

> Would it make sense for distance/position to be unitless, and instead allow specifying a speed of sound parameter? (Default 340s^-1, equivalent to meters at STP.)

Not sure what problem this would try to solve though 🤔
