kcat / openal-soft

OpenAL Soft is a software implementation of the OpenAL 3D audio API.

Support for Mumble positional audio #415

Closed Hiradur closed 3 years ago

Hiradur commented 4 years ago

Mumble [1] is a FOSS voice chat software. One particular feature of it is that it can be hooked up to games/applications and transmit positional data of a player's avatar so that all participants hear speech from other players as if it were coming from the speaking player's avatar. Currently there exist two options that I'm aware of to make a game compatible with this feature:

I'm now wondering if it would be possible and make sense to integrate Mumble positional audio support into OpenAL Soft. As an audio library it already has access to the most important data. The Mumble protocol has fields for additional metadata which could for example be EFX parameters to play speech with EFX effects (although this would require support in the Mumble client, e.g. through an OpenAL backend which has been proposed but not included yet [3]). This would enable something similar to the EAX voice feature demonstrated here [4].

Besides positional data, contextual information is often used with Mumble to e.g. separate positional audio for competing teams in the same chat room. This would be data OpenAL Soft wouldn't have access to and thus wouldn't be able to provide. IMO this wouldn't be a dealbreaker, many supported games don't support this feature anyway [5].

Integrating Mumble positional audio support in OpenAL Soft would however make positional audio in Mumble available to many games at once without further modification.

[1] https://www.mumble.info/
[2] https://wiki.mumble.info/wiki/Link
[3] https://github.com/mumble-voip/mumble/issues/1933
[4] https://www.youtube.com/watch?v=30fTc5t5QNU
[5] https://wiki.mumble.info/wiki/Games#Supported_games

kcat commented 4 years ago

I'm not sure what OpenAL Soft would do for this to help Mumble integration. OpenAL only knows the position of active sound sources (or sound sources that the engine may play when necessary), but not game entities like players that may produce sound at some point. What would OpenAL need to do to help?

Hiradur commented 4 years ago

A single Mumble client only needs to know the position of the avatar of the player or more precisely the listener's position. The positions of other participants of a chat room are received from the other clients. The Mumble client then knows the position of its corresponding player as well as the positions of all other participants and uses this to render each participant's speech with positional audio.

If you look at the source code for the Mumble Link plugin [1] it may become a bit clearer.

[1] https://wiki.mumble.info/wiki/Link
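To make the idea above concrete, here is a minimal sketch of what copying a listener pose into a Link-style structure could look like. `LinkPose` and `link_pose_from_listener` are hypothetical names for illustration and are not part of OpenAL Soft or Mumble; the real Link structure has more fields. The inputs follow the layout OpenAL uses for `AL_POSITION` (3 floats) and `AL_ORIENTATION` (6 floats, "at" then "up"). The sketch assumes that negating Z is enough to convert OpenAL's right-handed coordinates to the left-handed system described by the Link documentation, and that the game's units are metres.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the Link plugin's shared structure.
 * Only the pose fields are shown; the field names are illustrative. */
typedef struct {
    float avatar_position[3];
    float avatar_front[3];
    float avatar_top[3];
} LinkPose;

/* Copy an OpenAL-style listener pose into the link structure.
 * position:    3 floats, laid out like AL_POSITION
 * orientation: 6 floats, "at" vector followed by "up" vector, like AL_ORIENTATION
 * Assumption: flipping the sign of Z converts right-handed to left-handed. */
void link_pose_from_listener(LinkPose *out, const float position[3],
                             const float orientation[6])
{
    assert(out != NULL && position != NULL && orientation != NULL);
    for (int i = 0; i < 3; ++i) {
        const float sign = (i == 2) ? -1.0f : 1.0f;
        out->avatar_position[i] = sign * position[i];
        out->avatar_front[i]    = sign * orientation[i];
        out->avatar_top[i]      = sign * orientation[3 + i];
    }
}
```

Note that this only covers the pose. It says nothing about the identity and context fields, which an audio library has no way to fill in on its own.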

kcat commented 4 years ago

I see. But still, it seems to want information OpenAL simply doesn't have:

// Identifier which uniquely identifies a certain player in a context (e.g. the ingame name).
wcsncpy(lm->identity, L"Unique ID", 256);
// Context should be equal for players which should be able to hear each other positional and
// differ for those who shouldn't (e.g. it could contain the server+port and team)
memcpy(lm->context, "ContextBlob\x00\x01\x02\x03\x04", 16);

OpenAL doesn't have any way to generate unique IDs for everyone in a given game session, nor a context blob to indicate who should hear the given player/listener as positional. More generally, I'm also seeing a lack of synchronization with the given example code, how is whoever reads the shared memory supposed to avoid reading a partial update?

Hiradur commented 4 years ago

As far as I know identity and context are optional. If you leave them empty the positional audio should still work. This is what I was talking about in

> Besides positional data, contextual information is often used with Mumble to e.g. separate positional audio for competing teams in the same chat room. This would be data OpenAL Soft wouldn't have access to and thus wouldn't be able to provide. IMO this wouldn't be a dealbreaker, many supported games don't support this feature anyway [5].

> More generally, I'm also seeing a lack of synchronization with the given example code, how is whoever reads the shared memory supposed to avoid reading a partial update?

Good question, I don't know if this case is handled properly.

I realized that EFX support as I was talking about in the first post wouldn't be possible without a change to the protocol. For some reason I thought the context field could be used for this but having two players in different EFX environments would disable positional audio because the context field would no longer match. So EFX support would require a change to the Mumble protocol. There is a related issue about enhancing the positional audio feature [1].

I forgot to mention that there is a helper tool [2] to test the positional audio feature of Mumble.

[1] https://github.com/mumble-voip/mumble/issues/3234
[2] https://github.com/mumble-voip/mumble-pahelper

kcat commented 4 years ago

> As far as I know identity and context are optional. If you leave them empty the positional audio should still work.

From what the wiki says, the context is needed to know who on a server should hear you positionally. If some generic context is given, it would always match everyone else so everyone on the same Mumble server using OpenAL Soft would hear each other positionally, even when playing different games (or different instances or levels/maps of the same game).
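The matching problem described above can be illustrated with a small sketch. `contexts_match` is a hypothetical helper, and the rule it implements (two players hear each other positionally only when their context blobs are byte-for-byte equal) is an assumption based on the wiki comment quoted earlier, not a description of Mumble's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: positional audio is enabled between two players only
 * if their context blobs have the same length and the same bytes. */
bool contexts_match(const unsigned char *a, size_t a_len,
                    const unsigned char *b, size_t b_len)
{
    assert((a != NULL || a_len == 0) && (b != NULL || b_len == 0));
    if (a_len != b_len)
        return false;
    return a_len == 0 || memcmp(a, b, a_len) == 0;
}
```

Under that rule, a fixed blob written by the audio library (for example the same "openal-soft" bytes for every process) would compare equal for two players in completely different games, while game-provided blobs such as server, port and team would not.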

But either way, there's another issue that if an app has Mumble support itself while OpenAL Soft also tries to access Mumble, that would create a conflict. The only way it could possibly work is if the app knows about Mumble to tell OpenAL it doesn't use Mumble. That, on top of the apparent synchronization issue.

> I realized that EFX support as I was talking about in the first post wouldn't be possible without a change to the protocol. For some reason I thought the context field could be used for this but having two players in different EFX environments would disable positional audio because the context field would no longer match.

For this kind of thing, it needs to be tied into the game logic. EFX is more than just applying a reverb to what the listener hears. An app generally may only use a preset reverb environment that it occasionally changes (if any at all), but it's entirely possible for it to use a few that it dynamically updates to create a more detailed audio scene. You also don't want to use a reverb per-source, but have a small set number of active reverbs (1 to 3 or 4 at most, associated with particular environments near the listener) that sources feed into, with filters to obstruct sounds given the map geometry and materials. OpenAL Soft doesn't know how sound sources apply to reverb environments except by what the app tells it to do at a given point in time, so the app would have to control the properties of the source that's playing the voice stream.

Hiradur commented 4 years ago

> From what the wiki says, the context is needed to know who on a server should hear you positionally. If some generic context is given, it would always match everyone else so everyone on the same Mumble server using OpenAL Soft would hear each other positionally, even when playing different games (or different instances or levels/maps of the same game).

A simple solution would be to only have players playing the same game and map in a particular chatroom. Not convenient but not too bad either.

> But either way, there's another issue that if an app has Mumble support itself while OpenAL Soft also tries to access Mumble, that would create a conflict. The only way it could possibly work is if the app knows about Mumble to tell OpenAL it doesn't use Mumble.

This also came to my mind. One possible solution would be to deactivate Mumble positional audio by default and only turn it on with a setting in alsoft.ini. I think if a game has alsoft.ini next to its executable this file takes precedence over the other alsoft configuration files. This mechanic could be used to enable Mumble positional audio for specific games.

> For this kind of thing, it [EFX] needs to be tied into the game logic

I understand. Well, it was worth a try.

mirh commented 3 years ago

Positional audio "identification" is a game logic aspect, not an audio api one. I'm still scrambling for a sense here. (besides, there's an awfully small number of openal games too)

Hiradur commented 3 years ago

To do positional audio, Mumble needs to know the player's position in the virtual world. This information is also used by OpenAL to determine the listener's position. So I guess games hand off the same vectors to both OpenAL and Mumble and I figured if that information was redundant then OpenAL Soft might as well integrate Mumble positional audio support. This would eliminate the need to have to integrate Mumble positional audio support into every game or write Mumble plugins that extract the necessary information from the game's memory.

However, at least the following information would be missing in OpenAL:

Additionally, positional information would be incorrect if the listener's position doesn't equal the avatar's position (e.g. when the listener's position is bound to the camera in third person view or cinematics).

I'll close this issue since only the minimal feature set of Mumble's positional audio could possibly be supported and there might even be a discrepancy between the listener's position and the avatar's position.

mirh commented 2 months ago

> there might even be a discrepancy between the listener's position and the avatar's position.

Ironically enough, that's already a normal problem to handle even without the whole "voip" aspect (so I don't think that should be a problem more than it isn't already for normal audio output, which has in fact some knobs to try to put up with it).

Anyhow, just wanted to point out that I just realized that at least some old EAX game.. did use to have some kind of "predisposition" about voice transmission (even though how it worked is very unclear). https://www.youtube.com/watch?v=30fTc5t5QNU https://web.archive.org/web/20070601013325/http://www.soundblaster.com/eax/abouteax/eax5ahd/eax5_2.asp

Hiradur commented 1 month ago

@mirh There is a similar video of Thief2x where EAX Voice has probably been used. I guess the Creative driver just applies currently active EAX effects to the input stream.

mirh commented 1 month ago

Yes, that totally seems to be it and it should have been compatible with every EAX game (you could control the strength with the "Mic environment FX" slider in the X-Fi control panel).

But then EAX voice is also this supposed “3D Voice Over IP” thing? And it really seems like on point to this feature request. It's presumable that it could only have worked with the in-game voice chat (even though there's no hard evidence and they could have as well hacked.. uh, what was even in vogue back then? TS2 and ventrilo?) but to be sure in 2024 only mumble would make sense and could even hope to do it.

I couldn't find any report of such thing ever being used though, assuming it even ever shipped in the drivers in the first place. Worse, there aren't exactly many EAX 5 multiplayer games that shipped after 2005 (UT2004, BF2, BF2142, Q4 and FSW?) unless they helped to implement the thing in some godforsaken asian MMO or they backported the feature to the eax 3/4 headers. So.. @bibendovsky do you know anything about this?

bibendovsky commented 1 month ago

> Yes, that totally seems to be it and it should have been compatible with every EAX game (you could control the strength with the "Mic environment FX" slider in the X-Fi control panel).

> But then EAX voice is also this supposed “3D Voice Over IP” thing? And it really seems like on point to this feature request. It's presumable that it could only have worked with the in-game voice chat (even though there's no hard evidence and they could have as well hacked.. uh, what was even in vogue back then? TS2 and ventrilo?) but to be sure in 2024 only mumble would make sense and could even hope to do it.

> I couldn't find any report of such thing ever being used though, assuming it even ever shipped in the drivers in the first place. Worse, there aren't exactly many EAX 5 multiplayer games that shipped after 2005 (UT2004, BF2, BF2142, Q4 and FSW?) unless they helped to implement the thing in some godforsaken asian MMO or they backported the feature to the eax 3/4 headers. So.. @bibendovsky do you know anything about this?

Could be an internal API. I don't see any mention of EAX Voice, VoIP or similar features in the specification.

Hiradur commented 1 month ago

> But then EAX voice is also this supposed “3D Voice Over IP” thing?

The way how the 3D Voice over IP feature is described, it might actually refer to a game-engine feature decoupled from Creative's driver or EAX. They might mean the following: If a game has an internal 3D VoIP feature (like many do) and you additionally enable EAX Voice, the other players can hear your voice with EAX effects since they are simply applied to your microphone stream.

Should I reopen this issue and rename it Add support for environmental voice processing similar to EAX Voice since this would be something that OpenAL Soft (and DSOAL) could do? Actually, it might be better to create a new issue for this instead.

mirh commented 1 month ago

The feature seems to be stated separately, and they underline how it allows proper directionality of communications (whether "talk too loud your enemies will hear you" is true or just rhetorical is unclear, but at least it should work with squad mates).

It's true though that this might just be a game totally internal thing then (and this may even be why, for once, they aren't suffixing the feature with the n-th ®).. But if that were the case, then shouldn't it already work with just eax enabled in openal-soft/dsoal? @ThreeDeeJay could you have some of your server minions try older multiplayer games with a X-fi for a confirmation?

And maybe from that answer it should also depend how the far simpler EAX voice gets implemented (which for as much as cheap, trashy and exaggerated it may be.. it does sound as a cool idea eventually). Like, of course it shouldn't be too hard to implement inside of openal-soft capture.. but then of course the question is how could mumble even access it (can the game context be shared? can you implement an interface that can be universally hooked for every game? would you eventually still need a dedicated plugin for each one?)

ThreeDeeJay commented 1 month ago

> @ThreeDeeJay could you have some of your server minions try older multiplayer games with a X-Fi for a confirmation?

@mirh I'd be nicer about the people doing favors 👀💦 but sure, I can ask around since we were talking about this just the other day. I thought this feature was exclusive to EAX 5.0 games but Raven Shield is EAX 3.0 and Thief 2 is EAX 2.0 so is the point to find out whether this works in any EAX with voice chat or something more specific?

I really like the positional 3D voice chat idea, but tbh hearing myself would probably get annoying real fast so I'd rather just hear the subtle reverb/echo without the "direct sound", while teammates obviously hear both direct sound and reverberation.

I can't wait for Mumble to get OpenAL. The positional surround mix is atrocious (massive leaking that just seems like stereo repeat with subtle attenuation, which results in awful positioning) and the headphones mode just sounds like crossfeed. And it's not any better in-game either. Sector's Edge is the only game I know that pulled off the whole shebang (proximity, 3D HRTF, reverb, occlusion, etc).

On a side note, apparently UT2004 requires an OpenAL patch to fix voice chat, so I wonder if it could be fixed on OpenAL Soft's end. @kcat Have you looked into this by any chance?

kcat commented 1 month ago

> I thought this feature was exclusive to EAX 5.0 games but Raven Shield is EAX 3.0 and Thief 2 is EAX 2.0 so is the point to find out whether this works in any EAX with voice chat or something more specific?

Depending on how the effects are being applied, it should be able to work with any EAX version. If all it's doing is using capture to record voices from users, it would simply play what it's capturing as any other sound. I don't know what "EAX Voice" is, whether it's some hardware feature or something added to some middleware, but the idea of playing back voice chatter in 3D with environmental effects is not much different than any other 3D sound. Just capture in mono, and stream it back out as a mono buffer (and stream it to other players for voice chat), which can be treated as any other mono sound. The only difference is the audio source being a capture device or a stream from another player instead of a file, all the same 3D effects can still apply. You could even filter out the direct path of the local player only, to only hear your own reverberation in game while other players get both the direct path and reverb of your voice. Though that would sound odd if you make recordings, your direct capture isn't being mixed into the recording so the video would also only have the reverb and not your direct voice.
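The "reverb only for the local player" idea can be sketched independently of any audio API. The function below is a hypothetical illustration, not OpenAL Soft code: it assumes the direct (dry) voice signal and its already-rendered reverb (wet) signal are available as two mono float buffers, and simply drops the direct path when the voice belongs to the local player. In an actual EFX-based app, one plausible route to the same result would be a zero-gain direct-path filter on the source (`AL_DIRECT_FILTER`), leaving the auxiliary send untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mix one voice into `out`.
 * direct: the dry voice signal, reverb: the same voice after the reverb effect.
 * For the local player's own voice only the reverb is kept; for every other
 * player both the direct path and the reverb are heard. */
void mix_voice(float *out, const float *direct, const float *reverb,
               size_t count, bool is_local_player)
{
    assert(out != NULL && direct != NULL && reverb != NULL);
    const float direct_gain = is_local_player ? 0.0f : 1.0f;
    for (size_t i = 0; i < count; ++i)
        out[i] = direct_gain * direct[i] + reverb[i];
}
```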

> On a side note, apparently UT2004 requires an OpenAL patch to fix voice chat, so I wonder if it could be fixed on OpenAL Soft's end. @kcat Have you looked into this by any chance?

Those are some odd changes. Allows modifying a buffer while it's attached to a source (out of spec, and pretty dangerous). Adds a buffer query to get a source ID that it's attached to, even though a buffer can be attached to multiple sources. Not sure where that comes from since it doesn't seem to be used... maybe an old query that was being contemplated but never made official, and UT2k4 used despite never being official? And also a couple hacks to always return 9 queued buffers for non-playing sources, and return all buffers processed for sources in an AL_INITIAL state. And for some reason reduces the requested number of capture samples by 1/4th (which only affects the minimum number of samples the device will be guaranteed to hold; it can still hold more, and it doesn't change how often the app can read the samples, it would only potentially make the device overrun quicker).

mirh commented 1 month ago

> I thought this feature was exclusive to EAX 5.0 games but Raven Shield is EAX 3.0 and Thief 2 is EAX 2.0 so is the point to find out whether this works in any EAX with voice chat or something more specific?

The "Microphone Environment FX" part of eax voice is going to work with any game that has EAX reverb, that is already mentioned (on EAX4+ it should also support Multi-Environment® but I digress).

What I was really bewildered about was the other feature, which is the so called "3D voice over ip". Of very interesting note: supposedly it may only have worked in LAN matches.

> I can't wait for Mumble to get OpenAL.

I would hold your horses for that. While I guess that's one way things could be smoothed out, that doesn't seem required or even necessarily preferable. Even though now that I think of it, having some kind of openal-awareness could probably simplify writing plugins a lot (at least for those games that use it, that is).

> The positional surround mix is atrocious

https://github.com/mumble-voip/mumble/issues/1933 Mhh, ok, I see there's really a lot that they have to improve.

> On a side note, apparently UT2004 requires an OpenAL patch to fix voice chat

That's interesting and might or might not be related.. A patch straight from creative certainly sounds like the place you could expect this. On the other hand I couldn't find much inside the v3369 ALaudio.dll, except for checking for ALC_EXT_CAPTURE (of course) and then the existence of UseSpatializedVoice and SpatializedVoiceRadius settings.

EDIT: uh, btw, are these quirks still a thing?

ThreeDeeJay commented 1 month ago

@kcat Then was that patch just a bunch of desperate (even some useless?) attempts to fix it or just the adequate solution to an insane OpenAL implementation? I'm not sure if this also affects other libraries, though. So could OpenAL Soft implement a proper fix or would that break other games, unless the patch only applied to UT2004.exe or a game_compat flag? I'm just worried we might need to stick to that old patch for proper OpenAL and avoid VC crashes in that game

@mirh I hope this applied the other players' reverb locally instead of transmitting the voice with the reverb baked-in, because I doubt possibly lossy compression would've done reverb quality any favors

kcat commented 1 month ago

> @kcat Then was that patch just a bunch of desperate (even some useless?) attempts to fix it or just the adequate solution to an insane OpenAL implementation?

Don't know. I imagine at least some of it is trying to work around bugs in the game that rely on non-standard behavior. But how much of it is actually necessary, I can't say (I don't imagine reducing the sample count in alcCaptureOpenDevice is).

> So could OpenAL Soft implement a proper fix or would that break other games, unless the patch only applied to UT2004.exe or a game_compat flag? I'm just worried we might need to stick to that old patch for proper OpenAL and avoid VC crashes in that game

The changes would very likely break other games, as it assumes a buffer queue size for non-playing sources, and returns incorrect processed buffer counts for AL_INITIAL sources. And opens up the possibility of invalid memory access if a buffer is loaded with new samples while in use.

mirh commented 1 month ago

@zenakuten

zenakuten commented 1 month ago

The patch for UT2004 voice chat was highly specific to UT2004 patch 3369. Since the game is a closed source commercial app, there was not much hope of fixing the bugs inside the engine. It is absolutely a hack and should not be used with other games. It was made by creating a debug build of OpenAL, setting breakpoints in Visual Studio, and inspecting the audio buffers and hard coding some values based on what the UT2004 engine was doing vs what was expected inside OpenAL. The UT2004 engine was expecting and ignoring a certain buffer size, which is where the hardcoded '9' comes from. It was also using byte size instead of sample size when allocating a buffer, which is where the divide by 4 came from. It was bad coding in UT2004, which somehow worked in older versions of OpenAL, but has been broken since.
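The byte-size versus sample-size mix-up can be shown with a short worked sketch. `bytes_to_sample_frames` is a hypothetical helper, not something taken from the patch or from OpenAL Soft. It relies on the fact that the buffer size passed to `alcCaptureOpenDevice` is counted in sample frames rather than bytes. The thread does not say which capture format UT2004 uses, so the formats below are only examples of cases where the divisor works out to 4.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a size in bytes to a count of sample frames for a PCM format.
 * One frame holds one sample for every channel. */
size_t bytes_to_sample_frames(size_t byte_count, unsigned channels,
                              unsigned bits_per_sample)
{
    assert(channels > 0 && bits_per_sample >= 8 && bits_per_sample % 8 == 0);
    const size_t frame_size = (size_t)channels * (bits_per_sample / 8);
    return byte_count / frame_size;
}
```

For instance, 4096 bytes is 1024 frames for 16-bit stereo or 32-bit mono (a divisor of 4), but 2048 frames for 16-bit mono, so a hard-coded divide by 4 is only right for particular formats.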

ThreeDeeJay commented 1 month ago

@zenakuten Was it still using bad code with the X-Fi/EAX 5.0 patch by Creative? https://web.archive.org/web/20060716054224/http://images.soundblaster.com/downloads/SBXF_UTPATCH_US_3369.exe

zenakuten commented 1 month ago

Not sure, I never tested with Creative's build. The voice chat issue was only with the 64 bit build of UT2004. I'm not sure Creative released a 64 bit build. Voice chat worked fine with the 32 bit version.

mirh commented 1 month ago

Oh, I thought that 3369 was the creative's patch build number, instead it's just the last one generically. In that case fellas, not sure how to tell you.. But it's not like the sources of your closed commercial program are that hard to find. As far as voice is concerned I could for example notice that they aren't enabling PACK_SPEEX_TO_EIGHT_BYTES in win64 builds.


Back to us though, a lightbulb turned on in my mind. And after browsing the code (to be sure it isn't the one of the EAX patch, but still), well.. it really seems like UT2004 did have voice spatialization ready already in the base game, provided that you enabled that ALAudio setting I mentioned in my previous message.

Though I believe that the server only signals it can accept the extra information, if the actor's "active room" is local. Something that in turn I think should correspond to bAllowLocalBroadcast (let alone that somehow its "local voice chat channel to broadcast to all players in the immediate vicinity" description was translated as "enable spatialization on server" in korean). As you can see Engine.VoiceChatReplicationInfo also holds a lot of pretty pertinent entries. Ok nevermind, it seems like the boolean that should have satisfied the check for the only place where VOICE_AllowSpatialization gets ever set on the server was never implemented. So, given it sounds so stupid to write all of this and then leaving it as dead code, I can only remotely assume/hope that the X-Fi patch was needed to finally unlock this power (note: LocalBroadcastRange and DefaultBroadcastRadius should still work in the local channels, even without "directionality"). This might also explain why I couldn't find anything caring for LAN play (except audio codec selection, but that shouldn't matter). Somebody please help testing the hypothesis please >__>

With all this said then (and assumed for the sake of the argument) it appears that 3D voip is independent from eax and eax voice. Conversely, the fact that EAX voice could integrate into some more or less pre-existing voip pipeline means they are really interacting with games as opposed to just keeping everything in-driver (and that they weren't joking when they said enemies could hear you because your voice is added inside the game). Like, they make it seem like they attach "effects primitives" to the transmission (of the normal voice) which are then rendered on each near supporting client.

Could it really be that you can insert in the sound stage extra sound sources from the outside (and safely)? It seems so crazily advanced, and even altogether odd tbh (because we'd be talking about the client deciding what data to send to the server). ..or it could just be the mother of all vendor locks and marketing bullshit? (i.e. it's not that you need EAX4 because the "environmental properties" have to be decoded by the receiver and you need multi-environment not to disrupt the effects you are already listening to.. but it's just that creative decided and pretended so) Seriously, test plz.

Fun fact: there's no mention whatsoever about EAX5.0 into the X-Fi patch (while there's a reference to a certain CISACTAudioDrv)