kcat / dsoal

A DirectSound DLL replacer that enables surround sound, HRTF, and EAX support via OpenAL Soft
GNU Lesser General Public License v2.1
377 stars 50 forks

Bring native hardware OpenAL support for DSOAL #99

Open Svyatpro opened 8 months ago

Svyatpro commented 8 months ago

I have noticed that native OpenAL on supported hardware significantly reduces latency in titles that use DirectSound. Creative ALchemy could improve latency in many DSound games, but unfortunately ALchemy does not include 64-bit binaries.

It would be great if DSOAL gained native OpenAL support.

mirh commented 8 months ago

I'm not sure what you are trying to suggest, using ct_oal for some supposed advantage?

Svyatpro commented 8 months ago

> I'm not sure what you are trying to suggest, using ct_oal for some supposed advantage?

I guess yes, the same way ALchemy does. It definitely has its benefits compared to software DirectSound/OpenAL.

mirh commented 8 months ago

I'm not sure you know what you are talking about if, among other things, you are bunching together software directsound and software openal. Not that I can claim to have measured or compared latency (and I know there's still a bit of room for improvements) but are you actually talking about tests or just vibes?

kcat commented 8 months ago

The main problem is that Creative's hardware drivers don't feature much beyond standard OpenAL (the biggest extensions they have are EAX and EFX). And standard OpenAL has some shortcomings that make wrapping DirectSound difficult; namely, updating buffer samples during playback. The app has to flag which dsound buffers won't dynamically update their buffer data (which the app can't even specify prior to DS8), and without being told, the assumption is that it can update at any time. The hardware drivers don't support AL_EXT_STATIC_BUFFER or any other extension that allows writing into the buffer during playback, so a wrapper would have to do something like use a streaming buffer queue for each source, which isn't very robust or efficient.
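To make the streaming-queue concern concrete, here is a small pure-Python model of it. This is only an illustration under stated assumptions, not how DSOAL or ALchemy work internally: `RingBuffer` stands in for a DirectSound secondary buffer the app may rewrite at any time, `StreamingSource` stands in for one OpenAL source fed through a buffer queue (conceptually what `alSourceQueueBuffers` / `alSourceUnqueueBuffers` do), and both class names are hypothetical.

```python
from collections import deque


class RingBuffer:
    """Stand-in for a DirectSound buffer the app may rewrite at any time."""

    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, offset, payload):
        # Wrap around the end, as a write into a looping buffer would.
        for i, byte in enumerate(payload):
            self.data[(offset + i) % len(self.data)] = byte

    def read(self, offset, length):
        size = len(self.data)
        return bytes(self.data[(offset + i) % size] for i in range(length))


class StreamingSource:
    """Stand-in for one OpenAL source fed by a queue of small buffers."""

    def __init__(self, ring, chunk, queued_chunks):
        self.ring = ring
        self.chunk = chunk
        self.read_pos = 0
        self.queue = deque()
        for _ in range(queued_chunks):
            self._queue_next()

    def _queue_next(self):
        # Snapshot the next chunk; later writes to this region are not seen.
        self.queue.append(self.ring.read(self.read_pos, self.chunk))
        self.read_pos = (self.read_pos + self.chunk) % len(self.ring.data)

    def play_chunk(self):
        # One queued chunk finishes playing; unqueue it and refill the queue.
        played = self.queue.popleft()
        self._queue_next()
        return played


ring = RingBuffer(8)
ring.write(0, b"AAAAAAAA")
src = StreamingSource(ring, chunk=2, queued_chunks=2)
ring.write(2, b"BB")  # the app rewrites a region that is already queued
first, second = src.play_chunk(), src.play_chunk()
print(first, second)  # b'AA' b'AA': the rewrite at offset 2 was not heard
```

In this model the only way to make late writes audible sooner is to queue less audio ahead, which leaves less margin against underruns. That trade-off, repeated for every source, is one way to read "not very robust or efficient".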

There are also issues with multiple contexts that most implementations don't support, and no safe, efficient way to deal with multiple asynchronous devices.

I'm not sure how much, or even if, ALchemy tries to deal with these issues, or how many game-specific hacks or internal/non-public interface calls it may use to work around issues and implement the more troublesome DSound features.

mirh commented 8 months ago

FWIW these were your average per-game options, I don't remember then if they couldn't have something more customized

Buffers=4
Duration=25
MaxVoiceCount=128
DisableDirectMusic=FALSE

Svyatpro commented 8 months ago

> I'm not sure you know what you are talking about if, among other things, you are bunching together software directsound and software openal. Not that I can claim to have measured or compared latency (and I know there's still a bit of room for improvements) but are you actually talking about tests or just vibes?

I am that hardcore gamer so I know what I am saying. The latency is noticeably decreased using ALchemy in CS 1.6 and CS:GO. The settings for the Source and GoldSrc engines are: Buffers=5 Duration=10 MaxVoiceCount=128 DisableDirectMusic=FALSE
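For a rough sense of what these settings mean for latency, the sketch below multiplies the buffer count by the per-buffer duration. It assumes `Duration` is the length of each buffer in milliseconds and that all buffers stay queued ahead of playback. `queued_latency_ms` is a made-up helper name, not anything from ALchemy.

```python
def queued_latency_ms(buffers, duration_ms):
    """Audio queued ahead of playback, assuming Duration is ms per buffer."""
    return buffers * duration_ms


default = queued_latency_ms(4, 25)  # Buffers=4, Duration=25
tuned = queued_latency_ms(5, 10)    # Buffers=5, Duration=10
print(default, tuned)  # 100 50
```

Under those assumptions the tuned settings queue half as much audio as the defaults, which would fit the reported improvement without saying anything yet about hardware versus software mixing.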

ThreeDeeJay commented 8 months ago

> FWIW these were your average per-game options, I don't remember then if they couldn't have something more customized
>
> Buffers=4
> Duration=25
> MaxVoiceCount=128
> DisableDirectMusic=FALSE

@mirh Apparently there's a few more if using the Native OpenAL Renderer (ct_oal.dll driver):

;Number of buffers to use (2-10, default is 4)
Buffers=4

;Buffer duration (5-50, default is 25)
Duration=25

;Maximum number of voices to support (32-128, default is 128)
MaxVoiceCount=128

;Disable DirectMusic support (default is False)
DisableDirectMusic=False

;Disable native ALchemy support (default is False)
;Setting this to true will always use internal "Creative Software 3D Library"
DisableNativeAL=False

;Logging (only works in native mode)
;LogDirectSound=True
;LogDirectSound2D=True
;LogDirectSound2DStreaming=True
;LogDirectSound3D=True
;LogDirectSoundListener=True
;LogDirectSoundEAX=True
;LogDirectSoundTimingInfo=True
;LogStarvation=True
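The ranges in those comments are easy to check mechanically. Below is a minimal sketch that assumes the options sit in a file without section headers, as in the listing above; `check_alchemy_options` and the `[options]` placeholder section are hypothetical, added only so Python's `configparser` will accept the text.

```python
import configparser

# Ranges copied from the comments in the listing above.
RANGES = {"buffers": (2, 10), "duration": (5, 50), "maxvoicecount": (32, 128)}


def check_alchemy_options(ini_text):
    """Return a list of out-of-range settings found in ini_text."""
    parser = configparser.ConfigParser()
    # Prepend a placeholder section header; lines starting with ';' are comments.
    parser.read_string("[options]\n" + ini_text)
    options = parser["options"]
    problems = []
    for key, (low, high) in RANGES.items():
        if key not in options:
            continue
        value = options.getint(key)
        if not low <= value <= high:
            problems.append(f"{key}={value} outside {low}-{high}")
    return problems


print(check_alchemy_options("Buffers=5\nDuration=10\nMaxVoiceCount=128\n"))  # []
```

What ALchemy itself does with an out-of-range value (clamp, ignore, or fall back to the default) is not stated in the thread, so the sketch only reports it.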

And there's an old DLL hex edit to enable verbose logging, though I'm not sure if it's different than the options above and I forgot where I found it (maybe Vogons).

mirh commented 8 months ago

Yes, I know, I was there too. But the other options have nothing to do with functionality.

> I am that hardcore gamer so I know what I am saying.

Compared to sticking with DSOAL instead?

Svyatpro commented 8 months ago

I guess wrapping sound to a hardware renderer is a non-standard approach that most current gamers do not pay attention to, because it is a super hardcore trick. Unfortunately I have to stick with DSOAL because it is the only DSound wrapper that supports both OpenAL and 64-bit. mmdevapi-to-OpenAL wrappers would be nice too, but I am not a good enough programmer to take on such a project.

ThreeDeeJay commented 8 months ago

> I guess wrapping sound to a hardware renderer is a non-standard approach that most current gamers do not pay attention to, because it is a super hardcore trick.

Even if they did pay attention, most g*mers don't even own a hardware OpenAL-capable sound card lol Are there any post-X-Fi sound cards that use hardware instead of software emulation when using ALchemy/OpenAL? It's probably more practical to just upgrade the CPU if that's the bottleneck or hope newer games use GPU acceleration, which is apparently already supported by Steam Audio, though I'm not sure if the GPU is used by the Counter Strike 2 implementation:

> Use multi-core CPU and GPU acceleration. Steam Audio can use multi-core CPUs to accelerate simulation of environmental effects. Steam Audio can also use supported GPUs for accelerating simulation and rendering of environmental effects, while ensuring that its audio workloads do not adversely impact other GPU workloads or frame rate stability.
> https://valvesoftware.github.io/steam-audio/doc/unity/index.html

Even then, apparently Escape from Tarkov had horrendous performance even with Steam Audio (which they even eventually replaced with Oculus Audio🤢)

Svyatpro commented 8 months ago

> > I guess wrapping sound to a hardware renderer is a non-standard approach that most current gamers do not pay attention to, because it is a super hardcore trick.
>
> Even if they did pay attention, most g*mers don't even own a hardware OpenAL-capable sound card lol Are there any post-X-Fi sound cards that use hardware instead of software emulation when using ALchemy/OpenAL? It's probably more practical to just upgrade the CPU if that's the bottleneck or hope newer games use GPU acceleration, which is apparently already supported by Steam Audio, though I'm not sure if the GPU is used by the Counter Strike 2 implementation:
>
> > Use multi-core CPU and GPU acceleration. Steam Audio can use multi-core CPUs to accelerate simulation of environmental effects. Steam Audio can also use supported GPUs for accelerating simulation and rendering of environmental effects, while ensuring that its audio workloads do not adversely impact other GPU workloads or frame rate stability. https://valvesoftware.github.io/steam-audio/doc/unity/index.html
>
> Even then, apparently Escape from Tarkov had horrendous performance even with Steam Audio (which they even eventually replaced with Oculus Audio🤢)

I don't know what other gamers have; I just think adding native OpenAL support to DSOAL would be a great feature.

mirh commented 8 months ago

Everybody else in here is probably half of the "old games audio" scene, as you might find out, so you don't need to explain its importance. And "mmdevapi" is WASAPI, which I'm telling you openal-soft is already using. So, good if you can tell there are delays, but try not to fall victim to the XY problem.

Svyatpro commented 8 months ago

No, I meant wrapping WASAPI to OpenAL, i.e. actually making an mmdevapi.dll wrapper.

mirh commented 8 months ago

This doesn't make any sense whatsoever.

Svyatpro commented 8 months ago

> This doesn't make any sense whatsoever.

It does. It offloads the CPU, because Creative made a very good DSP, but Microsoft abandoned hardware offloading for some unknown reason.

mirh commented 8 months ago

Microsoft abandoned in-kernel audio processing for very obvious and sensible reasons.
Creative just made the *only* DSP processors still on the gaming market after sabotaging the entire competition.
CPUs in the year of the lord 2024 don't need audio getting offloaded from them in order to shine.
And wrapping fucking WASAPI with an application API like OpenAL is bonkers (not just in the sense that it's useless or pointless, but that it's logically inconsistent, like asking a fish to fly). Let alone if your priority was just latency, which, as is covered in my first link, out of the box is predominantly affected by the very conservative values periods and period_size have by default.

Svyatpro commented 8 months ago

So then why do we need the network card offloading feature if we have powerful CPUs nowadays? I guess the sound latency on native OpenAL is lower because it has fast DMA access and much less overhead.

ThreeDeeJay commented 8 months ago

You can still get lower latency with OpenAL Soft(ware) using the WASAPI exclusive fork (works with apps/games using OpenAL and also DirectSound via DSOAL). I doubt DSOAL interfacing with OpenAL hardware would result in significantly lower latency, but ASIO probably would, if it's ever implemented. I'm not sure how WASAPI exclusive impacts CPU load, though. 🤔

kcat commented 8 months ago

There isn't an "mmdevapi.dll" as such. An app makes a call to COM (CoCreateInstance) to create the primary IMMDeviceEnumerator interface (which in turn is used to create IMMDevice and the rest from there), and COM uses the system registry to look up which DLL to load and create the interfaces with. There may technically be an mmdevapi.dll, but it's an implementation detail the app doesn't see. Those interfaces communicate with the system audio service (which is a separate system process) to handle the actual audio playback/capture.

It makes no sense to wrap mmdevapi/WASAPI over OpenAL, since the two have vastly different jobs. Even if you could override the COM query for the IMMDeviceEnumerator interface to return a custom implementation, OpenAL is primarily for 3D game audio, whose main task is panning sounds in 3D space and applying spatial and environmental effects for a game, while WASAPI is for system audio, whose main task is mixing/remixing discrete channels from the apps running on the system, and applying filters, for output. Forcing one to do the job of the other won't have good results.

Svyatpro commented 8 months ago

Thanks for explaining the internal mechanism of WASAPI, btw ;-)

mirh commented 8 months ago

It seems a bit hasty (if not preposterous) to bring up ASIO and wasapi exclusive when even just the pre-existing knobs in alsoftrc weren't used.

ThreeDeeJay commented 8 months ago

Yeah, I know ASIO is still a bit far-fetched, but I'm just saying WASAPI exclusive is out there to try, in case it can reach lower latency with better stability, since high CPU load may cause skips when periods is set too low. Also, I noticed that if I set periods=1 and period_size=96, shared mode seems to reset it to the default values:

[ALSOFT] (II) Pre-reset: Stereo, *Int16, *48000hz, 96 / 192 buffer
[ALSOFT] (II) Post-reset: 7.1 Surround, Int16, 48000hz, 480 / 1056 buffer
[ALSOFT] (II) Post-start: 7.1 Surround, Int16, 48000hz, 480 / 1056 buffer

but exclusive mode doesn't, although it ends up using double / quadruple the value specified in the ini, not sure if that's a bug or that's just how it works 🤔

[ALSOFT] (II) Pre-reset: *Stereo, *Int16, *48000hz, 96 / 192 buffer
[ALSOFT] (II) Post-reset: Stereo, Int16, 48000hz, 192 / 384 buffer
[ALSOFT] (II) Post-start: Stereo, Int16, 48000hz, 192 / 384 buffer
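To compare those log lines in time units, the sketch below converts the two buffer numbers to milliseconds. It assumes they are the update (period) size and the total buffer size in sample frames, so latency is frames divided by the sample rate. `buffer_latency_ms` is a hypothetical helper, not part of OpenAL Soft.

```python
import re

# Matches e.g. "48000hz, 480 / 1056 buffer" (a leading '*' on the rate is skipped).
LINE = re.compile(r"(\d+)hz, (\d+) / (\d+) buffer")


def buffer_latency_ms(log_line):
    """Return (update_ms, total_ms) for one ALSOFT device format line."""
    match = LINE.search(log_line)
    if match is None:
        raise ValueError("not a device format line")
    rate, update, total = map(int, match.groups())
    return update * 1000 / rate, total * 1000 / rate


requested = "[ALSOFT] (II) Pre-reset: Stereo, *Int16, *48000hz, 96 / 192 buffer"
shared = "[ALSOFT] (II) Post-reset: 7.1 Surround, Int16, 48000hz, 480 / 1056 buffer"
exclusive = "[ALSOFT] (II) Post-reset: Stereo, Int16, 48000hz, 192 / 384 buffer"

print(buffer_latency_ms(requested))  # (2.0, 4.0)
print(buffer_latency_ms(shared))     # (10.0, 22.0)
print(buffer_latency_ms(exclusive))  # (4.0, 8.0)
```

Read this way, the request was for a 2 ms update and a 4 ms buffer; shared mode ended up at 10 ms / 22 ms and exclusive mode at 4 ms / 8 ms.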

mirh commented 8 months ago

Yes, of course it resets.
https://github.com/kcat/openal-soft/blob/cb24fe6f28865dd82ad67f485a0ecee51e6fe2bb/alc/alc.cpp#L1036-L1037
And of course running more often and having less time to sleep raises CPU time requirements (speaking of which, disabling C-states may help with that latency). But even putting aside the potential wonders of IAudioClient3, what I meant was that it's useless to bring up audio backends when this whole thread is the most classic case of mistaking "I have X problem" for "I want Y solution".