Open RennMekka opened 5 months ago
Hi.
The 16 channels for the 3rd order ambisonic mix that OpenAL Soft uses internally aren't 16 discrete outputs/speaker feeds. It needs to be decoded before being played back, and actually for a proper reproduction of a full 3rd order ambisonic mix, you need more than 16 speakers (you need more speakers than ambisonic channels to properly play it back) and those speakers need to be arranged in a suitable layout to sound good. The internal decoders OpenAL Soft has are for outputting to the supported channel configurations; mono, stereo, quad, 5.1, 6.1, and 7.1 (along with an experimental 7.1.4 output, and an uncommon 3D7.1 setup). It can also decode using HRTF (binaural/headphones), or convert to stereo-compatible UHJ. Alternatively, the ambisonic mix can be output directly without decoding or converting it, but that requires a suitable output API and an external decoder if it's to be listened to.
The primary problem with outputting the internal ambisonic mix on Windows is that WASAPI seems limited to 8 channels for 7.1 output, which isn't enough for even 2nd order ambisonics that requires 9 channels, let alone the 16 channels for 3rd order. To provide the ambisonic channels to ASIO, you'd need JACK installed as well as a custom build of OpenAL Soft with the JACK backend enabled. Or alternatively, make a build of the latest Git version with the PortAudio backend enabled, which should be able to output to ASIO. You'll need the proper development stuff installed to build OpenAL Soft with support for JACK or PortAudio if you go that route.
If there's free public development headers and such available to build for using ASIO, I can think about maybe creating a backend to use it directly. Or maybe I can enable JACK or PortAudio for future Windows binary builds (lower priority than the normal Windows audio APIs so they have to be expressly selected in the config file) to have a way to output to ASIO. But to reiterate, the ambisonic mix needs to be decoded before being played on speakers, with an appropriate decoder designed for your particular speaker arrangement, regardless of how it gets to ASIO.
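For reference, selecting such a backend would presumably look something like this in alsoft.ini (just a sketch based on the options documented in alsoftrc.sample; the jack entry assumes a build with that backend compiled in, and channels = ambi3 assumes you want the undecoded 3rd order mix on the output):

```ini
# alsoft.ini (on Windows: %AppData%\alsoft.ini) -- sketch, not a verified config
[general]
# Try the JACK backend before the default Windows backends
drivers = jack
# Ask for an undecoded 3rd order ambisonic mix as the output format
channels = ambi3
```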
Thank you for the fast reply!
for a proper reproduction of a full 3rd order ambisonic mix, you need more than 16 speakers (you need more speakers than ambisonic channels to properly play it back)
Oh, then I've got something seriously wrong. I thought that the order defines the number of speakers directly (3rd order = (3+1)^2 = 16 speakers). So the current speaker presets all use 2nd order, but with way fewer than 9 "channels", as I would normally assume with a hexagonal layout for example (fewer than 6). What's the formula for this? How many speakers can (or must) one address with the full 3rd order's 16 "channels"?
But to reiterate, the ambisonic mix needs to be decoded before being played on speakers, with an appropriate decoder designed for your particular speaker arrangement
So the problem is not that I couldn't talk to ASIO; the problem is the decoder itself. I thought that the decoder algorithm currently implemented is universal and that I only need a proper *.ambdec file for any kind of speaker matrix, where it defines the angle and distance for each speaker. I just realized that it's not that easy. So you have to write a separate decoder for each output config?
Oh, then I've got something seriously wrong. I thought that the order defines the amount of speakers directly (3rd order = 3+1^2 = 16 speakers).
The order defines the number of channels in the ambisonic signal, but those channels relate to the spherical harmonics of the soundfield, not individual output speakers. Technically the number of ambisonic channels is the minimum number of discrete output feeds necessary to retain the complete soundfield, but it's not good enough for listening to (you can decode to and encode from an appropriate 16 channel layout for 3rd order without loss, but when listened to you'll get audible "holes" in the output where certain directions will have noticeably reduced quality compared to others, because of the way the sound interacts with our ears/brain).
For example, full 1st order ambisonics uses 4 channels ((1+1)^2 = 2*2 = 4), but to play it back properly with full sphere reproduction you need at least 6 speakers spaced out regularly, though it's more common to use an 8-channel cube layout for 1st order. The general rule of thumb is that more speakers provide better reproduction (within reason), which is why even though 6 speakers is suitable for full 1st order, 8 is considered better.
So the current speaker presets all use 2nd order, but with way less than 9 "channels" as I normally would assume with a hexagonic for example (less than 6). What's the formula for this, how many speakers one can (or has to) address with full 3rd order 16 "channels"?
The internal decoders (except for 7.1.4 and 3D7.1) are horizontal only. They ignore the height-related channels, which cuts down on the number of ambisonic channels being decoded, reducing the number of output speakers needed for playback. For horizontal-only ambisonics, the channel calculation is order*2 + 1, so 1st order has 3 channels, 2nd order has 5, and 3rd order has 7. In this way, horizontal-only 1st order can be properly played back over a 4-speaker quad layout, for example, which wouldn't otherwise be enough for full 1st order. There's a similar trick to cut down on the number of effective ambisonic channels by using mixed order, which reduces the vertical "resolution" compared to horizontal by dropping some but not all height-related channels (with a speaker arrangement and decoder that properly compensates for their exclusion).
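To restate those two formulas as code (nothing OpenAL-Soft-specific here, just the arithmetic from the last couple of paragraphs):

```c++
#include <cstdio>

// Channel counts for a given ambisonic order:
//   full-sphere (periphonic): (order + 1)^2
//   horizontal-only:           order * 2 + 1
static int fullSphereChannels(int order) { return (order + 1) * (order + 1); }
static int horizontalChannels(int order) { return order * 2 + 1; }

int main()
{
    for(int order = 1; order <= 3; ++order)
        std::printf("order %d: %2d full-sphere channels, %d horizontal-only channels\n",
            order, fullSphereChannels(order), horizontalChannels(order));
    // order 1:  4 full-sphere channels, 3 horizontal-only channels
    // order 2:  9 full-sphere channels, 5 horizontal-only channels
    // order 3: 16 full-sphere channels, 7 horizontal-only channels
}
```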
As for figuring out how many speaker outputs you need to properly play back a given ambisonic order, I'm not sure if there's any hard and fast rules to it. Generally more is better, but avoid going overboard.
I thought that the decoder algorythm currently implemented is universal and that I only need a proper *.ambdec file for any kind of speaker matrix, where it defines the angle and distance for each speaker.
The ambdec file is the decoder. It specifies how to take the ambisonic channels produced by OpenAL Soft, and mix them for each output channel on a given output configuration. Currently OpenAL Soft can only handle ambdec files for the supported surround output configurations (quad, 5.1, etc), basically overriding the internal decoder for specific output configurations. If you want something different, you'll need to have OpenAL Soft output the ambisonic mix to an external decoder instead.
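Conceptually, the decode an ambdec file describes is just a matrix mix over the ambisonic channels, something along these lines (a simplified sketch; real decoders also handle per-order gains, dual-band decoding, distance/near-field compensation, and so on):

```c++
#include <vector>

// One output speaker = a weighted sum of the ambisonic channels.
// coeffs[s][c] is the decoder matrix, e.g. as listed in an ambdec file.
void decodeFrame(const std::vector<std::vector<float>> &coeffs, // [speakers][ambiChannels]
                 const float *ambiFrame,   // one sample frame of ambisonic channels
                 float *speakerFrame)      // one sample frame of speaker feeds
{
    for(size_t s = 0; s < coeffs.size(); ++s)
    {
        float sum = 0.0f;
        for(size_t c = 0; c < coeffs[s].size(); ++c)
            sum += coeffs[s][c] * ambiFrame[c];
        speakerFrame[s] = sum;
    }
}
```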
I do intend to add the ability to specify more generic output configurations for OpenAL Soft, to allow for decoders that output to a general assortment of channels unrelated to known configurations (e.g. a generic 8-channel output that's not 7.1, or a generic 16-channel output), but it's not possible for now.
Thank you for all the detailed descriptions!
I do intend to add the ability to specify more generic output configurations for OpenAL Soft, to allow for decoders that output to a general assortment of channels unrelated to known configurations (e.g. a generic 8-channel output that's not 7.1, or a generic 16-channel output), but it's not possible for now.
3D-surround is moving in that direction anyway. The max config for 3D-Spatial (Dolby, Microsoft) at the moment is 8.1.4.4. Hearing sounds from below and above is crucial for the future of gaming, I believe. I don't think that it's necessary to have the max configuration, considering the expense/usage ratio; a 6.1.3.3 or 6.1.4.4 layout should be sufficient (360°/60° = 6 speakers, 60° being the maximum angle between speakers, and for top and bottom I hope 3 speakers are enough too, because I don't like to mount too many height speakers). Do you think that we can have such a configuration (without HDMI and receiver) in OpenAL-Soft in the not too distant future?
3D-surround is moving in that direction anyway. The max config for 3D-Spatial (Dolby, Microsoft) at the moment is 8.1.4.4.
AFAIK, modes like 8.1.4.4 only work through headphones/virtual surround sound. From what I remember reading, anyway. Setting up bottom/ground/floor speakers would be pretty difficult for most people, likely even more difficult than ceiling speakers, since people sit pretty close to the floor and things like the sofa and end tables would get in the way (either preventing proper speaker placement, or blocking line-of-sight with the speaker). It's a lot easier to simulate it over headphones with HRTF, than using physical speakers.
From a software perspective, the main issue with configurations like that is lack of system support. Most audio APIs don't have labels for bottom speakers so there's no way to properly address them. Some APIs may let you access output channels without position labels, which leaves the order unspecified (there's no way to say which channel is which, leaving it to the user to fiddle with connections to assign them correctly). So far, the only audio API I'm aware of that allows specifying the appropriate bottom channels is the spatial audio extensions for WASAPI (not even base WASAPI can), which OpenAL Soft only has experimental support for.
That said, it wouldn't be hard for OpenAL Soft to at least support it internally. Such a decoder would even be better and more stable than the 7.1.4 one it currently has. But the availability of using it will be very limited.
@kcat I'm also interested in higher-order ambisonic output, but for storing it in WAV/AMB format via WaveWriter to use an external decoder/player like Virtual Home Theater, or maybe just to have a more precise sound field when visualizing it to see if a game's actually using height or to check if there are reversed axes. Like here's a 1OA recording I made a while ago:
https://github.com/user-attachments/assets/85588878-d9a6-4224-b091-d19a8617dfe5
By the way, is it possible to output to multiple backends simultaneously? Because it'd be neat to be able to hear the game audio while we're recording HOA.
Also, I know this is probably far-fetched so I'm not sure if it should be a whole new issue, but how feasible would it be to record into an object-based format? Not necessarily a completely new format, since there's already LAF (someone's working on an OpenAL Soft-based player though we can already use Cavern), IAMF and MPEG-H 3D (which supports HOA though I'm not sure if it's a free format)
make a build of the latest Git version with the PortAudio backend enabled, which should be able to output to ASIO. You'll need the proper development stuff installed to build OpenAL Soft with support for JACK or PortAudio if you go that route.
~~What would we need to do to get OpenAL Soft to output via ASIO besides compiling with -DALSOFT_REQUIRE_PORTAUDIO=ON? Perhaps this would come in handy?~~
Nevermind, figured it out: https://github.com/kcat/openal-soft/issues/682#issuecomment-2240910728
If there's free public development headers and such available to build for using ASIO, I can think about maybe creating a backend to use it directly.
Something like this? Apparently it just requires submitting a license agreement though
For ASIO support, download the ASIO SDK from Steinberg at http://www.steinberg.net/en/company/developer.html . The SDK is free, but you will need to set up a developer account with Steinberg. https://www.portaudio.com/docs/v19-doxydocs/compile_windows.html
By the way, is it possible to output to multiple backends simultaneously? Because it'd be neat to be able to hear the game audio while we're recording HOA.
No, unfortunately. Can't output to multiple devices in one backend either. Best you could do is set up some kind of virtual splitter cable that can forward the samples to the ambisonic decoder device, and passes a copy to another virtual device that outputs to a file on disk. Or maybe recording software like OBS can capture the audio from the app (making sure it doesn't try to remix/upmix/downmix the channels and keeps them as-is in the recording).
Also, I know this is probably far-fetched so I'm not sure if it should be a whole new issue, but how feasible would it be to record into an object-based format?
It would take a fair bit of work. The mixer would have to be redesigned to allow individual sources/source channels to output to their own channels instead of mixing into the main ambisonic buffer, somehow also passing along the position data. Voice/channel limits would also become an issue, something OpenAL Soft hasn't really had to worry about but these object-based audio systems do.
It's something to keep in mind, but until a practical option becomes available, I don't know if the effort would be worth it.
Not necessarily a completely new format, since there's already LAF (someone's working on an OpenAL Soft-based player though we can already use Cavern)
That is extremely interesting. Particularly the point about "A PCM block is created for each second of audio. [...] For object-based exports, extra PCM tracks are used for object movement data." Does that mean the dynamic object positions only update once a second? That doesn't seem correct, given the Atmos demo (and that video's conversion/rendering of it) seems to update more frequently/smoothly. Unless the idea is the sound moves over the course of that second, which would make it more practical to deal with, though it would pose problems for outputting it in real-time. Quick fine movements would be lost, and there may be a delay in the position updates.
Still, LAF looks like a promising format that OpenAL Soft could play back in some fashion, at least (that video mentions it used their "wip audio renderer based on OpenAL Soft", though I'm not sure precisely what that means; a modified OpenAL Soft, or the guts of OpenAL Soft ripped out and put into something else, or something on top of an unmodified OpenAL Soft?). And though outputting dynamic channels isn't currently practical, OpenAL Soft could output static channel layouts, including esoteric ones like 3D7.1, 7.1.4.4, etc, since it deals with positions instead of labels. And if Atmos and similar formats are known well enough to convert them to LAF, then... hmm...
Some LAF samples (and/or Atmos samples that can be converted to LAF) would be nice to look at and play around with.
Something like this? Apparently it just requires submitting a license agreement though
Looks like it could be a problem for the LGPL. Or at least a problem for people making their own OpenAL Soft builds that want support for it.
@kcat here's response from the LAF dev:
The updated positions are written to extra PCM tracks, which means, the update frequency is dependent on the sample rate. One such track holds positions for 16 objects (for the next 16 or less, another track is created, and so on), and one update takes 3 samples (X, Y, Z coords with the precision of the selected bit depth). This results in an object update every 48 samples, so 1000 updates every second at 48 kHz, which is 32 times faster than Atmos.
The updates happening every second are related to track availability blocks. They disable PCM export for tracks containing only silence for the next second. Objects playing audio are re-enabled. This saves a lot of space when dealing with hundreds or thousands of objects that rarely play audio.
For a reference implementation of exporting already created playback environments, see LimitlessAudioFormatEnvironmentWriter.cs.
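Just to spell out the arithmetic from that description (a trivial sketch using only the numbers quoted above):

```c++
#include <cstdio>

int main()
{
    const int sampleRate      = 48000; // Hz
    const int objectsPerTrack = 16;    // objects sharing one position track
    const int samplesPerCoord = 3;     // one sample each for X, Y, Z

    // One pass over the 16 objects takes 16 * 3 = 48 samples, so each object
    // gets a fresh position every 48 samples of the position track.
    const int samplesPerUpdate = objectsPerTrack * samplesPerCoord;
    std::printf("updates per object per second: %d\n", sampleRate / samplesPerUpdate); // 1000
}
```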
BTW forgot to mention someone else made an object-based audio scene editor/player, also using OpenAL Soft (app+sample, WASD to move, QE for height or enter/exit reverb area) that uses an XML-based format to store metadata for listener and emitter positions over time and even environment effects, which I think would get lost in translation to LAF.
Some LAF samples (and/or Atmos samples that can be converted to LAF) would be nice to look at and play around with.
Here's Universe Fury and Test Tones. Generally, LAF files are converted from Dolby Atmos (Master or Digital Plus, though not TrueHD), which you can get from here or here and convert to LAF using Cavernize+ffmpeg, and then you can play it in Cavern.
Looks like it could be a problem for the LGPL. Or at least a problem for people making their own OpenAL Soft builds that want support for it.
This person suggests it's ok to distribute binaries with ASIO as long as the SDK itself isn't included (in the codebase or release binaries, I presume), but not according to this GPL project. However, FlexASIO downloads the SDK during CI and after complying with license requests, it seems to be in the clear, though it's (modified) MIT. So are flexible licenses the reason why some projects are allowed to distribute binaries of ASIO-based projects? Or perhaps all it takes is to reach out to Steinberg to get explicit permission while adhering to their license requests? I found interesting info in other projects like JUCE, jamulus and GrandOrgue.
WASAPI Exclusive might be easier to implement since there's already a proof of concept and fixes, though in my experience, it seems to have similar latency to ASIO, but it's more prone to crackling at low buffer sizes.
BTW Have you looked into WDM/Kernel Streaming? I think it's what ASIO uses so we might as well just cut the middle-man 🤔
@kcat here's https://github.com/VoidXH/Cavern/issues/186#issuecomment-2241539703:
Hmm, that seems a bit of an obtuse way to encode the position over time, interleaving it between each sample frame. Seems rather limiting (:P) as well, if you wanted to use proper compression like FLAC or Opus instead of raw PCM samples (which any user-oriented format should handle), since you presumably don't want to compress position vectors and you can't put them between PCM frames when it's compressed like that. Also unfortunate that the format doesn't specify sizes for headers; I guess you just have to hope a custom header won't have the words TAGS or HEAD for some reason (and you have to read all the custom headers, even the ones you don't handle, since you have no way to know where the HEAD marker will appear). There's also no accommodation for ensuring the sample data is aligned, making reading less efficient.
The more I look at it, the less I really like the format without some updates to support proper compression and a more robust structure.
As for playback with OpenAL Soft, it seems the "proper" way to do it would be with a loopback device where you control the number of samples rendered between updates to source positions. Otherwise, the best that can be done is to keep positions buffered as they're read in, and continuously check the source offset and set the proper positions based on the current time being processed for each source. Pretty crude and inexact, prone to late or missed movements, but it's probably good enough for just listening to most things. More sophisticated options for specifying source movement at set times would be needed for being more precise and accurate with normal playback.
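For reference, that loopback approach would look roughly like this with the ALC_SOFT_loopback extension (a sketch only: format negotiation, error handling, and the actual file parsing are left out, and applyPositionsForBlock/writeBlock are hypothetical placeholders for the format-specific parts):

```c++
#include <vector>
#include "AL/al.h"
#include "AL/alc.h"
#include "AL/alext.h"

void renderWithTimedPositions()
{
    // Load the extension functions (assumes ALC_SOFT_loopback is available).
    auto alcLoopbackOpenDeviceSOFT = reinterpret_cast<LPALCLOOPBACKOPENDEVICESOFT>(
        alcGetProcAddress(nullptr, "alcLoopbackOpenDeviceSOFT"));
    auto alcRenderSamplesSOFT = reinterpret_cast<LPALCRENDERSAMPLESSOFT>(
        alcGetProcAddress(nullptr, "alcRenderSamplesSOFT"));

    ALCdevice *device = alcLoopbackOpenDeviceSOFT(nullptr);
    const ALCint attrs[] = {
        ALC_FREQUENCY, 48000,
        ALC_FORMAT_CHANNELS_SOFT, ALC_STEREO_SOFT, // or whatever layout is wanted
        ALC_FORMAT_TYPE_SOFT, ALC_FLOAT_SOFT,
        0
    };
    ALCcontext *context = alcCreateContext(device, attrs);
    alcMakeContextCurrent(context);

    // Render in blocks matching the position update interval (e.g. 48 frames at
    // 48kHz for the timing discussed above), updating source positions first.
    constexpr ALCsizei blockFrames = 48;
    std::vector<float> block(blockFrames * 2); // frames * channels
    for(int i = 0; i < 1000; ++i) // however many blocks the content has
    {
        // applyPositionsForBlock(i);   // hypothetical: set AL_POSITION per source
        alcRenderSamplesSOFT(device, block.data(), blockFrames);
        // writeBlock(block);           // hypothetical: file or soundcard sink
    }

    alcMakeContextCurrent(nullptr);
    alcDestroyContext(context);
    alcCloseDevice(device);
}
```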
This person suggests it's ok to distribute binaries with ASIO as long as the SDK itself isn't included (in the codebase or release binaries, I presume), but not according to this GPL project. However, FlexASIO downloads the SDK during CI and after complying with license requests, it seems to be in the clear (https://github.com/dechamps/FlexASIO/issues/222#issuecomment-2032060881), though it's (modified) MIT. So are flexible licenses the reason why some projects are allowed to distribute binaries of ASIO-based projects?
Perhaps. I think it would probably depend on what exactly is in the headers, and whether there's any free/libre alternatives to the official ASIO headers. You can build against non-free headers and still be GPL-compliant, otherwise you couldn't distribute builds of (L)GPL code made with MSVC using MS's headers (the system library exception I don't think says anything about headers, just the DLLs), but MinGW comes with a mostly complete and compatible free/libre alternative to the official C/C++ Windows SDK so users have options to build the project without relying on non-free stuff, even if it's not the same exact/official SDK.
BTW Have you looked into WDM/Kernel Streaming? I think it's what ASIO uses so we might as well just cut the middle-man 🤔
I didn't think that was still a thing after Windows XP. If it is, it's probably internal Windows stuff and not stable (as in API/ABI stable) across versions, and why there are things like ASIO and WASAPI exclusive mode to reduce the overhead and provide a stable interface.
Perhaps. I think it would probably depend on what exactly is in the headers, and whether there's any free/libre alternatives to the official ASIO headers.
Sadly I've looked at several projects and I've yet to find a FLOSS alternative to ASIO.
You can build against non-free headers and still be GPL-compliant, otherwise you couldn't distribute builds of (L)GPL code made with MSVC using MS's headers (the system library exception I don't think says anything about headers, just the DLLs), but MinGW comes with a mostly complete and compatible free/libre alternative to the official C/C++ Windows SDK so users have options to build the project without relying on non-free stuff, even if it's not the same exact/official SDK.
It seems the key here is Steinberg's license directly conflicting with GPL, like in the case of Audacity:
If ASIO support were distributed in Audacity builds this would either violate Steinberg's licence agreement if the code were included, or conversely would violate Audacity's GPL Licence if the code were withheld. There are persistent rumours of Steinberg opening up licensing, but without any apparent movement. https://manual.audacityteam.org/man/asio_audio_interface.html
So if FLOSS isn't a viable option, would OpenAL Soft's license allow not including ASIO SDK code/headers in the repo while still providing binaries that do contain ASIO (added only during CI)? Perhaps @dechamps could comment on whether it'd be feasible to add ASIO here like in FlexASIO 👀
I didn't think that was still a thing after Windows XP. If it is, it's probably internal Windows stuff and not stable (as in API/ABI stable) across versions, and why there are things like ASIO and WASAPI exclusive mode to reduce the overhead and provide a stable interface.
ASIO4ALL uses it and supports "any Windows OS since Win98SE". Also it has a C++ API and sample
ASIO4ALL is a hardware independent low latency ASIO driver for WDM audio devices. It uses WDM Kernel-Streaming and sometimes even more sophisticated methods to achieve its objectives. https://asio4all.org/about/
Audio Repeater can also use Kernel Streaming to get low latency when using a virtual audio device for HeSuVi
The primary problem with outputting the internal ambisonic mix on Windows is that WASAPI seems limited to 8 channels for 7.1 output.
I don't think that's true. I remember dechamps/FlexASIO#100 from a few years back where it was demonstrated that WASAPI is able to open a 10-channel output device just fine, and you can check that for yourself using emulated software WDM drivers such as Virtual Audio Cable. IIRC that is true even in shared mode, as long as the channel count is correctly configured in the Windows control panel settings for the audio device.
Of course, the bigger issue is that actual WDM audio devices that expose this many channels seem to be extremely rare - most many-channel audio interfaces typically expose pairs of channels as Windows audio endpoints, such that you end up with N/2 stereo endpoints, not a single N-channel endpoint. With these devices, typically the only way to get them to expose a single device with N channels is to use ASIO.
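For anyone who wants to check what a given endpoint reports, here's a rough sketch using the standard MMDevice/WASAPI interfaces (error handling omitted, default render endpoint only):

```c++
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <cstdio>

// Print the channel count of the default render endpoint's shared-mode mix format.
int main()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator *enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
        __uuidof(IMMDeviceEnumerator), reinterpret_cast<void**>(&enumerator));

    IMMDevice *device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eMultimedia, &device);

    IAudioClient *client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
        reinterpret_cast<void**>(&client));

    WAVEFORMATEX *mix = nullptr;
    client->GetMixFormat(&mix);
    std::printf("shared-mode mix format: %u channels @ %lu Hz\n",
        mix->nChannels, mix->nSamplesPerSec);

    CoTaskMemFree(mix);
    client->Release();
    device->Release();
    enumerator->Release();
    CoUninitialize();
}
```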
Or maybe I can enable JACK or PortAudio for future Windows binary builds (lower priority than the normal Windows audio APIs so they have to be expressly selected in the config file) to have a way to output to ASIO.
I would strongly recommend against enabling ASIO support in PortAudio due to PortAudio/portaudio#696. Especially in a library. You would run the risk of compromising the integrity of the application process, even if the user does not intend to use ASIO at all. I would strongly recommend using ASIO directly instead, without going through PortAudio.
BTW Have you looked into WDM/Kernel Streaming? I think it's what ASIO uses so we might as well just cut the middle-man
It's not really correct to say that "ASIO uses WDM/KS". Some ASIO drivers may be implemented using WDM/KS internally (most notably ASIO4ALL and FlexASIO in WDM/KS mode, but it's conceivable some manufacturer-specific ASIO drivers do that as well), but there is nothing stopping anyone from developing an ASIO driver that interfaces with the hardware in a completely different way.
BTW Have you looked into WDM/Kernel Streaming?
I didn't think that was still a thing after Windows XP.
It is definitely still a thing - it is what the Windows Audio Engine (which runs in user mode) uses to talk to the underlying WDM audio device driver. It forms the interface between user mode and kernel mode for audio.
If it is, it's probably internal Windows stuff and not stable (as in API/ABI stable) across versions
It's complicated. Nowadays, Microsoft mostly sees WDM/KS as the interface between the Windows Audio Engine (i.e. WASAPI) and the WDM audio device driver. Microsoft does not advertise nor recommend WDM/KS as an application API, but in practice there is nothing stopping an application from using WDM/KS directly. I doubt Microsoft will break applications using WDM/KS, because that would mean breaking ASIO4ALL, which is quite popular, therefore such a move would piss off quite a lot of Windows users.
The WDM/KS ABIs are documented and very stable, because it is the ABI that WDM audio device drivers have to implement. Microsoft cannot break these ABIs without breaking audio drivers. That said, there have been cases in the past where Microsoft introduced new WDM/KS "modes" (the most prominent example being WaveRT, introduced in Vista), and over time new WDM audio device drivers dropped support for pre-Vista and ended up only supporting WaveRT, breaking any WDM/KS client code that did not support WaveRT. These things take many years to happen though.
BTW Have you looked into WDM/Kernel Streaming?
I wouldn't recommend attempting to write a WDM/KS client. WDM/KS is an atrociously complicated and error-prone API with lots of device-specific sharp edges (e.g. workarounds for driver bugs) that require extensive hardware testing to sort out. To give an idea, the PortAudio WDM/KS client is 7 KLOC and despite that is still quite buggy/unreliable. I would strongly recommend using WASAPI Exclusive instead, which is basically just a wrapper around WDM/KS in practice (I mean that quite literally - if you use WASAPI Exclusive, your app ends up making WDM/KS calls directly from the application process behind the scenes).
So if FLOSS isn't a viable option, would OpenAL Soft's license allow not including ASIO SDK code/headers in the repo while still providing binaries that do contain ASIO (added only during CI)? Perhaps @dechamps could comment on whether it'd be feasible to add ASIO here like in FlexASIO 👀
I am not a lawyer so am just speculating here, but my understanding is you are bound to Steinberg licensing terms as soon as you use the ASIO SDK in any way. I believe one possible way to work around that could be to not use the ASIO SDK at all but instead enumerate and call ASIO drivers directly. My understanding (again, I am not a lawyer) is that this is protected activity in many legal jurisdictions, where providing interoperability is a valid legal excuse for using APIs you don't own (see also this famous case). I know of several ASIO Host Applications that do not use the ASIO SDK anywhere in their build process: they just enumerate drivers in the same way the ASIO SDK would (it's basically just listing some entries in the Windows Registry), and then instantiate ASIO driver COM classes directly and call the relevant methods using their own declarations in their own headers. NAudio is one such example. This is not particularly difficult to do: the code in the ASIO SDK is very small and trivial to reimplement.
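To illustrate how small the enumeration part is, something like this should be all it takes to list installed drivers (a sketch with no error handling; the harder part, declaring the driver's COM interface and calling it, isn't shown):

```c++
#include <windows.h>
#include <cstdio>

// Enumerate installed ASIO drivers by listing HKLM\SOFTWARE\ASIO.
// Each subkey is a driver name with a "CLSID" string value identifying its COM class.
int main()
{
    HKEY asioKey{};
    if(RegOpenKeyExA(HKEY_LOCAL_MACHINE, "SOFTWARE\\ASIO", 0, KEY_READ, &asioKey) != ERROR_SUCCESS)
        return 1;

    char name[256];
    for(DWORD i = 0;; ++i)
    {
        DWORD nameLen = sizeof(name);
        if(RegEnumKeyExA(asioKey, i, name, &nameLen, nullptr, nullptr, nullptr, nullptr) != ERROR_SUCCESS)
            break;

        char clsid[64] = "";
        DWORD clsidLen = sizeof(clsid);
        RegGetValueA(asioKey, name, "CLSID", RRF_RT_REG_SZ, nullptr, clsid, &clsidLen);
        std::printf("driver \"%s\", CLSID %s\n", name, clsid);
        // From here: CLSIDFromString + CoCreateInstance on that CLSID gives the driver object.
    }
    RegCloseKey(asioKey);
}
```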
I am not a lawyer so am just speculating here, but my understanding is you are bound to Steinberg licensing terms as soon as you use the ASIO SDK in any way. I believe one possible way to work around that could be to not use the ASIO SDK at all but instead enumerate and call ASIO drivers directly. My understanding (again, I am not a lawyer) is that this is protected activity in many legal jurisdictions, where providing interoperability is a valid legal excuse for using APIs you don't own (see also this famous case).
Yeah, as far as I understand (also as a non-lawyer), APIs aren't copyrightable. As long as new headers and such can be created in a clean room manner, whatever license there may be with the original SDK isn't relevant to code that doesn't use it and uses clean alternatives instead. And projects like wineasio provide a FLOSS driver implementation that the headers would provide access to, so it wouldn't be the case that the interface could only be used for proprietary/closed source implementations either.
What could be an issue is using the ASIO names/trademarks where not otherwise necessary, though. MESA used to have a similar issue, where even though they were effectively an OpenGL implementation, implementing their own headers and libs with gl* functions and GL_ macros as needed for API/ABI compatibility, they couldn't actually say they were an OpenGL implementation without permission by the trademark holder (which needed certification). So I could likely add an ASIO-compatible host backend, but I don't know how far I could go in calling it an ASIO backend without permission.
I know of several ASIO Host Applications that do not use the ASIO SDK anywhere in their build process: they just enumerate drivers in the same way the ASIO SDK would (it's basically just listing some entries in the Windows Registry), and then instantiate ASIO driver COM classes directly and call the relevant methods using their own declarations in their own headers. NAudio is one such example. This is not particularly difficult to do: the code in the ASIO SDK is very small and trivial to reimplement.
That is informative, and may help me implement my own interface header. Having sample code that uses the API to play/record audio would be helpful too (I tried looking at PortAudio's source, but I couldn't find the code for their ASIO backend).
Having sample code that uses the API to play/record audio would be helpful too
There's some in the ASIO SDK, but obviously it assumes you're going through the ASIO SDK "glue layer" (i.e. the small shim between the app-facing ASIO SDK API and the stable driver-facing ASIO API/ABI - a very small amount of code since the two are very similar). Another thing you can do is run some sample ASIO app (such as the one in the ASIO SDK) with FlexASIO with logging enabled - the FlexASIO log will show every single call going into a driver from the perspective of the driver API (i.e. what you would have to code against). If and when you get serious about this I would also recommend reading this.
I tried looking at PortAudio's source, but I couldn't find the code for their ASIO backend
It's here, but again this is going through the ASIO SDK client API, not the driver API.
@dechamps Is there any significant functional difference between WASAPI exclusive, WDM-KS and ASIO/4ALL in terms of...?
Asking because someone already managed to force OpenAL Soft to use WASAPI in exclusive mode, but it was an unfinished/outdated implementation, so I wonder if it would be more feasible/practical compared to ASIO/WDM-KS. Besides being limited to 10 channels, which isn't enough for what the OP was asking for, unless it's indeed possible to output to two 8-channel devices simultaneously 🤔
On a side note, in case we go off-topic here, we also started a latency-focused discussion here https://github.com/kcat/openal-soft/issues/682
Is there any significant functional difference between WASAPI exclusive, WDM-KS and ASIO/4ALL in terms of...?
- Latency - I think ASIO provides lowest latency but I haven't taken objective measurements
It really depends on the quality of the respective drivers. I don't think ASIO in itself makes it possible to achieve latencies that are lower than what you could achieve with a good WDM driver implementation. Indeed modern WDM/KS in WaveRT packet mode (which also describes WASAPI Exclusive, since it's just a wrapper around that) is basically the same thing as ASIO: a double buffer allocated on the driver side with a notification mechanism. So if there are any latency differences between the two, my money is on the manufacturer not spending an equal amount of time/money/effort on their ASIO driver vs. their WDM driver, not fundamental limitations of the framework used. Or it could also be the app's fault: the app having better ASIO client code than WASAPI client code (though that seems less likely as it's harder to get wrong).
- Fidelity - I've read ASIO/4ALL may not be bit-perfect like WASAPI exclusive
I'm not intimately familiar with ASIO4ALL, but my understanding is it just wraps WDM/KS. That's bit-perfect in the exact same way WASAPI Exclusive is: the buffer goes directly to the WDM driver without any alteration. Now if there are features in ASIO4ALL where it would mess with the signal in some way that's a different story, but as far as I know there aren't any.
This is something that can easily be verified: just use a purely digital loopback connection, e.g. a S/PDIF loopback or a purely software WDM driver such as Virtual Audio Cable. Since it's purely digital, the bits you get on one end should be strictly identical to the bits you sent on the other end. This will always be true as far WDM/KS and WASAPI Exclusive are concerned.
- Stability - I've had BSODs with AudioRepeater KS
Yeah BSODs with WDM/KS are not too surprising. In theory they are never supposed to happen assuming bug-free WDM drivers, but WDM drivers like any other software are not completely bug-free. More importantly, I suspect audio WDM drivers are only tested with the Windows Audio Engine as a client (and hopefully WASAPI Exclusive), not niche/unusual clients like ASIO4ALL, PortAudio WDM/KS or this "AudioRepeater" thing you mentioned. This means that when using these third-party WDM/KS clients you may be triggering code paths in the WDM driver that the Windows Audio Engine would never exercise. At this point you end up in untested territory, and a bug in a Windows kernel-mode driver can often lead to a BSOD.
To be clear this is never the client's fault - BSODs are always the fault of the driver, period - but unusual WDM/KS clients can be particularly effective at finding edge cases where the driver is not handling some inputs/states correctly.
Asking because someone already managed to force OpenAL Soft to use WASAPI in exclusive mode but it was an unfinished/outdated implementation so I wonder if it would be more feasible/practical compared to ASIO/WDM-KS.
I would always recommend using WASAPI if you can. It is the modern API that is recommended by Microsoft. It is more user-friendly and likely more reliable than ASIO (which requires the user to fiddle with often poor-quality drivers), and it is much, much easier to use correctly than WDM/KS.
Besides being limited to 10 channels, which isn't enough for what the OP was asking for, unless it's indeed possible to output to two 8-channel devices simultaneously
As I said earlier I don't think WASAPI itself is limited to 10 channels, especially in exclusive mode. The real challenge is finding a device that will expose 10+ channels on a single endpoint from its WDM driver.
So are flexible licenses the reason why some projects are allowed to distribute binaries of ASIO-based projects?
Yes. A final GPL binary cannot forfeit any of the freedoms. Though it would still be legitimate for all the needed integration in the source code to be made, with the final users then being responsible for providing the proprietary SDK and compiling everything for their own private use. EDIT: see also audacity
providing binaries that do contain ASIO (added only during CI)?
That would be redistribution, and it would still violate the GPL. (unless you went the api reimplementation way suggested afterwards)
otherwise you couldn't distribute builds of (L)GPL code made with MSVC using MS's headers (the system library exception I don't think says anything about headers, just the DLLs)
To link to the dlls.. You need said headers? (which tend to have a pretty permissive license anyway)
Sadly I've looked at several projects and I'm yet to find a FLOSS alternative to ASIO.
I mean, it is really all just about some kind of api existing to get your waves to the sound card drivers and outputs (fun fact: before openal 1.1 and the soon-to-be-released Vista, the whole thing was just a cross-platform abstraction that on windows just used the normal native directsound even on Audigys). And admittedly, in the year of the lord 2024, asio sucks quite a bit considering the relative seamlessness of not just WASAPI exclusive but IAudioClient3 (on top of that, short of having really high impedance headphones.. I start to doubt even as far as quality is concerned that there is much to gain from a dedicated DAC or whatnot).
FlexASIO (or at least the principles behind it, without having to rely on Steinberg's interfaces as if this was a legacy Windows XP application) is where bad bitches seem they should be at today. Especially after the big information drops that just landed here. Like, for real.. Do we even have any evidence that using WASAPI exclusive has any downside compared to the other alternatives?
As I said earlier I don't think WASAPI itself is limited to 10 channels, especially in exclusive mode. The real challenge is finding a device that will expose 10+ channels on a single endpoint from its WDM driver.
What about the mysterious KSAUDIO_SPEAKER_DIRECTOUT?
To link to the dlls.. You need said headers?
When building with MSVC, yes, that's my point. You don't need MS's headers to be able to link to system DLLs using MinGW as it comes with its own FLOSS headers, but when building with MSVC, you use MS's headers, which weren't always permissively licensed. So being able to build (L)GPL software with MSVC, and distribute binaries that used MS's proprietary headers, is apparently fine.
And admittedly, in the year of the lord 2024, asio sucks quite a bit considering the relative seamlessness of not just WASAPI exclusive but IAudioClient3 (on top of that, short of having really high impedance headphones.. I start to doubt even as far as quality is concerned that there is much to gain from a dedicated DAC or whatnot).
This is really where I'm standing. Admittedly, it's kinda fun to create my own interface to ASIO drivers to make them work, but I had been under the impression that ASIO was something more along the lines of JACK or CoreAudio, an audio server/service designed for low-latency audio production work, rather than giving apps direct hardware access. In the latter case, I'm not sure I see much benefit to it over WASAPI Exclusive mode if Shared mode has too much latency. API-wise, ASIO does have the nice feature of being callback-driven, but I don't think that's much of a benefit given the drawbacks (no real device selection, no channel labels or configuration detection, names being limited to ANSI strings, the sloppy way "direct output" is indicated, having to deal with individual driver quirks).
The only real utility ASIO would provide is having more than 8 or 10 output channels, but given how it's not designed for being a plugin-based system to be able to do something with those channels, and the awkwardness of making it output to a specific device that expects what's being output, it would really require some dedication on the part of the user to make useful. JACK for Windows includes an ASIO driver for connecting ASIO apps to a JACK server, where you have more freedom to route connections between apps, plugins, and devices, but at that point, why not just use JACK directly? Unless there's an alternative low-latency server with an ASIO driver that's preferred over JACK on Windows.
What about the mysterious KSAUDIO_SPEAKER_DIRECTOUT?
That just indicates the channels are unlabelled, so rather than saying "4 channels: front-left, front-right, back-left, back-right", it just says "4 channels", and it's completely up to the driver where they end up. That doesn't allow getting more channels than are specified, it just leaves it unspecified what the channels you have are (really, any channels not specified in the channel mask are "direct out", e.g. if you specify 4 channels with the front-left and front-right mask bits only, then the first two channels are front stereo, and the extra two go wherever the driver wants).
So being able to build (L)GPL software with MSVC, and distribute binaries that used MS's proprietary headers, is apparently fine.
Yes, but my main point was that's 100% covered by the system headers exception. Like, sure, mingw was eventually created.. but otherwise it's not like you had choices.
but I had been under the impression that ASIO was something more along the lines of JACK or CoreAudio,
I don't know about coreaudio, but AFAIR jack is far more properly maintained and coded. (or if nothing else, it's not a 1997 api that hasn't seen pretty much any functional change for at least a decade)
an audio server/service designed for low-latency audio production work, rather than giving apps direct hardware access.
You must appreciate how you couldn't really do the former without the latter, at least on older Windows. You needed some way out of the awful native apis (and be it not invented here syndrome, or be it a DAW manufacturer not wanting to touch any of the aforementioned device quirks, but even when WDM-KS was later introduced I think asio's quick and stupid api may ironically still have been the one least directly "involved" with the hardware)
In the latter case, I'm not sure I see much benefit to it over WASAPI Exclusive mode if Shared mode has too much latency.
As #682 notes, shared mode is actually WAY better in windows 10. Possibly not so much to make exclusive redundant, but still. But alas, as I complain there and in https://github.com/mumble-voip/mumble/issues/1604 the thing is that we really are lacking any kind of damn testing, to ascertain this.
Both to see just how good modern WASAPI now is (I wonder if they couldn't have cut another ms in W11 perhaps), but also to check if as far as "underground apis" go KS isn't actually the most that you'd ever really need (even though, who knows, the results may be different between shitty sound cards and professional ones and perhaps ASIO could still have a measurable advantage).
where you have more freedom to route connections between apps, plugins, and devices, but at that point, why not just use JACK directly?
I mean, JACK cannot do anything without relying on another api on windows...
Unless there's an alternative low-latency server with an ASIO driver that more preferred over JACK on Windows.
Considering what Etienne said (and considering that at least on <W10, KS truly seems the absolute best at least when it comes to support) I guess it might indeed be worth looking into some lower level library.
That doesn't allow getting more channels than are specified
Uhm.. Then why can't you just specify more?
You must appreciate how you couldn't really do the former without the latter, at least on older Windows.
Sure, it would've included some way to handle low-latency audio I/O, but I don't see that being too useful on its own without something to make use of it.
I mean, JACK cannot do anything without relying on another api on windows...
Yes, but that would be an implementation detail of JACK. The point I was getting at was that if the intent is to use something like JACK, to connect apps to filters/converters to devices and such, and you make an ASIO host app that would use an ASIO driver that's just a front for accessing that server, it would be better to use that server directly.
Uhm.. Then why can't you just specify more?
Because you can't get more channels than the device provides. If the device is reporting 8 channels, you can't get 16. KSAUDIO_SPEAKER_DIRECTOUT won't change that.
Do we even have any evidence that using WASAPI exclusive has any downside compared to the other alternatives?
@mirh In my experience, WASAPI exclusive often crackles when forcing latency/buffer low enough to match ASIO4ALL, which is why I prefer to use the latter in DAWs like Reaper and OmniMIDI. I don't recall that happening with the WASAPI exclusive fork, but then again I think the lowest sample size I could set without the app crashing was like 160 (and a bit lower on lower sample rates), compared to 64 in ASIO.
JACK for Windows includes an ASIO driver for connecting ASIO apps to a JACK server, where you have more freedom to route connections between apps, plugins, and devices, but at that point, why not just use JACK directly?
@kcat I gave it a shot, and maybe I messed up the setup but results weren't good. Details in https://github.com/kcat/openal-soft/pull/1033
Sorry, maybe I implied too much - of course I was hinting at IAudioClient3's WASAPI exclusive, not the original one. Without it, before W10 or with older drivers (is there some way to check this support, without programmatically asking for it?) there's no way in theory or in practice to match ASIO/KS.
Putting aside that on top of not having heard many KS vs ASIO comparisons, again even this bloody IAudioClient3 seems very badly "reviewed".
Because you can't get more channels than the device provides. If the device is reporting 8 channels, you can't get 16. KSAUDIO_SPEAKER_DIRECTOUT won't change that.
I'm really having a hard time understanding where else one would like to have sound be sent, if we don't want (or have) so many physical outputs - and even "virtual" ones aren't desired?
I'm curious what IAudioClient3 is supposed to achieve in regards to exclusive mode. It only adds 3 functions: GetSharedModeEnginePeriod (which reports valid period values for the requested format in shared mode), GetCurrentSharedModeEnginePeriod (which gets the current shared mode format and period size), and InitializeSharedAudioStream (which allows initializing a shared mode stream with a new period size). I don't see how it would improve anything beyond being able to request different period sizes for the shared mode mixing server/service, and it doesn't touch exclusive mode.
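For what it's worth, the shared-mode side of that looks roughly like this (a sketch using the documented IAudioClient3 methods; obtaining the IAudioClient3 and the mix format, plus all error handling, is omitted):

```c++
#include <audioclient.h>

// 'client3' is an IAudioClient3 from IMMDevice::Activate, 'format' is the shared-mode
// mix format from GetMixFormat (both obtained elsewhere).
void initLowLatencyShared(IAudioClient3 *client3, const WAVEFORMATEX *format)
{
    UINT32 defaultPeriod{}, fundamentalPeriod{}, minPeriod{}, maxPeriod{};
    client3->GetSharedModeEnginePeriod(format, &defaultPeriod, &fundamentalPeriod,
        &minPeriod, &maxPeriod);

    // Request the engine's smallest supported shared-mode period instead of the
    // default ~10ms one; this is the part plain IAudioClient::Initialize can't ask for.
    client3->InitializeSharedAudioStream(AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
        minPeriod, format, nullptr);
}
```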
The main issue, it seems, with getting really low latencies with WASAPI exclusive is it being event driven. A period passes and the device is ready for more audio, and WASAPI will signal an event HANDLE that an app's thread is waiting on via one of the WaitFor*Object* functions. That thread then has to wait for Windows' scheduler to wake it up (and reset the event) before doing work to send audio to the device. A callback-driven approach would allow the app's processing function to run more directly in response to an interrupt from the audio device, being less reliant on userspace wait functions and the Windows scheduler to sleep and wake up in time.
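The event-driven flow being described is roughly this (a sketch; the exclusive-mode initialization with AUDCLNT_STREAMFLAGS_EVENTCALLBACK and IAudioClient::SetEventHandle, plus error handling, is omitted, and fillAudio is a hypothetical placeholder):

```c++
#include <windows.h>
#include <audioclient.h>

// 'client' and 'render' were initialized elsewhere for exclusive, event-driven mode,
// with IAudioClient::SetEventHandle(event) called before starting.
void renderLoop(IAudioClient *client, IAudioRenderClient *render,
                HANDLE event, UINT32 periodFrames)
{
    client->Start();
    for(;;)
    {
        // The device/engine signals the event when it wants the next period; the
        // thread still has to be woken by the scheduler before it can respond.
        if(WaitForSingleObject(event, 2000) != WAIT_OBJECT_0)
            break; // timeout or error, bail out

        BYTE *buffer = nullptr;
        if(FAILED(render->GetBuffer(periodFrames, &buffer)))
            break;
        // fillAudio(buffer, periodFrames);  // hypothetical: write one period of samples
        // Flagged silent here only because fillAudio above is just a placeholder.
        render->ReleaseBuffer(periodFrames, AUDCLNT_BUFFERFLAGS_SILENT);
    }
    client->Stop();
}
```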
I'm really having a hard time understanding where else one would like to have sound be sent, if we don't want (or have) so many physical outputs - and even "virtual" ones aren't desired?
The problem at issue was making WASAPI virtual devices have more channels. Despite being set up to take a 16-channel third-order ambisonic signal with a virtual cable, the device only reported 8 channels in a 7.1 configuration to apps querying it. So OpenAL Soft could try to create a stream that's 16 channels and specify KSAUDIO_SPEAKER_DIRECTOUT/0 as the channel mask despite the mismatch, but if the device is only accepting 8 channels, that will either fail creating the stream, or lose half of the channels.
The main issue, it seems, with getting really low latencies with WASAPI exclusive is it being event driven. A period passes and the device is ready for more audio, and WASAPI will signal an event HANDLE that an app's thread is waiting on via one of the WaitFor*Object* functions. That thread then has to wait for Windows' scheduler to wake it up (and reset the event) before doing work to send audio to the device. A callback-driven approach would allow the app's processing function to run more directly in response to an interrupt from the audio device, being less reliant on userspace wait functions and the Windows scheduler to sleep and wake up in time.
I have not looked into this in detail, but intuitively I would be very skeptical this would be a problem.
First, this doesn't take as long as you think it does. As long as the proper thread priorities are set and things are operating normally, the Windows scheduler will immediately schedule the thread as soon as the kernel signals the event. There is no "wait".
Second, I don't think anyone can actually implement a true "callback-driven" approach like you're describing, at least not on Windows. The kernel won't let you - you would end up calling into user space directly from the kernel, I don't see that happening. And even if you could, I'm not sure how that would be significantly faster than signaling an event and then immediately scheduling the thread waiting on the event (which is how it currently works). The only way to improve on that is to get rid of the context switch into user mode, and for that you would need to run the app logic itself in kernel mode (good luck).
The ASIO API is callback-driven on the surface, but in practice I would expect all ASIO drivers to be ultimately implemented by spinning up a thread, waiting for events in that thread, and firing the ASIO callbacks when the event is signaled. It looks like callbacks from the ASIO host app's perspective, but behind the scenes it's all ultimately based on waiting for some event from the kernel.
It may perhaps be possible to get this very slightly faster by having the thread spin on the event instead of waiting for it (to remove the context switch cost), but I doubt it'd be worth it.
I'm curious what IAudioClient3 is supposed to achieve in regards to exclusive mode. It only adds 3 functions
From a totally theoretical read of the documentation, it doesn't really seem that's up to you to bother with or rethink. My guess was that, just like putting the right version string in the application manifest unlocks the same GetVersionEx call actually accessing the true OS version, purposefully bumping the api revision could clearly signal that the program is ready to accept the lowest period sizes.
but if the device is only accepting 8 channels, that will either fail creating the stream, or lose half of the channels.
I mean, if you are targeting a virtual device then I don't see why it should? Microsoft specifically notes use cases like outputting to "digital mixer or a digital audio storage device", and WavPack's creator even mentioned ambisonics in particular.
From a totally theoretical read of the documentation, it doesn't really seem that's up to you to bother with or rethink. My guess was that, just like putting the right version string in the application manifest unlocks the same GetVersionEx call actually accessing the true OS version, purposefully bumping the api revision could clearly signal that the program is ready to accept the lowest period sizes.
Yeah, I suppose requesting an IAudioClient3 interface could provide alternate versions of the base IAudioClient functions, which could behave a bit differently.
but if the device is only accepting 8 channels, that will either fail creating the stream, or lose half of the channels.
I mean, if you are targetting a virtual device then I don't see why it should? Microsoft specifically notes uses cases like outputting to "digital mixer or a digital audio storage device", and WavPack's creator even mentioned ambisonics in particular.
That doesn't seem to be talking about creating a stream with more channels than the WASAPI device accepts. If a device only accepts 8 channels, it won't accept 16 channels.
But as for the quoted part about nChannels and dwChannelMask:
Typically, the count in nChannels equals the number of bits set in dwChannelMask, but this is not necessarily so. If nChannels is less than the number of bits set in dwChannelMask, the extra (most significant) bits in dwChannelMask are ignored. If nChannels exceeds the number of bits set in dwChannelMask, the channels that have no corresponding mask bits are not assigned to any physical speaker position. In any speaker configuration other than KSAUDIO_SPEAKER_DIRECTOUT, an audio sink like KMixer (see KMixer System Driver) simply ignores these excess channels and mixes only the channels that have corresponding mask bits.
The mismatch behavior doesn't match my experience with the topic. If nChannels is less than the number of bits in dwChannelMask, that's an error, whereas if nChannels is greater than the number of bits in dwChannelMask, the mask indicates what the first set of channels are and the extra channels go to whatever remaining outputs the target has available. I really wish it was as it says,
If nChannels exceeds the number of bits set in dwChannelMask, the channels that have no corresponding mask bits are not assigned to any physical speaker position. [...] an audio sink like KMixer (see KMixer System Driver) simply ignores these excess channels and mixes only the channels that have corresponding mask bits.
as that would make 3- and 4-channel UHJ files more practical. You could specify nChannels as 3 or 4 and dwChannelMask as just SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT, so the first two channels would be played as stereo and the third and fourth would be ignored and not played, as intended. But that's not how it has worked for me.
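Concretely, the format the documentation seems to promise would work for that case is something like this (a sketch of the WAVEFORMATEXTENSIBLE setup; as said, the extra channels haven't actually behaved as "ignored" in my experience):

```c++
#include <windows.h>
#include <mmreg.h>
#include <initguid.h>
#include <ks.h>
#include <ksmedia.h>

// 4-channel UHJ content where only the first two channels are labelled as
// front-left/front-right; per the documented behavior, the unmasked third and
// fourth channels would simply be dropped by the mixer.
WAVEFORMATEXTENSIBLE makeUhj4Format()
{
    WAVEFORMATEXTENSIBLE fmt{};
    fmt.Format.wFormatTag = WAVE_FORMAT_EXTENSIBLE;
    fmt.Format.nChannels = 4;
    fmt.Format.nSamplesPerSec = 48000;
    fmt.Format.wBitsPerSample = 32;
    fmt.Format.nBlockAlign = fmt.Format.nChannels * fmt.Format.wBitsPerSample / 8;
    fmt.Format.nAvgBytesPerSec = fmt.Format.nSamplesPerSec * fmt.Format.nBlockAlign;
    fmt.Format.cbSize = sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX);
    fmt.Samples.wValidBitsPerSample = 32;
    fmt.dwChannelMask = SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT; // only 2 of 4 channels labelled
    fmt.SubFormat = KSDATAFORMAT_SUBTYPE_IEEE_FLOAT;
    return fmt;
}
```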
@dbry
In case it's relevant/useful regarding channel count expansion, turns out we can use CRU to fake up to 7.1 by modifying the EDID of the stereo-only playback device of a DisplayPort/HDMI display to be able to use HeSuVi to virtualize the system surround mix into stereo that gets forwarded to the 2 real channels (the 6 extra ones are silent/unused, of course).
I've always wanted to do something like that for regular sound cards, so I emailed dechamps and this was his response:
If I remember correctly, the Windows audio engine allows SFX/LFX "pre-mix" APOs (but not EFX/GFX "post-mix" APOs) to have a different number of input channels vs. output channels. This might make it possible to build a third-party APO that do what you describe, but I never tried anything like this. I wonder if Equalizer APO might be able to do this out-of-the-box with the right config, or perhaps you could build it from source and make a couple of small code changes to experiment with the SFX APO input channel count negotiation.
I think the main issue you might run into with an APO-based solution is that, while the SFX APO might support downmixing, the application itself might still decide to open the audio stream with only 2 channels instead of 5.1/7.1, most likely because the application is trying to autodetect the channel count to use from the hardware audio device info (which would still report 2 channels), bypassing APOs. Maybe it's possible to work around that by making the APO downright refuse to negotiate 2 channels, i.e. reject 2 channels and suggest 5.1/7.1 instead in IsInputFormatSupported(). It's quite possible this could cause massive compatibility issues with applications though (especially those that only support stereo).
As you suggest, another approach is to use a filter driver to intercept Kernel Streaming (KS) device info query calls from the Windows Audio Engine to the Windows WDM audio device driver to make it look like the device is a 5.1/7.1 device instead of a stereo device. However, I see two obstacles here. First, implementing an audio filter driver tends to be extremely challenging (it's harder than implementing a normal audio driver, which is already not a piece of cake), even for something as simple as changing device metadata, because KS is a notoriously complicated, over-engineered and under-documented API. You would likely need to do things like stateful interception of calls which will get very complicated very fast (you'd likely end up having to maintain all kinds of mappings between the "frontend" user-facing view of the KS device/pins graph and the "backend" device-facing view).
The second problem is that, if you expose 5.1/7.1 from the driver, then this presumably means the filter driver will actually end up receiving a 5.1/7.1 audio stream. But the underlying backend driver only supports 2 channels. Which means the filter driver has to do format conversion. Which means the filter driver now has to actually intercept audio buffers, and I must warn you that is likely to bump the difficulty from "really hard" to "batshit insane" as there are lots of difficulties with intercepting KS streams (buffer management, timing/scheduling, having to implement support for 3 different streaming methods depending on the hardware used, etc.).
On top of all that, all this development would have to happen in Windows kernel mode, where everything is harder than user mode (need to write everything in C, need to use a separate machine/VM with a kernel debugger, bugs mean the whole machine BSODs, need to test with many different kinds of hardware because there are lots of variations on how different underlying drivers react to KS calls, driver signing requirements are a nightmare, etc.). Anyway... basically this whole approach is an uphill battle with an extremely steep slope. You can look at WinSoftVol if you're curious as to how the basic skeleton of a trivial, minimal WDM/KS audio filter driver looks (but note that WinSoftVol solves a much, much simpler problem than what you're trying to do).
Another solution that you might want to look into is intercepting calls in user mode by hooking calls into application-facing Windows audio APIs. In practice you can probably get away with only hooking into WASAPI as DirectSound and MME just use WASAPI behind the scenes. For example you could try to intercept application calls to IAudioClient::IsFormatSupported() so that the call always reports 5.1/7.1 as available, and then you can do the actual downstreaming in an SFX APO as I described above. I've never tried anything like this, and my understanding is that this also likely comes with its own set of "interesting" challenges, but it is likely more realistic (and arguably cleaner) than attempting to write a filter driver. You could distribute this as a Windhawk mod, for example. One potential difficulty here is that WASAPI is a COM-based API, and I have no idea how COM call interception looks like as COM involves dynamic dispatch, as opposed to direct function calls, but presumably there are people out there who know how to do it.
Hello kCat,
Are there any plans to use the full Ambisonics 3rd order (16 speakers max) with OpenAL-Soft in a way that I can directly route 16 channels from the internal decoder to discrete outputs, let's say 16 ASIO channels on two sound cards?
I don't know how to explain it better, but I want to use up to 16 speakers with the outputs of two ASIO cards. (no Receiver, and also no HDMI)
The only way to move around the Windows 7.1 config (as I see it) would be to directly address the ASIO channels that are available in the system. So the decoder has to use an algorithm that produces true 16 channels from the internal B-stream, and map them to the channel table.
But I assume that OpenAL-Soft will never support the tool to do the matrix calculations for 16 speakers and the ambdec config file itself. The "highest" configuration at the moment uses 7.1.4 speakers. The downside of it is that it produces no sound from below.
The idea was that I'd build my own matrix/rig of speakers (up to 16), but I don't know at the moment how to implement this with OpenAL-Soft.