kcat / openal-soft

OpenAL Soft is a software implementation of the OpenAL 3D audio API.
Other
2.23k stars 536 forks source link

Upgrade makemhr workflow to build and upload utils #1026

Closed ThreeDeeJay closed 3 months ago

ThreeDeeJay commented 3 months ago

I've had this branch for a while and was reminded of it here and then I recently saw https://github.com/kcat/openal-soft/commit/3b10c6f9bc83e535f5de119f668ce6fd17e4e67d and wanted to try it out so I figured it'd be a good time to share this new workflow. Which by the way, is playback supposed to work or is it still a WIP? When I try the LAF files I posted here, I just get a channel breakdown and it immediately exits the process without playing any audio 🤔 image

kcat commented 3 months ago
  • Includes makemhr, sofa-info, alrecord, altonegen, openal-info and allafplay Windows executables in utils.zip

    • Uses my libmysofa fork that fixed the Invalid format error by removing a filesize/density limit, so that makemhr is able to convert larger SOFA files (like EAC HRTFs with 5° elevations/azimuths) v1.3.2 (latest upstream) ... My fork (just a two-liner so let me know if you'd rather fork and remove limit yourself or if I should revert to the limited official branch) ...

It's understandable to want to use a version that doesn't arbitrarily limit the supported size, but that particular change seems questionable. It's removing checks for the sizes being negative, and with larger values, may need overflow checks in various other places. Improperly constructed SOFA files may be erroneously detected as valid, and cause misbehavior without overflow checks.

Which by the way, is playback supposed to work or is it still a WIP? When I try the LAF files I posted here, I just get a channel breakdown and it immediately exits the process without playing any audio 🤔

It should work, yes. Odd that it seems to fail reading the channel header data (it's all 0s, when the E values should be -0 or -nan), despite correctly reading the header markers and the quality, mode, and track count values, which in turn causes it to fail to read the sample rate and sample count, aborting playback. I don't see why, the file is opened in binary mode and those files play for me; if there was a problem reading binary/non-text data through std::ifstream on Windows, I'd have expected a similar problem with OpenAL Soft itself loading mhr files, which it obviously does without issue.

Unrelated, but it also seems like my trick of calling std::terminate in a catch(...) block doesn't seem to actually report anything on Windows? No dialog box about an abnormal termination, nor a message about why it terminated?

ThreeDeeJay commented 3 months ago

It's understandable to want to use a version that doesn't arbitrarily limit the supported size, but that particular change seems questionable. It's removing checks for the sizes being negative, and with larger values, may need overflow checks in various other places. Improperly constructed SOFA files may be erroneously detected as valid, and cause misbehavior without overflow checks.

Makes sense. I reverted it to the official repo though I updated it to v1.3.2 which has a slightly more generous limit. So what if we just keep the negative size check, but expand the limit to around 4GB? For reference a 1 degree azimuth/elevation EAC HRTF is >2GB so 96Khz might be double that, and I've seen other SOFAs that are GBs in size. explorer_LceHBLufWm

It should work, yes. Odd that it seems to fail reading the channel header data (it's all 0s, when the E values should be -0 or -nan), despite correctly reading the header markers and the quality, mode, and track count values, which in turn causes it to fail to read the sample rate and sample count, aborting playback. I don't see why, the file is opened in binary mode and those files play for me; if there was a problem reading binary/non-text data through std::ifstream on Windows, I'd have expected a similar problem with OpenAL Soft itself loading mhr files, which it obviously does without issue.

Looks like https://github.com/kcat/openal-soft/commit/7a7350af2b2cb9ad6103a43e7a932a4c45c31cb7 did the trick because this time I did get audio playback 👍 image Dolby's Universe Fury demo had never sounded this good to me with my HRTF, except, that there are really loud static noises at seemingly random times, which I just noticed also happen in Cavern when playing the LAF, but when playing the original file directly, I instead get a suppressed pop/clip. So there might be a conversion bug or the Cavern decoder is yet to address some data getting corrupted 🤔

EDIT: Welp, apparently the static noise is a known issue with DD+ JOC https://github.com/VoidXH/Cavern/issues/17 But on the bright side, playing the LAF converted from ADM sounds pretty much flawless

Log ```batch C:\Programs\Cavern\GUI>allafplay.exe "Nature's Fury.laf" Opened "OpenAL Soft on Speakers (SB X-Fi Surround 5.1)" Filename: "Nature's Fury.laf" quality: 16-bit int mode: objects track count: 126 Track 0: E=-0, A=0 (LFE: 0) Track 1: E=-0, A=0 (LFE: 0) Track 2: E=-0, A=0 (LFE: 0) Track 3: E=-0, A=0 (LFE: 1) Track 4: E=-0, A=0 (LFE: 0) Track 5: E=-0, A=0 (LFE: 0) Track 6: E=-0, A=0 (LFE: 0) Track 7: E=-0, A=0 (LFE: 0) Track 8: E=-0, A=0 (LFE: 0) Track 9: E=-0, A=0 (LFE: 0) Track 10: E=-0, A=0 (LFE: 0) Track 11: E=-0, A=0 (LFE: 0) Track 12: E=-0, A=0 (LFE: 0) Track 13: E=-0, A=0 (LFE: 0) Track 14: E=-0, A=0 (LFE: 0) Track 15: E=-0, A=0 (LFE: 0) Track 16: E=-0, A=0 (LFE: 0) Track 17: E=-0, A=0 (LFE: 0) Track 18: E=-0, A=0 (LFE: 0) Track 19: E=-0, A=0 (LFE: 0) Track 20: E=-0, A=0 (LFE: 0) Track 21: E=-0, A=0 (LFE: 0) Track 22: E=-0, A=0 (LFE: 0) Track 23: E=-0, A=0 (LFE: 0) Track 24: E=-0, A=0 (LFE: 0) Track 25: E=-0, A=0 (LFE: 0) Track 26: E=-0, A=0 (LFE: 0) Track 27: E=-0, A=0 (LFE: 0) Track 28: E=-0, A=0 (LFE: 0) Track 29: E=-0, A=0 (LFE: 0) Track 30: E=-0, A=0 (LFE: 0) Track 31: E=-0, A=0 (LFE: 0) Track 32: E=-0, A=0 (LFE: 0) Track 33: E=-0, A=0 (LFE: 0) Track 34: E=-0, A=0 (LFE: 0) Track 35: E=-0, A=0 (LFE: 0) Track 36: E=-0, A=0 (LFE: 0) Track 37: E=-0, A=0 (LFE: 0) Track 38: E=-0, A=0 (LFE: 0) Track 39: E=-0, A=0 (LFE: 0) Track 40: E=-0, A=0 (LFE: 0) Track 41: E=-0, A=0 (LFE: 0) Track 42: E=-0, A=0 (LFE: 0) Track 43: E=-0, A=0 (LFE: 0) Track 44: E=-0, A=0 (LFE: 0) Track 45: E=-0, A=0 (LFE: 0) Track 46: E=-0, A=0 (LFE: 0) Track 47: E=-0, A=0 (LFE: 0) Track 48: E=-0, A=0 (LFE: 0) Track 49: E=-0, A=0 (LFE: 0) Track 50: E=-0, A=0 (LFE: 0) Track 51: E=-0, A=0 (LFE: 0) Track 52: E=-0, A=0 (LFE: 0) Track 53: E=-0, A=0 (LFE: 0) Track 54: E=-0, A=0 (LFE: 0) Track 55: E=-0, A=0 (LFE: 0) Track 56: E=-0, A=0 (LFE: 0) Track 57: E=-0, A=0 (LFE: 0) Track 58: E=-0, A=0 (LFE: 0) Track 59: E=-0, A=0 (LFE: 0) Track 60: E=-0, A=0 (LFE: 0) Track 61: E=-0, A=0 (LFE: 0) Track 62: E=-0, A=0 (LFE: 0) Track 63: E=-0, A=0 (LFE: 0) Track 64: E=-0, A=0 (LFE: 0) Track 65: E=-0, A=0 (LFE: 0) Track 66: E=-0, A=0 (LFE: 0) Track 67: E=-0, A=0 (LFE: 0) Track 68: E=-0, A=0 (LFE: 0) Track 69: E=-0, A=0 (LFE: 0) Track 70: E=-0, A=0 (LFE: 0) Track 71: E=-0, A=0 (LFE: 0) Track 72: E=-0, A=0 (LFE: 0) Track 73: E=-0, A=0 (LFE: 0) Track 74: E=-0, A=0 (LFE: 0) Track 75: E=-0, A=0 (LFE: 0) Track 76: E=-0, A=0 (LFE: 0) Track 77: E=-0, A=0 (LFE: 0) Track 78: E=-0, A=0 (LFE: 0) Track 79: E=-0, A=0 (LFE: 0) Track 80: E=-0, A=0 (LFE: 0) Track 81: E=-0, A=0 (LFE: 0) Track 82: E=-0, A=0 (LFE: 0) Track 83: E=-0, A=0 (LFE: 0) Track 84: E=-0, A=0 (LFE: 0) Track 85: E=-0, A=0 (LFE: 0) Track 86: E=-0, A=0 (LFE: 0) Track 87: E=-0, A=0 (LFE: 0) Track 88: E=-0, A=0 (LFE: 0) Track 89: E=-0, A=0 (LFE: 0) Track 90: E=-0, A=0 (LFE: 0) Track 91: E=-0, A=0 (LFE: 0) Track 92: E=-0, A=0 (LFE: 0) Track 93: E=-0, A=0 (LFE: 0) Track 94: E=-0, A=0 (LFE: 0) Track 95: E=-0, A=0 (LFE: 0) Track 96: E=-0, A=0 (LFE: 0) Track 97: E=-0, A=0 (LFE: 0) Track 98: E=-0, A=0 (LFE: 0) Track 99: E=-0, A=0 (LFE: 0) Track 100: E=-0, A=0 (LFE: 0) Track 101: E=-0, A=0 (LFE: 0) Track 102: E=-0, A=0 (LFE: 0) Track 103: E=-0, A=0 (LFE: 0) Track 104: E=-0, A=0 (LFE: 0) Track 105: E=-0, A=0 (LFE: 0) Track 106: E=-0, A=0 (LFE: 0) Track 107: E=-0, A=0 (LFE: 0) Track 108: E=-0, A=0 (LFE: 0) Track 109: E=-0, A=0 (LFE: 0) Track 110: E=-0, A=0 (LFE: 0) Track 111: E=-0, A=0 (LFE: 0) Track 112: E=-0, A=0 (LFE: 0) Track 113: E=-0, A=0 (LFE: 0) Track 114: E=-0, A=0 (LFE: 0) Track 115: E=-0, A=0 (LFE: 0) Track 116: E=-0, A=0 (LFE: 0) Track 117: E=-0, A=0 (LFE: 0) Track 118: E=-nan(ind), A=0 (LFE: 0) Track 119: E=-nan(ind), A=0 (LFE: 0) Track 120: E=-nan(ind), A=0 (LFE: 0) Track 121: E=-nan(ind), A=0 (LFE: 0) Track 122: E=-nan(ind), A=0 (LFE: 0) Track 123: E=-nan(ind), A=0 (LFE: 0) Track 124: E=-nan(ind), A=0 (LFE: 0) Track 125: E=-nan(ind), A=0 (LFE: 0) Channels: 118 Sample rate: 48000 Length: 5120000 samples (106.667 sec) C:\Programs\Cavern\GUI> ```

Unrelated, but it also seems like my trick of calling std::terminate in a catch(...) block doesn't seem to actually report anything on Windows? No dialog box about an abnormal termination, nor a message about why it terminated?

Nope, nada.

kcat commented 3 months ago

So what if we just keep the negative size check, but expand the limit to around 4GB? For reference a 1 degree azimuth/elevation EAC HRTF is >2GB so 96Khz might be double that, and I've seen other SOFAs that are GBs in size.

Looks like it may need an overflow check too. Something like

if(elements <= 0 || size <= 0 || INT_MAX/size > elements)
    ... error ...

I don't know how this check in specific relates to the file or dataset size, but at least the result of elements * size can't go far above 2.1 billion, or else it'll overflow and become undefined behavior. Expanding that would need more work to avoid signed ints, where those kinds of changes tend to need farther reaching changes since those variables may interact with other variables that may not be happy when they're different types.

Looks like 7a7350a did the trick because this time I did get audio playback 👍

That's quite odd. It shouldn't have affected reading, basically just replacing terminating asserts with exceptions that skip just the current file.

EDIT: Welp, apparently the static noise is a known issue with DD+ JOC VoidXH/Cavern#17 So I wonder how this person managed to get a clean sound

This demo was converted from DD+ E-AC-3 JOC to LAF with Cavern and then rendered using OpenAL Soft's Default HRTF

I may not know what I'm talking about, or misunderstanding what I thought I read, but IIRC, support for pure DD+ JOC was a problem, while E-AC-3 JOC worked better. Or maybe that was TrueHD with Atmos that didn't work, while DD+/E-AC-3 with Atmos did.

On the bright side, playing the LAF converted from ADM (invite) works beautifully

That's the kind of thing I expected to see from LAF files converted from Atmos. Rather than the other LAF file that had bursts of static, which seems to just be a 7.1.4 mix (that doesn't even mark the LFE channel properly), that one there uses a fixed channel bed (with a properly marked LFE channel) and several dozen dynamic objects. It does still seem to use left-handed coordinates though (+Z is front), which isn't correct according to the LAF documentation.

It does make me think the LAF player should try to be more efficient with source playback, stopping sources that are disabled in chunk headers, and starting them when enabled with more audio. Doing that would make it more efficient when multiple channels are disabled/silent most of the time. It would require the AL_SOFT_source_start_delay extension to make sure sources are restarted at the proper time synchronized with other sources (currently no extensions are required, outside of float32 and int32 for files that use float32 or int24 samples respectively), and if there's any resampling going on, synchronization will be extra tricky to account for sub-sample offsets.

ThreeDeeJay commented 3 months ago

Looks like it may need an overflow check too.

Done https://github.com/ThreeDeeJay/openal-soft/commit/f3d74878d2ac3e4e45b853569bec406dcb527bb3, though the =<5° EAC SOFA still fails to convert with the limit. Also, the CI now clones a specific commit, for stability reasons, and I added commit date, count and hash to the workflow artifact to avoid duplicate confusion when debugging

I may not know what I'm talking about, or misunderstanding what I thought I read, but IIRC, support for pure DD+ JOC was a problem, while E-AC-3 JOC worked better. Or maybe that was TrueHD with Atmos that didn't work, while DD+/E-AC-3 with Atmos did.

Yeah, what I do know is that TrueHD can't be implemented at the moment. So I've been looking into the process to at least mix to 9.1.6 so maybe I can convert it to ADM then into LAF, since source ADM tracks are extremely scarce. And apparently coarse JOC is what causes decoding artifacts:

There are two types of JOC coding, coarse and sparse. Coarse, which is widely used by movies, works without issues, sparse (found in some versions of amaze) doesn't, as the documentation is incorrect about it https://github.com/VoidXH/Cavern/issues/17#issuecomment-1268149518

kcat commented 3 months ago

Wait, sorry. That check should be

if(elements <= 0 || size <= 0 || elements > INT_MAX/size)
  return MYSOFA_INVALID_FORMAT;
ThreeDeeJay commented 3 months ago

Thanks, that did the trick! 👌 By the way why does it sometimes use less than the total IRs? 🤔 image

kcat commented 3 months ago

By the way why does it sometimes use less than the total IRs? 🤔

It ignores IRs that don't align with the detected uniform layout. The IR layout for the mhr file needs to have fixed stepping along the elevation (starting at +90 degrees and stepping down to -90 degrees), and fixed stepping along the azimuth for each valid elevation (starting at 0 degrees, directly in front, and going around a full 360 degrees).

ThreeDeeJay commented 3 months ago

Nice 👍 By the way, you might wanna delete https://github.com/kcat/openal-soft/releases/tag/makemhr since that release is now obsolete.

kcat commented 3 months ago

I may not know what I'm talking about, or misunderstanding what I thought I read, but IIRC, support for pure DD+ JOC was a problem, while E-AC-3 JOC worked better. Or maybe that was TrueHD with Atmos that didn't work, while DD+/E-AC-3 with Atmos did.

Yeah, what I do know is that TrueHD can't be implemented at the moment. So I've been looking into the process to at least mix to 9.1.6 so maybe I can convert it to ADM then into LAF, since source ADM tracks are extremely scarce. And apparently coarse JOC is what causes decoding artifacts:

There are two types of JOC coding, coarse and sparse. Coarse, which is widely used by movies, works without issues, sparse (found in some versions of amaze) doesn't, as the documentation is incorrect about it VoidXH/Cavern#17 (comment)

I'm curious if ffmpeg's code could help at all. It seems to handle decoding TrueHD tracks just fine, "truehd (Dolby TrueHD + Dolby Atmos)" even. "eac3 (Dolby Digital Plus + Dolby Atmos)" also plays fine, without any glitching. Interestingly, it reports and provides such tracks as 7.1, which seem to have everything in the mix. I can't do a deeper look to see if it's 7.1(.4) premixed or something that it's giving with a simple downmix, or if it has dynamic objects being moved and mixed in real-time by its decoder.

But on the bright side, playing the LAF converted from ADM sounds pretty much flawless

Also interestingly, this LAF doesn't seem too well optimized. Each chunk always has all tracks enabled, when I'm sure many of them are silent for long stretches of time. I also wonder if there's room to optimize track utilization (if two or more tracks don't play sound at the same time, they can be on the same track; you'd only need as many tracks as are played simultaneously). Those would help cut down on the LAF file size pretty well, I think.

ThreeDeeJay commented 3 months ago

I'm curious if ffmpeg's code could help at all. It seems to handle decoding TrueHD tracks just fine, "truehd (Dolby TrueHD + Dolby Atmos)" even. "eac3 (Dolby Digital Plus + Dolby Atmos)" also plays fine, without any glitching. Interestingly, it reports and provides such tracks as 7.1, which seem to have everything in the mix. I can't do a deeper look to see if it's 7.1(.4) premixed or something that it's giving with a simple downmix, or if it has dynamic objects being moved and mixed in real-time by its decoder.

Last I checked, ffmpeg could just mux/identify the TrueHD Atmos track and decode the 7.1 "core" surround, but not heights or objects (which I think are encoded into the 7.1 track with metadata), though some figured out how to render up to 9.1.6 using a script or Music Media Helper, both of which rely on Dolby Reference Player/Encoder

According to a user in our server, DDPA was implementable in FOSS since there is an open spec (ETSI EAC object carriage). Decoding THDA is not implementable in FOSS since there is no open spec, unless you get the proprietary software DRP to do the job, which is what the script does. https://cavern.sbence.hu/cavern/doc.php?p=Cavernize#cd click on "Codec disclaimer"

Also interestingly, this LAF doesn't seem too well optimized. Each chunk always has all tracks enabled, when I'm sure many of them are silent for long stretches of time. I also wonder if there's room to optimize track utilization (if two or more tracks don't play sound at the same time, they can be on the same track; you'd only need as many tracks as are played simultaneously). Those would help cut down on the LAF file size pretty well, I think.

I forwarded it to the dev since tbh I'm not really familiar with the Cavern codebase (I'm just the CI guy there too lol), so feel free to join the conversation here, or via a new issue/PR 👀👌

VoidXH commented 3 months ago

Also interestingly, this LAF doesn't seem too well optimized. Each chunk always has all tracks enabled, when I'm sure many of them are silent for long stretches of time. I also wonder if there's room to optimize track utilization (if two or more tracks don't play sound at the same time, they can be on the same track; you'd only need as many tracks as are played simultaneously). Those would help cut down on the LAF file size pretty well, I think.

Thanks for the implementation! When a LAF is exported by converting an E-AC-3 file, even if noise is present, the track will be active. One track is one object from the source. You could optimize it by clustering or gating the PCM signals, but that wouldn't be lossless.