lordmulder / DynamicAudioNormalizer

Dynamic Audio Normalizer

DynamicAudioNormalizerNET.flushBuffer() throws unknown exception on next processInplace() #29

Closed ghost closed 1 year ago

ghost commented 1 year ago

Using .NET 7, and for the life of me, something is quite broken when using it with dynaudnorm.

I've tried dynaudnorm in two projects now. Both crash in processInplace() after calling flushBuffer(). The only workarounds are not calling flushBuffer() at all, or calling reset() afterwards. Neither makes the sound seamless, and the sound my second project produces is an utterly broken mess.

The first project processes long PCM WAV files. The second processes a live recording in IEEE floating-point format. I'm fairly certain I've got the bit conversion, sample rate and the rest correct, but regardless of how I tweak it, it either ends up sounding worse or throws an unknown exception.

Unknown exception? Yeah, the exception message is just the generic "unhandled" one. I've checked the sources on GitHub, and there is certainly a specific message for each exception handler at least. So, something is broken.

This is the error it throws at me every time.

DynamicAudioNormalizer.DynamicAudioNormalizerNET_Error: Unhandeled exception in native MDynamicAudioNormalizer function!
   at DynamicAudioNormalizer.DynamicAudioNormalizerNET.processInplace(Double[,] samplesInOut, Int64 inputSize)
   at NNamespace.CDynAudNorm.Process(Stream streamInput)  ...

I've also tried older versions, but no luck. Nothing works.

I've picked up how to use dynaudnorm from the .NET samples as well as README.html. Neither goes into much detail about what else there is to know about flushBuffer().

I'm sure I've referenced it correctly. Stack Overflow screams at me to just add this to the .csproj, but since the .NET sample doesn't come with a .csproj, I've got no clue. Adding all the x64 DLLs just throws a dependency/package error at me.

    <ItemGroup>
      <Reference Include="DynamicAudioNormalizerNET.dll" />
    </ItemGroup>

I really want this to work. Could I get some help on this, please? Thanks!

ghost commented 1 year ago

Never mind. The documentation didn't mention that continuous streams do not need to call flushBuffer(). Though I still don't know why it causes the exception: as long as I call flushBuffer(), processInplace() will crash 100% of the time. The documentation also mentions nothing about needing to call reset() right after flushBuffer() to reuse the instance, unless I'm blind and glossed over that. I still don't know if this is even right.

No, the MOST important part of the documentation that I stupidly glossed over is that dynaudnorm is not thread-safe. I had to put a lock around the processInplace() call. I also had a really hard time debugging why the sound was so bad; it turned out to be because I had started from the horrid sample C# code in this repo, which is not optimised at all, so I had to do a lot of refactoring.

Thanks a lot!

I can't use C# to call the native dynaudnorm API directly; that would require a C++ class wrapper, which is out of my league. C# doesn't give many options in managed code, and the unmanaged bits are too complex for my smooth brain. The first project was text-to-speech, and it seems that running dynaudnorm on it caused a lot of sound artifacts. My second project processes and plays back live audio simultaneously, so it needs enough performance for seamless playback.

This is what I came up with for continuous processing of streams for immediate playback. The first ProcessStream() uses a disposable BinaryReader and BinaryWriter on one single MemoryStream. So far I haven't seen dynaudnorm output a buffer bigger than its input; the output buffer size does change, but this means I can optimise by reusing the very same MemoryStream for both input and output. The second ProcessStream() uses BitConverter. I think the BinaryReader/BinaryWriter variant is the fastest here, though I didn't really benchmark it.

    public int ProcessStream( Stream memoryStream, int bufferBytesSize ) {
        // leaveOpen: true, so disposing the reader/writer does not close the shared
        // MemoryStream (it is reused for both input and output).
        using ( BinaryReader binaryReader = new BinaryReader( memoryStream, System.Text.Encoding.UTF8, true ) )
        using ( BinaryWriter binaryWriter = new BinaryWriter( memoryStream, System.Text.Encoding.UTF8, true ) ) {
            // 4 bytes per IEEE-float sample, channels interleaved.
            int bufferDoublesSize = bufferBytesSize / ( ChannelsCount * 4 );
            double[ , ] bufferDoubles = new double[ ChannelsCount, bufferDoublesSize ];
            for ( int i = 0; i < bufferDoublesSize; i++ )
            for ( int c = 0; c < ChannelsCount; c++ )
                bufferDoubles[ c, i ] = ( double ) binaryReader.ReadSingle( ) / SingleMaxHalf;

            // dynaudnorm is not thread-safe, so access to the instance is serialised.
            lock ( ThreadLock ) {
                bufferDoublesSize = ( int ) Instance.processInplace( bufferDoubles, bufferDoublesSize );
            }
            if ( bufferDoublesSize == 0 )
                return 0;
            bufferBytesSize = bufferDoublesSize * ChannelsCount * 4;
            // Rewind and overwrite the same stream with the processed samples.
            memoryStream.Position = 0;
            for ( int i = 0; i < bufferDoublesSize; i++ )
            for ( int c = 0; c < ChannelsCount; c++ )
                binaryWriter.Write( ( Single ) ( bufferDoubles[ c, i ] * SingleMaxHalf ) );
            return bufferBytesSize;
        }
    }
    public int ProcessStream( byte[ ] bufferBytes, int bufferBytesSize ) {
        int bufferDoublesSize = bufferBytesSize / ( ChannelsCount * 4 );
        double[ , ] bufferDoubles = new double[ ChannelsCount, bufferDoublesSize ];
        for ( int i = 0; i < bufferDoublesSize; i++ )
        for ( int c = 0; c < ChannelsCount; c++ )
            bufferDoubles[ c, i ] = ( double ) BitConverter.ToSingle( bufferBytes, ( i * ChannelsCount + c ) * 4 ) / SingleMaxHalf;

        // dynaudnorm is not thread-safe, so access to the instance is serialised.
        lock ( ThreadLock ) {
            bufferDoublesSize = ( int ) Instance.processInplace( bufferDoubles, bufferDoublesSize );
        }
        if ( bufferDoublesSize == 0 )
            return 0;
        bufferBytesSize = bufferDoublesSize * ChannelsCount * 4;
        byte[ ] bytes4;
        int index;
        for ( int i = 0; i < bufferDoublesSize; i++ )
        for ( int c = 0; c < ChannelsCount; c++ ) {
            // Cast to Single *after* scaling, so GetBytes() always yields 4 bytes.
            bytes4 = BitConverter.GetBytes( ( Single ) ( bufferDoubles[ c, i ] * SingleMaxHalf ) );
            index = ( i * ChannelsCount + c ) * 4;
            bufferBytes[ index ] = bytes4[ 0 ];
            bufferBytes[ index + 1 ] = bytes4[ 1 ];
            bufferBytes[ index + 2 ] = bytes4[ 2 ];
            bufferBytes[ index + 3 ] = bytes4[ 3 ];
        }

        return bufferBytesSize;
    }

To close the loop: I've tried playback using mpv/ffmpeg's dynaudnorm, ffplay's dynaudnorm, and this dynaudnorm + NAudio/CSCore. After a few tries I managed to get both mpv/ffmpeg and the C# dynaudnorm to work without errors. CSCore is the library I'd call the upgrade, since it has options to reduce latency significantly. For some weird reason, though, I couldn't get ffplay's dynaudnorm to work with the same source wave: setting -af "dynaudnorm" causes it to freeze and buffer strangely, and adding configuration options makes it not work at all, the sound comes out unprocessed.

I also noticed that the latest FFmpeg master, as of writing this, has more options to configure dynaudnorm, while this repo seems to be abandoned(?).

Okay, I'm done ranting. Closing this...

lordmulder commented 1 year ago

Never mind. The documentation didn't mention that continuous streams do not need to call flushBuffer()

Huh? I think the documentation is pretty clear on this 😏

From the documentation:

MDynamicAudioNormalizer::flushBuffer() This function shall be called at the end of the process, after all input samples have been processed via processInplace() function, in order to flush the pending samples from the internal buffer. It writes the next pending output samples into the output buffer, in FIFO order, if and only if there are still any pending output samples left in the internal buffer. Once this function has been called, you must call reset() before the processInplace() function may be called again! If this function returns fewer output samples than the specified buffer size, then this indicates that the internal buffer is empty.

See also:

Quick Start Guide:

  1. Create a new MDynamicAudioNormalizer instance. This allocates required resources.
  2. Call the initialize() method, once, in order to initialize the MDynamicAudioNormalizer instance.
  3. Call the processInplace() method, in a loop, until all input samples have been processed.
  4. Call the flushBuffer() method, in a loop, until all the pending "delayed" output samples have been flushed.
  5. Destroy the MDynamicAudioNormalizer instance. This will free up all allocated resources.

So, you call flushBuffer() at the very end, in order to drain the pending samples from the internal buffer. It is not supposed to be interleaved with the processInplace() method! That wouldn't make any sense...
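In C# terms, the intended call order looks roughly like this. It is only a sketch: `normalizer` stands for an already-initialized DynamicAudioNormalizerNET instance, ReadSamples()/WriteSamples() are placeholder helpers for your own I/O, and the exact method signatures should be taken from the bundled .NET sample rather than from here:

    // Sketch of the intended lifecycle (see the bundled .NET sample for exact signatures).
    double[ , ] buffer = new double[ channels, bufferLength ];
    long count;

    // Step 3: feed all input samples through processInplace(), in a loop.
    // The output initially lags behind the input, because of the internal delay.
    while ( ( count = ReadSamples( buffer ) ) > 0 ) {
        long produced = normalizer.processInplace( buffer, count );
        if ( produced > 0 )
            WriteSamples( buffer, produced );
    }

    // Step 4: only at the very end, drain the "delayed" samples from the internal buffer.
    while ( ( count = normalizer.flushBuffer( buffer ) ) > 0 )
        WriteSamples( buffer, count );

    // After flushBuffer(), the instance must be reset() before processInplace()
    // may be called again - or simply destroy the instance (step 5).
    normalizer.reset( );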


BTW: The "dynaudnorm" filter in FFmpeg is a completely independent re-write. It apparently was inspired by MDynamicAudioNormalizer, but doesn't use my code at all. The MDynamicAudioNormalizer core library is written in C++, whereas the dynaudnorm" filter in FFmpeg is written in pure C. It also uses a lot of FFmpeg "internal" functions/infrastructure. So, back-porting their changes or fixes to MDynamicAudioNormalizer is not easily possible... 🤔

Compare:

I have not been working on MDynamicAudioNormalizer recently, due to lack of time and because of other projects 😪

Regards.

ghost commented 1 year ago

Huh? I think the documentation is pretty clear on this 😏

From the documentation:

MDynamicAudioNormalizer::flushBuffer() This function shall be called at the end of the process, after all input samples have been processed via processInplace() function, in order to flush the pending samples from the internal buffer. It writes the next pending output samples into the output buffer, in FIFO order, if and only if there are still any pending output samples left in the internal buffer. Once this function has been called, you must call reset() before the processInplace() function may be called again! If this function returns fewer output samples than the specified buffer size, then this indicates that the internal buffer is empty.

What the hell, how did I even miss that?! How?! I can't even... I apologise that I somehow couldn't even read; this part is all my fault for skipping the docs.

BTW: The "dynaudnorm" filter in FFmpeg is a completely independent re-write. It apparently was inspired by MDynamicAudioNormalizer, but doesn't use my code at all. The > MDynamicAudioNormalizer core library is written in C++, whereas the dynaudnorm" filter in FFmpeg is written in pure C. It also uses a lot of FFmpeg "internal" functions/infrastructure. So, back-porting their changes or fixes to MDynamicAudioNormalizer is not easily possible... 🤔

Again, my apologies for ranting at you. Dynaudnorm is truly the best thing I've ever discovered in my life. I use it every day. Please don't let my rants discourage you; if they did, I apologise.

The limitation is that dynaudnorm is only for media files. What I really want is to implement dynaudnorm system-wide -- just like Windows' loudness equalization. But that's not an easy task; I'm also new to audio systems, and their terminology is just as complicated.

The only way I can think of right now is to reroute application audio to another output endpoint, record that endpoint using WASAPI loopback capture, process the audio samples with dynaudnorm, and finally play back to the default output endpoint using NAudio/CSCore/ffmpeg/ffplay/mpv. I'd also like to implement WASAPI loopback capture per application, which is demonstrated to be possible by the [ApplicationLoopback example](https://github.com/microsoft/windows-classic-samples/tree/main/Samples/ApplicationLoopback). But it's written in C++, and I don't even know how I'm going to port it to C#, if that's even possible.
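Roughly, the capture-process-playback part I have in mind looks like the sketch below, using NAudio's WasapiLoopbackCapture. CDynAudNorm here is just my own wrapper around DynamicAudioNormalizerNET with the ProcessStream() shown above, and its constructor arguments are placeholders; in practice the capture and the playback would have to use different endpoints, otherwise the loop feeds back on itself:

    using System;
    using NAudio.Wave;

    class LoopbackNormalizer {
        static void Main( ) {
            // IEEE-float loopback capture of the default render endpoint.
            using var capture = new WasapiLoopbackCapture( );
            var playbackBuffer = new BufferedWaveProvider( capture.WaveFormat ) {
                DiscardOnBufferOverflow = true
            };
            // Placeholder construction of my dynaudnorm wrapper (see ProcessStream() above).
            var normalizer = new CDynAudNorm( capture.WaveFormat.Channels, capture.WaveFormat.SampleRate );

            capture.DataAvailable += ( sender, e ) => {
                // Normalize the captured bytes in place, then queue whatever came back out.
                int processed = normalizer.ProcessStream( e.Buffer, e.BytesRecorded );
                if ( processed > 0 )
                    playbackBuffer.AddSamples( e.Buffer, 0, processed );
            };

            using var player = new WasapiOut( );
            player.Init( playbackBuffer );
            player.Play( );
            capture.StartRecording( );

            Console.WriteLine( "Press Enter to stop..." );
            Console.ReadLine( );
            capture.StopRecording( );
        }
    }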

Of course, the best solution would be to implement it as a custom audio driver. But you know... there's no freaking way I can do that, not with C#, and C and C++ are way too complicated for me. I also don't know whether it's even possible to combine the existing loudness equalization with dynaudnorm in a custom driver. The combination is best for headphone/earphone users, while dynaudnorm alone is best for loudspeakers.

To wrap things up, I really want to thank you for creating Dynaudnorm! I can't live without it because it's so good. It combats the https://en.wikipedia.org/wiki/Loudness_war problem as well as its opposite.

Thank you for your time.

lordmulder commented 1 year ago

DynamicAudioNormalizer requires a large delay. This is inherent to the algorithm, because the core idea is to "smooth out" each sample's normalization factor with those of the preceding ("past") and the subsequent ("future") samples. Obviously, the only way to be able to access "future" samples is by buffering a sufficiently large chunk of samples in the filter, so that, at the moment when we process the sample at position n, we already have all samples up to position n + (1/2 window_size) in the buffer. The requirement to buffer a large number of samples after the sample that we currently process unavoidably results in a large delay!

This is also the reason why, at the end of the process, the "pending" samples in the buffer need to be flushed 😏

For the "offline" processing of media files, the "delay" is not a problem, because we can read/write the input/output samples at any rate that we like. But this will never work for real-time processing – unless you're okay with a delay of ~30 seconds 😨

TL;DR: Using DynamicAudioNormalizer as a system-wide "loudness equalization" filter, for real-time processing, will not work.

ghost commented 1 year ago

DynamicAudioNormalizer requires a large delay. This is inherent to the algorithm, because the core idea is to "smooth out" each sample's normalization factor with those of the preceding ("past") and the subsequent ("future") samples. Obviously, the only way to be able to access "future" samples is by buffering a sufficiently large chunk of samples in the filter, so that, at the moment when we process the sample at position n, we already have all samples up to position n + (1/2 window_size) in the buffer. The requirement to buffer a large number of samples after the sample that we currently process unavoidably results in a large delay!

This is also the reason why, at the end of the process, the "pending" samples in the buffer need to be flushed 😏

For the "offline" processing of media files, the "delay" is not a problem, because we can read/write the input/output samples at any rate that we like. But this will never work for real-time processing – unless you're okay with a delay of ~30 seconds 😨

TL;DR: Using DynamicAudioNormalizer as a system-wide "loudness equalization" filter, for real-time processing, will not work.

Umm, I don't know what you're talking about. I've been using dynaudnorm in real time with no delay.

I use "dynaudnorm=f=10:g=3:m=100:n=1" for general audio to maximize all quietness with shortest frame time for most reactive response, same as with windows loudness equalisation for shortest response time to eliminate any delays. Or with "dynaudnorm=f=10:g=3:m=100:n=0" to decouple channels for soundtracks to bring out the sound of all instruments effectively. Coupled channels works best for non-music type of audio as decoupled could bring out unwanted audio artifacts.

The biggest problem with most audio productions today is that large parts of the audio are too quiet, so we have to turn the volume up, but then risk having our ears blasted by the loud parts. My solution? Dynaudnorm the crap out of the quiet parts, making them as loud as possible so the loudness is equalised at the maximum.

Of course, this approach seems to make certain types of audio explode, such as 8-bit retro-style chiptune, which becomes insanely louder than usual.

So, it works. A bit broken, but it works. That's why, for earphone users, it's best to combine dynaudnorm with loudness equalization: maximum loudness for the quiet parts without the ear-splitting peaks.

I suppose if I were to make this into an audio driver, I would have to address these issues. But even Windows loudness equalization isn't perfect: it does not work well with speakers and makes the audio too quiet, even in combination with earphones.

lordmulder commented 1 year ago

Again: It is inherent to the way DynAudNorm works that we must have access to the "future" samples, i.e. the samples that follow after the sample that we currently process. The one and only way to accomplish this (assuming that we don't have a time machine!) is by using a large FIFO buffer. Simply put, the sample that we currently process is always located in the center of the FIFO buffer, the preceding ("past") samples are on the left, and the subsequent ("future") samples are on the right.

Now, before we can start processing the samples (i.e. before we can start returning any output samples), we necessarily have to fill our FIFO buffer! This is not a problem at all when processing files, because we can simply "read ahead" in the input file as far as we want/need, before we start writing out the output file. Even though there still is a "delay" in the way the samples are processed, there is not really a delay in the end result (the output file). This is totally different when doing real-time processing! When doing real-time processing, you cannot "read ahead". You actually have to wait for those input samples to come in!

In simple words, we have to record quite a few seconds of input audio material from the microphone (or whatever "real-time" source we are using), before we can start sending any output audio material to the speakers. That is a "real" delay 😨

Surely, you can make the delay smaller by using a smaller "window" size, but you can never get rid of the delay! Also, the smaller you make the "window" size (and thus the delay), the less effective the Gaussian smoothing will be! After all, the whole idea of DynAudNorm is to apply a Gaussian smoothing kernel to the normalization factors over a relatively large "window". Making the "window" extremely small kind of defeats the algorithm; you'd effectively end up with a strong "dumb" compressor.

If you use DynAudNorm with an extremely small "window" size, you might just as well not use it at all, but rather use a simple "compressor" – like the one that's probably built into your sound card or amplifier. If, on the other hand, you use DynAudNorm with a sufficiently large "window" size, then there will be a significant delay. Can't have your cake and eat it too 😏
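To put rough numbers on this (a back-of-the-envelope approximation only; the real internal delay is somewhat larger, because the frame currently being filled counts too):

    // Approximate look-ahead: about half the Gaussian window of "future" frames
    // has to sit in the FIFO before the current frame can be emitted.
    static double ApproxLookaheadSeconds( double frameLenMsec, int filterSize ) {
        return Math.Ceiling( filterSize / 2.0 ) * frameLenMsec / 1000.0;
    }

    // Default settings (500 ms frames, 31-frame window): roughly 8 seconds of look-ahead.
    // The "f=10:g=3" settings quoted above: roughly 0.02 seconds - no audible delay,
    // but hardly any smoothing window left either, i.e. close to a plain compressor.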

TL;DR: You need a relatively large delay in order to make DynAudNorm work properly, which conflicts with "real-time" processing.