sdroege / ebur128

Implementation of the EBU R128 loudness standard
MIT License

How to use for real time processing #53

Closed laenzlinger closed 1 year ago

laenzlinger commented 1 year ago

I am trying to use the library in an LV2 plugin. Is it possible to run the calculation for each frame, to avoid a spike in processing load at the moment the result is calculated?

As per the recommendation, I would like to expose the short-term value at 10 Hz and the integrated value at 1 Hz.

When I get the short-term value every 100 ms, the plugin produces click noises, which I guess are caused by calculating the short-term value every 100 ms. The sample rate is 48 kHz.

Thank you very much for any suggestions on how to improve the code. Let me know if I should provide more details.

sdroege commented 1 year ago

Why does the plugin produce click noises though? This library is only analyzing samples, it is not modifying them.

Getting the short term loudness every 100ms shouldn't be a problem. The GStreamer audioloudnorm plugin is doing the same (it's processing in 100ms chunks).

laenzlinger commented 1 year ago

Thank you @sdroege for your reply and sorry for not quoting the source code of the plugin:

Here it is: https://github.com/pedalboard/loudness-meter.lv2/blob/fb30b2b54b0cc1d691ccf1dc4cbac73cafb40e62/src/lib.rs#L54

I did a test and removed just the call that gets the loudness values (replaced that line with a static value). This solved the noise problem.

I should also add that I tested the same plugin with 2 different LV2 hosts:

Interestingly, the click noises can only be heard on Elk Audio OS.

I will check out the audioloudnorm plugin. Hopefully I can learn from it how to properly implement real-time audio plugins in Rust (both are pretty new to me; I apologize if I am asking stupid questions here).

sdroege commented 1 year ago

My guess would be that with the additional processing you're becoming too slow on Elk and missing its latency deadline. You could measure how long that one line takes (std::time::Instant for example) and check if you can reproduce the same effect if you sleep there for that long.

laenzlinger commented 1 year ago

I just followed the first step and measured the time it takes to call ebu.loudness_shortterm(): 13623ns

Interesting: I am running with a sample rate of 48000 Hz and a frame size of 64, which results in 13333ns. I guess that's already a first indicator that the calculation takes too long?

When I run the same plugin with jalv on a non-realtime system, the calculation takes 1460820ns. This would be 1.46s, which sounds wrong to me. I guess my measurements are wrong, or I did something wrong in the setup of the library?

sdroege commented 1 year ago

Yeah, if processing each frame takes longer than the duration of the frame then you can't play it back in real time.
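
For reference, the per-buffer time budget discussed here can be worked out from the sample rate and the buffer size. The following is a minimal sketch, assuming the 48000 Hz rate and 64-frame buffer quoted above; the names `buffer_duration` and `fits_in_budget` are hypothetical and not part of any library. By this arithmetic, 64 frames at 48000 Hz last 1,333,333 ns (about 1.33 ms), and 1,460,820 ns corresponds to about 1.46 ms.

```rust
use std::time::Duration;

/// Wall-clock length of one audio buffer of `frames` frames at `sample_rate` Hz.
/// Uses integer arithmetic, so the result is truncated to whole nanoseconds.
fn buffer_duration(frames: u32, sample_rate: u32) -> Duration {
    Duration::from_nanos(u64::from(frames) * 1_000_000_000 / u64::from(sample_rate))
}

/// True if a measured processing time is shorter than one buffer period.
fn fits_in_budget(measured: Duration, frames: u32, sample_rate: u32) -> bool {
    measured < buffer_duration(frames, sample_rate)
}
```

Passing the elapsed time from `Instant::now()` / `elapsed()` to `fits_in_budget` gives a quick check of whether a single call alone could exceed the deadline; the total work done in the callback still has to fit as well.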

laenzlinger commented 1 year ago

This is how I measured the time:

            // Time a single call to loudness_shortterm().
            let start = Instant::now();
            let short_term = self.ebu.loudness_shortterm().unwrap();
            let duration = start.elapsed();
            // Temporarily report the elapsed nanoseconds through the output port.
            ports.short_term.set(duration.as_nanos() as f32);

Then I can read the value with jalv (monitor command).

sdroege commented 1 year ago

Yes that's correct. You probably want to do this less often :)

laenzlinger commented 1 year ago

I am not sure if I understood correctly. What do you mean by doing it less often? Wouldn't the processing take longer if loudness_shortterm() is called less often (because there is more data to be processed)?

I was hoping to find a way to run the processing more often, so that the required processing power could be evenly distributed. Whenever new samples are added, some part of the processing could be done. Then, once the result is required (at the update rate recommended by the spec), not much processing power would be needed anymore?

sdroege commented 1 year ago

Wouldn't the processing take longer if loudness_shortterm() is called less often (because there is more data to be processed)?

No, that would take approximately the same time independent of how much data was processed so far. The function whose run time is proportional to the number of samples is add_samples().

laenzlinger commented 1 year ago

@sdroege thanks for the info. So if I understand correctly, I should also check how much time add_samples() takes.

I have now reduced the calculation to only 'Momentary' https://github.com/pedalboard/loudness-meter.lv2/blob/main/src/lib.rs#L48 (removed the I and S modes).

I update the meter every 100 ms. This seems to be possible within the real-time task.

I will do more measurements and report here.

I am running the code on a Raspberry Pi CM4 (Compute Module). I wondered whether this algorithm is really so complex that it takes so much time. Do you have any ideas for how I could optimise my code?

sdroege commented 1 year ago

So if I understand correctly, I should also check how much time add_samples() takes.

No that's not what I meant :)

You were asking if loudness_shortterm() would take more time if more samples were processed and that's not the case. It will always take approximately the same amount of time.

The only function that takes longer if you provide more samples is add_samples().

I wondered, if this algorithm is really so complex, that it takes so much time?

I'm not sure which part exactly you mean that takes more time than you would expect. 13us does not seem a lot for loudness_shortterm(), but it also doesn't make sense to call this for every <13us of samples: it's the loudness over the last 3s, it's not going to change that rapidly.

Do you have any ideas for how I could optimise my code?

You're currently collecting all samples into a temporary Vec in your run() function. That's a heap allocation that you should probably avoid. Doing heap allocations as part of real-time audio processing is a bad idea. You could keep around an array in your struct and re-use that every time run() is called, for example.
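
The buffer reuse suggested here could look roughly like the sketch below. It is not the plugin's actual code: the `Meter` struct, the `max_frames` parameter and the left/right interleaving are assumptions made for illustration. The point is only that the `Vec` is allocated once at construction and then cleared and refilled on every call.

```rust
/// Hypothetical plugin state: `scratch` is allocated once, outside the audio callback.
struct Meter {
    scratch: Vec<f32>,
}

impl Meter {
    /// `max_frames` is the largest buffer the host is expected to deliver
    /// (an assumption; a real plugin would take it from the host).
    fn new(max_frames: usize, channels: usize) -> Self {
        Meter {
            scratch: Vec::with_capacity(max_frames * channels),
        }
    }

    /// Interleaves left/right into the reused buffer and returns it as a slice.
    /// No heap allocation happens as long as the input fits the reserved capacity.
    fn interleave<'a>(&'a mut self, left: &[f32], right: &[f32]) -> &'a [f32] {
        self.scratch.clear(); // drops the old contents but keeps the capacity
        for (l, r) in left.iter().zip(right.iter()) {
            self.scratch.push(*l);
            self.scratch.push(*r);
        }
        &self.scratch
    }
}
```

The returned slice can then be handed to the analyzer in place of a freshly collected `Vec`. If the host ever delivers more frames than `max_frames`, the `Vec` would still grow (and allocate), so the capacity needs to be chosen generously.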

I don't see any other obvious places to optimize the code.

laenzlinger commented 1 year ago

but it also doesn't make sense to call this for every <13us of samples: it's the loudness over the last 3s, it's not going to change that rapidly.

I think loudness_shortterm() is not called every 13us. There is a guard around the call which should only match every 100ms.

        let rate = self.ebu.rate() / 10;
        if self.sample_count > rate {
            // call loudness_shortterm()
        }

This is based on the following recommendation in https://tech.ebu.ch/docs/tech/tech3341.pdf

  1. The short-term loudness uses a sliding rectangular time window of length 3 s. The measurement is not gated. The update rate for ‘live meters’ shall be at least 10 Hz.

This was the reason why I initially started with calling loudness_shortterm() every 100 ms. Is this a bad idea?
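
A guard of this kind can be sketched as a small frame counter. This is an illustration under assumptions, not the plugin's code: the `UpdateGate` type and its method names are hypothetical. It fires once per interval and carries the remainder over, so the update rate does not drift when the buffer size does not divide the interval evenly.

```rust
/// Counts frames and reports when one update interval has elapsed.
struct UpdateGate {
    interval_frames: u32,
    counted: u32,
}

impl UpdateGate {
    /// `updates_per_second` would be 10 for the 10 Hz rate quoted from EBU Tech 3341.
    fn new(sample_rate: u32, updates_per_second: u32) -> Self {
        UpdateGate {
            // Never let the interval be zero, which would break the modulo below.
            interval_frames: (sample_rate / updates_per_second).max(1),
            counted: 0,
        }
    }

    /// Call once per audio buffer with the number of frames it contained.
    /// Returns true when at least one full interval has elapsed since the last update.
    fn advance(&mut self, frames: u32) -> bool {
        self.counted += frames;
        if self.counted >= self.interval_frames {
            self.counted %= self.interval_frames; // keep the leftover frames
            true
        } else {
            false
        }
    }
}
```

With 64-frame buffers at 48000 Hz, `advance(64)` returns true on every 75th call, which is every 4800 frames or 100 ms.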

You're currently collecting all samples into a temporary Vec in your run() function. That's a heap allocation that you should probably avoid. Doing heap allocations as part of real-time audio processing is a bad idea. You could keep around an array in your struct and re-use that every time run() is called, for example.

Oh, that's a very good hint. Thank you very much. Real-time audio processing (and Rust) are both new to me. I need to learn a lot!

sdroege commented 1 year ago

I think loudness_shortterm() is not called every 13us.

Not right now but from what I understood you did that earlier when you ran into problems (e.g. https://github.com/sdroege/ebur128/issues/53#issuecomment-1613568343 ). Did I misunderstand that part?

But also the 1.4s in the non-realtime case is surprising.

loudness_shortterm() is running a calculation over the last 3s of samples that were added via add_samples(). There is no caching or anything, so every time you call it, it will process 3s of samples. This should take the same amount of time no matter when or how often you call it.
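
To illustrate why the cost per call depends on the window length and not on how much audio has been fed in, here is a simplified sketch of a windowed loudness calculation. It is not the library's implementation: it handles a single channel, assumes the samples in `filtered` are already K-weighted, and the function name `window_loudness` is hypothetical. The formula is the BS.1770 one, loudness = -0.691 + 10 * log10(mean square).

```rust
/// Loudness of the most recent `window` samples of one channel.
/// Assumes `filtered` already holds K-weighted samples (the filter is omitted here).
fn window_loudness(filtered: &[f32], window: usize) -> f64 {
    // Only the last `window` samples are touched, however long the history is.
    let start = filtered.len().saturating_sub(window);
    let recent = &filtered[start..];
    if recent.is_empty() {
        return f64::NEG_INFINITY;
    }
    let energy: f64 = recent
        .iter()
        .map(|&s| f64::from(s) * f64::from(s))
        .sum();
    let mean_square = energy / recent.len() as f64;
    -0.691 + 10.0 * mean_square.log10()
}
```

At 48000 Hz a 3 s window is 144000 samples per channel, so every call walks over that many values regardless of how often it is called.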

OOC, are you doing a release build btw, or is this a debug build? That's going to make quite a big difference.

laenzlinger commented 1 year ago

loudness_shortterm() is running a calculation over the last 3s of samples that were added via add_samples(). There is no caching or anything, so every time you call it, it will process 3s of samples. This should take the same amount of time no matter when or how often you call it.

Ok, with this information, I think I should:

a) run loudness_shortterm() less often (maybe once a second), and
b) run it as part of an lv2_worker task.

I would like to keep the momentary measurements at a 10 Hz update rate (as proposed by the EBU standard) and, for simplicity, as part of the real-time thread. But I will repeat my time measurements. In the meantime I have upgraded to Sushi 1.1, which has an improved LV2 logging infrastructure. This should give me more confidence in my measurement results, which I previously exported via LV2 output parameters.

Would it make sense and be possible to implement some sort of caching to reduce processing power?

Does the same (non-caching) principle also apply to the momentary calculations?

laenzlinger commented 1 year ago

OOC, are you doing a release build btw, or is this a debug build?

I am pretty convinced that I used a release build for my measurements. But I am going to repeat the tests and make sure that I measure with a release build.

sdroege commented 1 year ago

Would it make sense and be possible to implement some sort of caching to reduce processing power?

Possibly. You'd have to come up with a way for doing that efficiently first :) It needs a bit of thinking.
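
One possible direction, purely as a sketch and not something the library does: accumulate the energy of fixed-size blocks (for example 100 ms) while samples are added, and sum only the stored block energies when a value is requested. The `BlockEnergyWindow` type below is hypothetical, handles one channel, and again assumes already K-weighted input. The trade-off is that the window only moves in whole-block steps.

```rust
use std::collections::VecDeque;

/// Hypothetical incremental window: stores one energy sum per completed block.
struct BlockEnergyWindow {
    block_len: usize,      // samples per block, e.g. 4800 for 100 ms at 48 kHz
    max_blocks: usize,     // blocks per window, e.g. 30 for 3 s
    blocks: VecDeque<f64>, // completed block energies, oldest first
    current_sum: f64,      // energy of the block being filled
    current_len: usize,    // samples in the block being filled
}

impl BlockEnergyWindow {
    fn new(block_len: usize, max_blocks: usize) -> Self {
        assert!(block_len > 0 && max_blocks > 0);
        BlockEnergyWindow {
            block_len,
            max_blocks,
            // Reserved up front so pushing blocks later does not allocate.
            blocks: VecDeque::with_capacity(max_blocks),
            current_sum: 0.0,
            current_len: 0,
        }
    }

    /// Adds samples; the per-sample work is one multiply and one add.
    fn push(&mut self, filtered: &[f32]) {
        for &s in filtered {
            self.current_sum += f64::from(s) * f64::from(s);
            self.current_len += 1;
            if self.current_len == self.block_len {
                if self.blocks.len() == self.max_blocks {
                    self.blocks.pop_front(); // drop the oldest block
                }
                self.blocks.push_back(self.current_sum);
                self.current_sum = 0.0;
                self.current_len = 0;
            }
        }
    }

    /// Mean square over the completed blocks; sums at most `max_blocks` values.
    fn mean_square(&self) -> Option<f64> {
        if self.blocks.is_empty() {
            return None;
        }
        let total: f64 = self.blocks.iter().sum();
        Some(total / (self.blocks.len() * self.block_len) as f64)
    }
}
```

With this layout a query sums about 30 numbers instead of walking over 3 s of samples, at the price of 100 ms granularity and a little extra work per added sample.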

Does the same (non-caching) principle also apply to the momentary calculations?

Yes, it's exactly the same calculation but instead of over 3s it's only over 400ms of samples so should be ~7.5x faster.

laenzlinger commented 1 year ago

I can report some new measurement results:

The measured values are between 103 and 111 microseconds.

laenzlinger commented 1 year ago

I have also rewritten the code to use a (very small, 1-sample) buffer in the LV2 plugin instance: https://github.com/pedalboard/loudness-meter.lv2/blob/f02e89f11094d5a8f6c07bc44d549c853237d467/src/lib.rs#L66

As far as I understood, the buffer size should not have a huge impact on the performance. Is this correct?

Do you see anything else that could be improved?

sdroege commented 1 year ago

As long as that's all inlined, it should be fine. If you have one actual function call per sample that's going to be quite a bit of overhead from the function calls themselves :)

sdroege commented 1 year ago

Anything else left to be done here or can the issue be closed? :)

laenzlinger commented 1 year ago

Can be closed, thanks a lot for your help.