museumsvictoria / spatial_audio_server

An audio backend for the multi-layered soundscape of Beyond Perception: Seeing the Unseen, a permanent exhibition at Scienceworks in Melbourne, Australia.

Audio Crackling #180

Open freesig opened 6 years ago

freesig commented 6 years ago

Spikes in CPU usage are happening every so often, causing glitches in the audio. It's worse on startup.

freesig commented 6 years ago

Here you can see two peaks where the flags are. They are around 90% CPU but only last about 18 ms. The time period shown is 10 s. [image]

freesig commented 6 years ago

This is the responsible function:

9.00 ms  100.0%  7.00 ms  `core::ops::function::impls::<impl core::ops::function::Fn<A> for &'a F>::call`

It's just called by this:

9.00 ms  100.0%  0 s  `<nannou::audio::requester::Requester<S>>::fill_buffer`

So it's something in the audio thread that is occasionally peaking.

mitchmindtree commented 6 years ago

We spoke to Charlie earlier and he mentioned that he might have noticed glitching occurring more often for Continuous sounds (sounds that play to a continuous timeline) as opposed to Retrigger sounds (sounds that always restart from the beginning). The only difference between these two types of playback behind the scenes is that Continuous sounds seek to the correct position within the WAV file before starting playback. However, this occurs on the WAV reading threads (not on the audio thread), so I'm unsure how it could be impacting the audio thread glitching. Just adding this comment for the record, and as a possible avenue of investigation if we're struggling to find the culprit within the audio thread's fill_buffer function.

freesig commented 6 years ago

Right now I'm slicing the function up into sections and measuring the variance between the average time a section takes and its longest runs, resetting every 1000 calls. My thinking is that a section running normally should show basically no variance, while a section that peaks every now and then should have a few runs with a large variance.
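Roughly what the per-section timing looks like (just a sketch; the section boundaries and the reset interval are placeholders):

```rust
use std::time::{Duration, Instant};

/// Rolling stats for one timed section of the function.
#[derive(Default)]
struct SectionStats {
    total: Duration,
    max: Duration,
    calls: u32,
}

impl SectionStats {
    fn record(&mut self, elapsed: Duration) {
        self.total += elapsed;
        self.max = self.max.max(elapsed);
        self.calls += 1;
    }

    fn average(&self) -> Duration {
        if self.calls == 0 { Duration::default() } else { self.total / self.calls }
    }
}

fn timed_section_sketch(stats: &mut SectionStats, call_count: &mut u32) {
    // Reset the stats every 1000 calls so old peaks don't hide new ones.
    if *call_count % 1000 == 0 {
        *stats = SectionStats::default();
    }
    *call_count += 1;

    let start = Instant::now();
    // ... the section of the audio callback being measured ...
    stats.record(start.elapsed());

    // A section behaving normally should have `max` close to `average()`;
    // an occasional spike shows up as `max` far above `average()`.
}
```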

Some issues I've come across:

mitchmindtree commented 6 years ago

Sweet, this looks like a good approach :+1:

Not sure if you've come across this yet, but the `line!` macro might be useful for more easily identifying where in the code the profiler is referring to.
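E.g. something roughly like this (just a sketch):

```rust
// `line!()` expands to the source line number where it appears, so each
// timing marker can label itself without hand-maintained section names.
let marker_line = line!();
let start = std::time::Instant::now();
// ... section of the audio callback being measured ...
println!("section at line {} took {:?}", marker_line, start.elapsed());
```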

Maybe a nice way of logging this data would be to implement `Serialize` for whatever collection stores the resulting profiling data and write it to a JSON file, so that it's easier to share here. Otherwise I guess you could just take a look yourself and post the areas that seem like potential culprits.
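A minimal sketch of what I mean, assuming serde / serde_json as dependencies and with placeholder field names:

```rust
use serde::Serialize;
use std::collections::HashMap;
use std::fs::File;

// Placeholder shape for the profiling results - one entry per timed section.
#[derive(Serialize)]
struct SectionReport {
    average_nanos: u64,
    max_nanos: u64,
    calls: u32,
}

fn write_report(report: &HashMap<String, SectionReport>) -> std::io::Result<()> {
    let file = File::create("profiling_report.json")?;
    serde_json::to_writer_pretty(file, report).expect("failed to serialise report");
    Ok(())
}
```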

And yep, I think the largest absolute time variance is likely what we really care about. This is probably how we will find the spikes that were showing up in the Time Profiler graph.

Also, seeing as we're working in such a high-performance domain it might also be worth timing how long it takes to measure the time between two markers with no other code between them. This will let us know if the I/O required to read from the system clock is significant enough to affect the profiling. Maybe then we could even subtract the average time taken by the markers themselves from the profiling results, but this probably isn't worth it unless it looks like the markers themselves are affecting the profiling.
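Something like this could give a rough estimate of the marker cost (sketch only):

```rust
use std::time::{Duration, Instant};

// Estimate the cost of a marker pair by timing back-to-back clock reads
// with no other work between them, averaged over many iterations.
fn marker_overhead(iterations: u32) -> Duration {
    let mut total = Duration::default();
    for _ in 0..iterations {
        let a = Instant::now();
        let b = Instant::now();
        total += b.duration_since(a);
    }
    total / iterations
}
```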

freesig commented 6 years ago

Got it working pretty well using the `line!` macro.

It seems to have a cost of about 0.05 nanoseconds, give or take, which shouldn't be an issue unless it's being called a lot inside a loop.

The last piece of the puzzle is working out how to store the data for reading. I want to run a version of audio_server with this timing code overnight and then get some data logged to a file that we can look at. I'm thinking maybe a graph that shows:

- the top duration for a section as a percentage of the average total run time of the function,
- contrasted against the average duration for that section as a percentage of the average total run time of the function.

This should show pretty clearly which sections are peaking periodically. Might be worth graphing the top ten. This would reset every 1000 frames. I should be able to serialize it and make the charts in Google Docs or Plotly.
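For reference, a sketch of how I'd reduce each window into those two percentages (names are placeholders):

```rust
use std::time::Duration;

// Reduce one window of stats for a single section into the two series
// described above, both relative to the average run time of the whole
// function over the same window.
fn section_percentages(
    section_max: Duration,
    section_avg: Duration,
    function_avg: Duration,
) -> (f64, f64) {
    let func_nanos = function_avg.as_nanos() as f64;
    let max_pct = section_max.as_nanos() as f64 / func_nanos * 100.0;
    let avg_pct = section_avg.as_nanos() as f64 / func_nanos * 100.0;
    (max_pct, avg_pct)
}
```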

mitchmindtree commented 6 years ago

I think there's a CSV crate for generating tables that you can read from Google Docs or Excel, but I haven't had a play with it yet.
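Something along these lines, if the csv crate turns out to be handy (untested sketch, column names made up):

```rust
use csv::Writer;
use std::error::Error;

// Dump one row per timed section so the file can be opened straight from a
// spreadsheet: (section name, max % of average, average % of average).
fn write_csv(rows: &[(String, f64, f64)]) -> Result<(), Box<dyn Error>> {
    let mut writer = Writer::from_path("profiling_report.csv")?;
    writer.write_record(&["section", "max_pct_of_avg", "avg_pct_of_avg"])?;
    for (section, max_pct, avg_pct) in rows {
        writer.write_record(&[section.clone(), max_pct.to_string(), avg_pct.to_string()])?;
    }
    writer.flush()?;
    Ok(())
}
```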

freesig commented 6 years ago

Ok, finally got something to show. [image] This is the top duration for each section of the function as a percentage of the average function run time. You can see the top line is this range (line numbers are slightly off due to the added timing code).

This section often seems to run for over 100% of the average time for the entire function. These readings are the top values over 500 frames of audio, and this chart is from readings 1000 to 2000. I skipped the start because there is some loading that peaks when the application starts. I have over 5 GB of data, so there's a lot more I can pull from this, and I can make my timer display the info better. The next challenge will be to time inside the loops, but that will generate a lot more data and also call Instant a lot more, so the readings might become less relevant. I'll give it a go though. Here's the chart SVG file if you want to have a look yourself:

chart.svg.zip

freesig commented 6 years ago

Just ran a timing run on my Mac with more timers, so it won't be the most accurate, but it shows this section as using the most time.

mitchmindtree commented 6 years ago

Some other things that might be worth tracking simultaneously are:

- the number of sounds,
- the number of channels per sound,
- the number of target speakers per sound channel,

as each of these contributes to the number of iterations in these loops. E.g. if the mixing loop you highlight in your last comment is indeed the bottleneck, it may be because one or more sounds and their channels very occasionally come within range of many more speakers and require many more iterations. I'm not sure this reflects the behaviour though, which is more like a very occasional click/pop rather than longer periods of glitching as sounds travel through larger groups of speakers. It might be worth experimenting with reducing the PROXIMITY_LIMIT to reduce the range of speakers that the channels can reach and in turn reduce the total number of iterations. That might at least help us identify whether or not it's the bottleneck.

freesig commented 6 years ago

Ok, got some more information: [image] The staggered thick brown line is the number of sound_channels. I couldn't get the number of channels per sound without logging inside the loops, and that would also create a lot of data. Here is a more zoomed-in view: [image] All the points performing above 100% are from this section, which shows a similar result to what was happening on my computer. In the second chart, the yellow line running straight across at 70% is the average for this same section. The section occasionally runs way over, but it doesn't seem to correlate exactly with the number of sound_channels. I'm pretty confident this section is causing the glitching, but I'll need to look closer to find something that correlates more closely in time with those peaks.

freesig commented 6 years ago

I think the next step is to time inside the loop. This is a little challenging because, if you imagine the loop "unrolled", it's like a very long piece of code, so it will produce a lot of data and the calls to Instant may start to become significant. The only thing that seems like it could make that block run slowly is the length of the three for loops, right? Is there anything else you can see in that block that wouldn't be dependent on the length of the loops?

mitchmindtree commented 6 years ago

To get further confirmation that this nested loop area is the culprit, perhaps we can track the total number of iterations that occur in that nested loop and graph them alongside the spikes you're seeing (this is basically the same as tracking the number of sounds, channels per sound and target speakers per sound as I mentioned in my previous comment). The most interesting way might be to add a `let mut count_loop_iterations = 0;` at the beginning of the function, increment it on each of the deepest nested iterations here, and then check the resulting count at the end of the function. My guess is that we will see a correlation between the number of loop iterations and the CPU spikes. If that's correct, then hopefully reducing the PROXIMITY_LIMIT slightly will give us enough speed back, as this should reduce the total number of target speakers for each sound channel and, in turn, the total number of iterations. Best to check for sure first though.
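A sketch of what I mean; the sound/channel/speaker shapes here are made up and will differ from the real code, it just shows where the counter would go:

```rust
// Hypothetical, simplified types just for illustration.
struct Channel { speakers_in_range: Vec<usize> }
struct Sound { channels: Vec<Channel> }

fn count_mix_iterations(sounds: &[Sound]) -> u64 {
    let mut count_loop_iterations: u64 = 0;
    for sound in sounds {
        for channel in &sound.channels {
            for _speaker in &channel.speakers_in_range {
                // ... existing mixing work for this (sound, channel, speaker) ...
                count_loop_iterations += 1;
            }
        }
    }
    // Returned (or logged) at the end of the callback so it can be graphed
    // against the timing spikes.
    count_loop_iterations
}
```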

freesig commented 6 years ago

Ok, so there was no correlation with the number of iterations. [image] The orange line running across the top is the number of iterations inside the loop. The redder line that jumps up shows the peaks in the function runs.

With the PROXIMITY_LIMIT, is it possible that when a speaker comes into range it jumps from zero volume to some higher volume, because the gain isn't ramped up smoothly once it's in range?
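If that's what's happening, maybe a short fade-in when a speaker first enters range would avoid the click. A sketch of the idea (not the server's actual gain code):

```rust
// Sketch: ramp a speaker's gain linearly over a fixed number of samples
// when it first comes into range, instead of jumping straight to the target.
struct SpeakerGain {
    current: f32,
    target: f32,
    step: f32,
}

impl SpeakerGain {
    fn entering_range(target: f32, ramp_samples: u32) -> Self {
        SpeakerGain {
            current: 0.0,
            target,
            step: target / ramp_samples.max(1) as f32,
        }
    }

    // Called once per sample; returns the gain to apply to that sample.
    fn next(&mut self) -> f32 {
        if self.current < self.target {
            self.current = (self.current + self.step).min(self.target);
        }
        self.current
    }
}
```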

Another thing: if it's not the number of iterations, maybe it's a cache-miss issue? That could explain why a loop with the same number of iterations could be held up.

Todo:

freesig commented 6 years ago

Some good information on Rust and cache misses

freesig commented 6 years ago

Could it be that in this line: `let channel_sample = sound.unmixed_samples[channel_sample_index];` the `channel_sample_index` is jumping around a lot, so the CPU has to keep fetching from different parts of memory, causing cache misses?
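One way to sanity-check that idea (a rough sketch, not the server code): compare sequential indexing with scattered indexing over a buffer of a similar size and see whether the scattered version alone is slow enough to matter. The buffer size and index pattern here are made up.

```rust
use std::time::Instant;

fn compare_access_patterns() {
    let buffer: Vec<f32> = vec![0.5; 1 << 20];

    // Sequential indices.
    let sequential: Vec<usize> = (0..buffer.len()).collect();
    // Scattered indices: jump around the buffer with a large stride.
    let scattered: Vec<usize> = (0..buffer.len())
        .map(|i| (i * 4099) % buffer.len())
        .collect();

    for (name, indices) in [("sequential", &sequential), ("scattered", &scattered)] {
        let start = Instant::now();
        let mut sum = 0.0f32;
        for &i in indices.iter() {
            sum += buffer[i];
        }
        // Print the sum so the loop isn't optimised away.
        println!("{}: {:?} (sum = {})", name, start.elapsed(), sum);
    }
}
```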