Open orottier opened 2 years ago
Hey, I did some (very) dirty logging and here are some impressions I have
(or maybe, we are ordering the nodes over and over again?)
I might be mistaking, but it seems to me that the ordering is done according the removal of the nodes from the graph after they finished their processing (which is de facto quite often in the granular bench), maybe a bit more than expected (I counted 1827 where I was waiting for 1500) but that could be due to the fact that related AudioBufferSource and GainNode are not always dropped at the exact same time. In any case it does not appear as if it was done on each tick or something.
But, I discovered something that could be a real problem if I'm not mistaking. I just dirty logged the number of nodes in Graph.order_nodes()
, something like:
println!("{:?}", duration);
println!("{:?}", self.nodes.keys().len());
println!("{:?}", self.ordered.len());
which this gives me at the very beginning:
18.803119ms
7506
7506
while the number of nodes in the graph should be > num nodes 3002
(~ 1500 * (src + gain))
After some struggling, it just appeared to me that the AudioParam
s seems to be stored in Graph::nodes
along with the AudioNode
s, in which case these numbers seems more logical because GainNode
has 1 param (gain
) and AudioBufferSourceNode
has 2 params (detune
and playbackRate
). So maybe we are just increasing the complexity the graph ordering by a factor of 2.5 for nothing here. Do you confirm this impression or do you think it's not that relevant regarding the implementation of the algorithm, I can't really figure out?
Thanks for digging in!
Indeed, AudioParams are treated just like AudioNodes, and are therefore part of the audio graph. Params share the same characteristics as nodes: they feed into other nodes, have internal processing state, emit 'samples', and other nodes can feed into them[1]. It is required for them to be part of the graph ordering.
18ms for sorting 7500 items is a long time! Definitely some quadratic complexity going on there, we should investigate.
But I now realize the main issue is probably that we re-sort the graph every time a node (and its params) is dropped. This is not necessary, because the topological order will never change[2] when a node disappears. Hence this code
// If there were any nodes decomissioned, clear current graph order
if nodes_dropped {
self.ordered.clear();
}
Should be changed to just selectively remove the dropped nodes from the ordered list (which is O(N), so fine).
[1] https://webaudio.github.io/web-audio-api/#dom-audionode-connect-destinationparam-output [2] I need to verify this, because a dropped node could remove a cycle and then the sorting does change. This is really an edge case though and probably not possible
I also realize we definitely need to look into 'inactive' audioparams. As you can see from the flame chart, AudioParam 'rendering' takes a lot of time. If we know an AudioParam only starts tweening at time T, can we short-circuit the results for t < T and not fill any buffers and stuff?
So in short, let's fix:
My guess is that 1) already fixes the bad benchmark result
ok, for AudioParam
s that was kind of my impression but better to be sure.
few remarks:
- dropping a node should not cause a full sort
Yup indeed, that will make a huge difference in the bench because the graph will be sorted only once in that case. The bias would thus be become, compared to an online version, that nodes are only inserted once. But in any case this is really a good thing to do.
- quadratic complexity of the sorting algo
I don't think this explain everything, but at first sight a simple thing to do would be to store marked
in the nodes themselves, so we do need to go through an entire list of nodes to check that each time, i.e. replace
// Do not visit nodes multiple times
if marked.contains(&node_id) {
return;
}
with something like:
// Do not visit nodes multiple times
if node.marked == true {
return;
}
- optimize inactive audio params
I don't think we can per se do that because, that would mean the AudioParam
would have to know if its node is active or not for this render quantum, while the param must be rendered before its node. But I think we however have simple room for improvement relying on silent
buffers:
RenderQuantum::add
if input is silent, i.e. if the param is not modulated by a signal (which is most of the case and 7,66% in the flamegraph)Much more related to GainNode
optimizations (but gains should be as cheap as possible IMO as you may have many many of them in complex routing):
param_intrisic_value
is 0 and no automation is scheduled, so the gain node just return silence tooI will have a look on a. which is really simple to do,
I think and try to advance on inactive audio nodes, for GainNode
it really simple to do and improve a lot the bench too
The example/benchmark.rs "Granular synthesis" program has terrible performance. This program adds 1500 buffer source nodes with increasing start time and a gain envelope.
I ran some profiling, which shows time slices
Probably some terrible quadratic complexity is happening in the order_nodes function. (or maybe, we are ordering the nodes over and over again?) Next up: audio param profiling
(open image in new tab to get mouse hover info)