Optimization in audio graph traversal

orottier commented 2 years ago

Some improvements could be made in graph.rs when rendering an audio quantum:

- [x] Store edges more efficiently (currently a single HashMap that needs to be iterated many times)
- [ ] Use a specialized container type for the nodes, e.g. https://crates.io/crates/intmap
- [x] Avoid the remove and insert calls of the currently processing node. That was necessary for borrow reasons. Wrap nodes in Cell or equivalent
- [x] Clear the input buffers when processing of a Node is done (we don't need them anymore and this way they can be reused)
- ~~When a Node has only one outgoing connection in the graph, its outputs can be moved instead of copied to that Node's inputs~~ This is not useful because the inputs are immutable anyway and need to be copied/mutated to outputs
- [ ] Decouple the graph-topology code from the audio-specific code
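To make the "remove and insert" item concrete, here is a minimal sketch of that pattern. The `Node` and `Graph` types below are hypothetical stand-ins, not the crate's real types, and the processing step is reduced to a single number per node. The point is only to show why the currently processing node gets taken out of the map: it must be mutated while the other nodes in the same map are read.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an audio node (not the crate's real type).
struct Node {
    base: f32,        // value the node produces on its own
    gain: f32,        // factor applied to (base + summed inputs)
    output: f32,      // result of the last render call
    inputs: Vec<u64>, // ids of the upstream nodes
}

struct Graph {
    nodes: HashMap<u64, Node>,
}

impl Graph {
    /// Process the nodes in `order`, assumed to be topologically sorted
    /// and free of self-loops.
    fn render(&mut self, order: &[u64]) {
        for &id in order {
            // Take the node out of the map so we own it while the rest of
            // the map can still be read.
            let mut node = self.nodes.remove(&id).expect("unknown node id");

            // Sum the outputs of the upstream nodes, which are still in the map.
            let input: f32 = node
                .inputs
                .iter()
                .map(|src| self.nodes[src].output)
                .sum();
            node.output = (node.base + input) * node.gain;

            // Put the node back. This remove/insert pair is the extra
            // hashing work paid for every node on every render call.
            self.nodes.insert(id, node);
        }
    }
}
```

The sketch is safe Rust and passes the borrow checker, at the cost of two extra map operations per node per render quantum.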

orottier commented 2 years ago

I have a branch ready for item 1 at feature/intmap-for-graph, but I will look at the other items first before deciding whether it is worth the hassle of an exotic dependency.

orottier commented 2 years ago

I took a stab at "Avoid the remove and insert calls of the currently processing node", @b-ma, but it's very tricky. We would need a lot of unsafe code to work around it, and I'm not sure that would be beneficial for the project. I tried a safe, intermediate solution but it had no benefits: 084d8187f7307ab4ca

I will have another look at "Use a specialized container type for the nodes" now.

b-ma commented 2 years ago

> I tried a safe, intermediate solution but it had no benefits: https://github.com/orottier/web-audio-api-rs/commit/084d8187f7307ab4ca96d5ca95749454aa380eb2

Just tested, no sign of improvement either, sorry...

I just wonder if we could also try to bypass the HashMap altogether and use some kind of Vec<Option<Node>> for the nodes, delegating to something like https://docs.rs/index-pool/latest/index_pool/ to manage the indexes. Then in the graph parsing we could just retrieve/reinsert the nodes like this: let node = self.nodes.swap(index, None). Or is it silly? Edit: actually I just misread the swap method, so this would probably need something similar to what you did...
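For illustration, the Vec<Option<Node>> idea could look roughly like the sketch below. Everything here is an assumption: `Node` and `NodeSlots` are made-up names, and a plain free list stands in for the index-pool crate so the example has no external dependency. Since the slice `swap` method exchanges two positions rather than replacing a value, the sketch uses `Option::take` from the standard library to move a node out and leave `None` behind.

```rust
// Hypothetical node type, reduced to one field for the example.
struct Node {
    output: f32,
}

// Index-addressed node storage. The free list is a simple stand-in for
// an external index manager.
struct NodeSlots {
    slots: Vec<Option<Node>>,
    free: Vec<usize>,
}

impl NodeSlots {
    fn new() -> Self {
        NodeSlots { slots: Vec::new(), free: Vec::new() }
    }

    /// Store a node and return its index, reusing a freed index if any.
    fn insert(&mut self, node: Node) -> usize {
        match self.free.pop() {
            Some(index) => {
                self.slots[index] = Some(node);
                index
            }
            None => {
                self.slots.push(Some(node));
                self.slots.len() - 1
            }
        }
    }

    /// Move the node out for processing, leaving `None` in its slot.
    fn take(&mut self, index: usize) -> Option<Node> {
        self.slots.get_mut(index).and_then(|slot| slot.take())
    }

    /// Put a processed node back into its slot.
    fn put_back(&mut self, index: usize, node: Node) {
        self.slots[index] = Some(node);
    }

    /// Drop the node for good and make its index available again.
    fn remove(&mut self, index: usize) {
        if self.take(index).is_some() {
            self.free.push(index);
        }
    }
}
```

Compared with a HashMap, taking and reinserting a node becomes a bounds-checked index operation with no hashing. Whether that shows up in the benchmarks would have to be measured.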

orottier commented 2 years ago

I took a new look at intmap: 6b7f11f80a4fe4. There is a very slight performance increase, but maybe it's just noise. Also, Granular synthesis seems to regress. I would say: no merge. Next up: create more flamecharts to look for other optimizations before spending time on this again.

b-ma commented 2 years ago

I just re-tested using RefCell<Node> in the HashMap and managed to get it working this time (slowly understanding some stuff :). It's here: https://github.com/orottier/web-audio-api-rs/compare/main...b-ma:web-audio-api-rs:test/graph-render and the perf improvements are quite good (better than Chrome on several test cases :)

before

+ id      | name                                                                     | duration (ms) | Speedup vs. realtime  | buffer.duration (s)
- 1       | Baseline (silence)                                                       | 26            | 4615.4x               | 120
- 2       | Simple source test without resampling (Mono)                             | 41            | 2926.8x               | 120
- 3       | Simple source test without resampling (Stereo)                           | 55            | 2181.8x               | 120
- 4       | Simple source test without resampling (Stereo and positionnal)           | 173           | 693.6x                | 120
- 5       | Simple source test with resampling (Mono)                                | 82            | 1463.4x               | 120
- 6       | Simple source test with resampling (Stereo)                              | 116           | 1034.5x               | 120
- 7       | Simple source test with resampling (Stereo and positionnal)              | 232           | 517.2x                | 120
- 8       | Upmix without resampling (Mono -> Stereo)                                | 46            | 2608.7x               | 120
- 9       | Downmix without resampling (Stereo -> Mono)                              | 44            | 2727.3x               | 120
- 10      | Simple mixing (100x same buffer) - be careful w/ volume here!            | 1755          | 17.1x                 | 30
- 11      | Simple mixing (100 different buffers) - be careful w/ volume here!       | 1733          | 17.3x                 | 30
- 12      | Simple mixing with gains                                                 | 340           | 352.9x                | 120
- 13      | Granular synthesis                                                       | 2662          | 2.8x                  | 7.5
- 14      | Synth (Sawtooth with Envelope)                                           | 3442          | 34.9x                 | 120
- 15      | Synth (Sawtooth with gain - no automation)                               | 2778          | 43.2x                 | 120
- 16      | Synth (Sawtooth without gain)                                            | 1681          | 71.4x                 | 120
- 17      | Substractive Synth                                                       | 423           | 283.7x                | 120
- 18      | Stereo panning                                                           | 82            | 1463.4x               | 120
- 19      | Stereo panning with automation                                           | 82            | 1463.4x               | 120
- 20      | Sawtooth with automation                                                 | 75            | 1600.0x               | 120
- 21      | Stereo source with delay                                                 | 210           | 571.4x                | 120

after

+ id      | name                                                                     | duration (ms) | Speedup vs. realtime  | buffer.duration (s)
- 1       | Baseline (silence)                                                       | 21            | 5714.3x               | 120
- 2       | Simple source test without resampling (Mono)                             | 30            | 4000.0x               | 120
- 3       | Simple source test without resampling (Stereo)                           | 44            | 2727.3x               | 120
- 4       | Simple source test without resampling (Stereo and positionnal)           | 158           | 759.5x                | 120
- 5       | Simple source test with resampling (Mono)                                | 75            | 1600.0x               | 120
- 6       | Simple source test with resampling (Stereo)                              | 106           | 1132.1x               | 120
- 7       | Simple source test with resampling (Stereo and positionnal)              | 209           | 574.2x                | 120
- 8       | Upmix without resampling (Mono -> Stereo)                                | 39            | 3076.9x               | 120
- 9       | Downmix without resampling (Stereo -> Mono)                              | 35            | 3428.6x               | 120
- 10      | Simple mixing (100x same buffer) - be careful w/ volume here!            | 1599          | 18.8x                 | 30
- 11      | Simple mixing (100 different buffers) - be careful w/ volume here!       | 1604          | 18.7x                 | 30
- 12      | Simple mixing with gains                                                 | 300           | 400.0x                | 120
- 13      | Granular synthesis                                                       | 2347          | 3.2x                  | 7.5
- 14      | Synth (Sawtooth with Envelope)                                           | 2899          | 41.4x                 | 120
- 15      | Synth (Sawtooth with gain - no automation)                               | 2212          | 54.2x                 | 120
- 16      | Synth (Sawtooth without gain)                                            | 1332          | 90.1x                 | 120
- 17      | Substractive Synth                                                       | 414           | 289.9x                | 120
- 18      | Stereo panning                                                           | 71            | 1690.1x               | 120
- 19      | Stereo panning with automation                                           | 73            | 1643.8x               | 120
- 20      | Sawtooth with automation                                                 | 62            | 1935.5x               | 120
- 21      | Stereo source with delay                                                 | 201           | 597.0x                | 120
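As a reading aid for the two tables: the "Speedup vs. realtime" column appears to be the buffer duration divided by the render duration. This relation is inferred from the numbers above, not taken from the benchmark source. The helper names below are made up for the example.

```rust
/// Speedup vs. realtime: seconds of audio rendered per second of wall-clock time.
fn speedup_vs_realtime(buffer_duration_s: f64, render_duration_ms: f64) -> f64 {
    buffer_duration_s * 1000.0 / render_duration_ms
}

/// Relative reduction in render time between two runs, in percent.
fn time_saved_percent(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}
```

For example, `speedup_vs_realtime(120.0, 26.0)` is roughly 4615.4, matching the "Baseline (silence)" row of the first table, and `time_saved_percent(1681.0, 1332.0)` gives about 21% for "Synth (Sawtooth without gain)".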

The downside is that I didn't manage to get rid of unsafe code in 2 places. It is very localized and seems to be the same problem each time (i.e. returning a reference to the buffer in Graph::render() and AudioParamValues::get()), so maybe you have an idea of how to handle that?
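The general shape of the RefCell<Node> approach can be sketched as follows. This is not the code from the linked branch: `Node` and `Graph` are hypothetical, simplified types, and processing is reduced to one number per node. It only illustrates how interior mutability lets the current node be mutated in place while other nodes in the same HashMap are read, with no remove/insert pair.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical stand-in for an audio node (not the crate's real type).
struct Node {
    base: f32,        // value the node produces on its own
    gain: f32,        // factor applied to (base + summed inputs)
    output: f32,      // result of the last render call
    inputs: Vec<u64>, // ids of the upstream nodes
}

struct Graph {
    nodes: HashMap<u64, RefCell<Node>>,
}

impl Graph {
    /// Process the nodes in `order`, assumed to be topologically sorted.
    /// Takes `&self`: all mutation goes through the RefCells.
    fn render(&self, order: &[u64]) {
        for id in order {
            // Mutably borrow only the node being processed; it stays in the map.
            let mut node = self.nodes[id].borrow_mut();

            // Read the upstream outputs through short-lived shared borrows.
            // A node listing itself as an input would panic here, because
            // it is already mutably borrowed.
            let input: f32 = node
                .inputs
                .iter()
                .map(|src| self.nodes[src].borrow().output)
                .sum();

            let out = (node.base + input) * node.gain;
            node.output = out;
        }
    }
}
```

One limitation of this pattern is that a reference obtained through `borrow()` is tied to the lifetime of the returned `Ref` guard, so it cannot simply be returned from a function that drops the guard. That may be related to the two remaining unsafe spots mentioned above, but this is a guess.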

orottier commented 2 years ago

Amazing, I did not realize there was still this much to gain from the Graph::insert/remove stuff. Let's continue discussing at https://github.com/orottier/web-audio-api-rs/pull/199

orottier commented 1 year ago

I'm closing this issue because I think the leftover points are no longer really interesting, given the current implementation.

orottier / web-audio-api-rs

Optimization in audio graph traversal #55