Audio data is still pretty heavy, even by modern standards and with compression, and the human ear is incredibly good at recognizing repeating audio! We can combat that to some extent with layering, modulation, and effects, but that also gets pretty expensive when done extensively.
Thus, I propose a lightweight solution for assembling long sounds by sequencing shorter clips, usable as a direct replacement for ordinary waves where applicable.
- Clips are all in the same format, and same sample rate
- NO arguments
- NO real-time parameters
- NO start offset
- Instances are NOT guaranteed to be unique, but can be shared across voices
- Output is defined by a list of timestamped events, triggering the playback of clips
- Structured Waves can optionally be Continuous
  - Continuous Structured Waves (unlike looped waves) do NOT have a defined starting point
  - Voices are even more likely to share instances
- Multiple layers/tracks (or maybe just think of events as voices)
- Per-event parameters:
  - Source clip
  - Start time
  - Start time randomization
  - Duration (default: length of clip)
  - Gain
  - Fade in (polynomial/spline?)
  - Fade out (polynomial/spline?)
  - Pan/balance
  - Randomization
  - Repeat:
    - Delay
    - Randomization
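To make the event list above a bit more concrete, here is a minimal Python sketch of how the per-event parameters could be represented and mixed into a stereo buffer. Everything in it is hypothetical and only for illustration: the names `Event`, `Repeat`, `trigger_times`, `envelope` and `render`, the `count` field on repeats, the linear fades (the proposal leaves polynomial/spline open), the balance law, and the reading of the bare "Randomization" entry as pan randomization are all assumptions, not part of the proposal.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Repeat:
    delay: float                 # seconds between triggers
    randomization: float = 0.0   # +/- jitter applied to each delay, seconds
    count: int = 1               # assumption: number of extra triggers

@dataclass
class Event:
    clip: str                    # key into the clip table
    start: float                 # seconds from the start of the wave
    start_rand: float = 0.0      # +/- jitter on the start time, seconds
    duration: Optional[float] = None  # None means "length of clip"
    gain: float = 1.0
    fade_in: float = 0.0         # seconds
    fade_out: float = 0.0        # seconds
    pan: float = 0.0             # -1 (left) .. +1 (right)
    pan_rand: float = 0.0        # assumption: "Randomization" applies to pan
    repeat: Optional[Repeat] = None

def trigger_times(ev, rng):
    """Expand one event into concrete start times, in seconds."""
    t = max(0.0, ev.start + rng.uniform(-ev.start_rand, ev.start_rand))
    times = [t]
    if ev.repeat:
        r = ev.repeat
        for _ in range(r.count):
            t += max(0.0, r.delay + rng.uniform(-r.randomization, r.randomization))
            times.append(t)
    return times

def envelope(i, n, fade_in, fade_out):
    """Linear fade gain for frame i of n (fade lengths given in frames)."""
    g = 1.0
    if fade_in > 0 and i < fade_in:
        g = min(g, i / fade_in)
    if fade_out > 0 and i >= n - fade_out:
        g = min(g, (n - i) / fade_out)
    return g

def render(events, clips, rate, length, seed=0):
    """Mix all events (mono clips) into a stereo buffer of `length` frames."""
    rng = random.Random(seed)    # seeded, so the output is reproducible
    left = [0.0] * length
    right = [0.0] * length
    for ev in events:
        data = clips[ev.clip]
        n = len(data) if ev.duration is None else min(len(data), int(ev.duration * rate))
        fi, fo = int(ev.fade_in * rate), int(ev.fade_out * rate)
        for t in trigger_times(ev, rng):
            pan = max(-1.0, min(1.0, ev.pan + rng.uniform(-ev.pan_rand, ev.pan_rand)))
            gl = ev.gain * min(1.0, 1.0 - pan)   # simple balance law:
            gr = ev.gain * min(1.0, 1.0 + pan)   # unity on both sides at center
            base = int(round(t * rate))
            for i in range(n):
                j = base + i
                if j >= length:
                    break                        # clip runs past the buffer
                s = data[i] * envelope(i, n, fi, fo)
                left[j] += s * gl
                right[j] += s * gr
    return left, right

# Usage: one short clip, triggered at 1 s and repeated once 2 s later.
clips = {"tick": [1.0] * 4}
events = [Event("tick", start=1.0, gain=0.5, repeat=Repeat(delay=2.0))]
left, right = render(events, clips, rate=4, length=16)
```

Since there are no arguments and no real-time parameters, the whole output depends only on the event list and the random seed, which is what would let a renderer like this run ahead of playback.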
This design should allow for easy and intuitive creation of complex layered sounds, a very efficient implementation, and easy delegation to worker threads with state-global pre-buffering for zero-latency playback. Continuous Structured Waves would allow instances (with buffers, worker threads, etc.) to be recycled, with shared buffers, forking, etc.
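As a rough illustration of the instance sharing and pre-buffering idea, the sketch below keeps one pre-buffered instance per continuous wave and lets voices join it at the current position instead of at a defined start. All names here (`ContinuousInstance`, `Voice`, `acquire`, the `clock` frame counter, the ring buffer size and pre-buffer length) are made up for this example, and the `fill()` call stands in for what would presumably be a worker thread in a real engine.

```python
from itertools import count

class ContinuousInstance:
    """One shared, pre-buffered renderer for a continuous wave (sketch)."""
    def __init__(self, source, size=1024):
        self.source = source      # iterator yielding samples indefinitely
        self.size = size
        self.ring = [0.0] * size
        self.written = 0          # total frames rendered so far
        self.refs = 0             # number of voices sharing this instance

    def fill(self, upto):
        """Render ahead until frame `upto` (exclusive) is buffered."""
        while self.written < upto:
            self.ring[self.written % self.size] = next(self.source)
            self.written += 1

    def read(self, pos, frames):
        """Return `frames` samples starting at absolute frame `pos`."""
        self.fill(pos + frames)
        if self.written - pos > self.size:
            raise ValueError("reader fell too far behind the ring buffer")
        return [self.ring[(pos + i) % self.size] for i in range(frames)]

class Voice:
    """A voice has no start offset: it joins at the current clock position."""
    def __init__(self, inst, clock):
        self.inst = inst
        self.pos = clock

    def pull(self, frames):
        out = self.inst.read(self.pos, frames)
        self.pos += frames
        return out

_instances = {}   # hypothetical state-global table of shared instances

def acquire(wave_id, make_source, prebuffer=64):
    """Return the shared instance for `wave_id`, creating it if needed."""
    inst = _instances.get(wave_id)
    if inst is None:
        inst = _instances[wave_id] = ContinuousInstance(make_source())
        inst.fill(prebuffer)      # pre-buffer, so the first read is immediate
    inst.refs += 1
    return inst

# Usage: two voices on the same wave share one instance and one buffer.
a = Voice(acquire("rain", lambda: map(float, count())), clock=0)
b = Voice(acquire("rain", lambda: map(float, count())), clock=4)
first_a = a.pull(4)
first_b = b.pull(4)
second_a = a.pull(4)
```

In this sketch both voices hear the same frames for the same clock time, which is cheap but also exactly why it only suits the single-instance case.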
Note that the goal here is to optimize for the single-instance case, meaning there will likely be occasional flanging/phasing issues if multiple instances of a Structured Wave are played simultaneously. It's NOT intended as an infinite round-robin generator for "polyphonic" playback! For those kinds of use cases, more sophisticated solutions, such as actual round-robins, proper multi-oscillator voices with modulation, and/or live synthesis, are recommended.
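The flanging/phasing concern can be shown with a small numeric experiment: mixing a signal with a slightly delayed copy of itself cancels some frequencies and doubles others (comb filtering). The sketch below is only an illustration with a hypothetical helper, `mix_with_delayed_copy`, and a pure sine as the test signal; real clips would show the same effect spread across their spectrum.

```python
import math

def mix_with_delayed_copy(freq, delay, rate=48000, frames=4800):
    """Peak level of x(n) + x(n - d) for a unit sine at `freq` Hz,
    where d is `delay` seconds rounded to whole frames."""
    d = int(round(delay * rate))
    peak = 0.0
    for n in range(d, frames):
        a = math.sin(2 * math.pi * freq * n / rate)
        b = math.sin(2 * math.pi * freq * (n - d) / rate)
        peak = max(peak, abs(a + b))
    return peak

# Two identical instances playing 1 ms apart:
notch = mix_with_delayed_copy(500.0, 0.001)    # half a period apart: cancels
boost = mix_with_delayed_copy(1000.0, 0.001)   # a full period apart: doubles
```

With a 1 ms offset, `notch` comes out close to zero and `boost` close to 2. If the offset between instances drifts over time, those notches sweep through the spectrum, which is heard as flanging.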
(See also #247 and #354.)