tesselode / kira

Library for expressive game audio.
https://crates.io/crates/kira
Apache License 2.0

PyO3 interface for Kira #91

Open jpedrick opened 5 months ago

jpedrick commented 5 months ago

Related to #90, I think it'd be useful to have a Python interface for Kira, to make it easy to implement sound effects in Python. Initially, just providing hooks to transform StaticSoundData would be good enough for my use case.

jpedrick commented 5 months ago

The main challenge I see is that the frames are interleaved [ (left, right), ... ], which isn't immediately the easiest to work with.

I wonder if this structure would be simpler:

pub struct StaticSoundData {
    pub sample_rate: u32,
    pub left: Arc<[f32]>,
    pub right: Arc<[f32]>,
    pub settings: StaticSoundSettings,
}

tesselode commented 5 months ago

How would that make it easier to work with? Is there something about Python that makes it harder to work with interleaved data?

jpedrick commented 4 months ago

Yes, generally most math libraries expect de-structured data. For example:

https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html

Having the structure as [(f32, f32)] means that users will need to split and copy the data into two separate arrays by iterating in Python. This prevents them from passing the array directly to numpy functions and forces them to iterate in the Python interpreter, unless they implement another way to de-structure the data themselves.
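For illustration, the split described above could also be done once on the Rust side before the data is handed to Python. This is only a minimal sketch: it models frames as plain `(f32, f32)` tuples as written in this thread, and the helper names `deinterleave` and `interleave` are hypothetical, not part of Kira's API.

```rust
/// Split interleaved stereo frames into separate left and right buffers.
/// Frames are modelled as `(left, right)` tuples (an assumption for this sketch).
fn deinterleave(frames: &[(f32, f32)]) -> (Vec<f32>, Vec<f32>) {
    // `unzip` walks the frames once and fills both channel vectors.
    frames.iter().copied().unzip()
}

/// Recombine two channel buffers into interleaved frames.
/// If the channels differ in length, the result is truncated to the shorter one.
fn interleave(left: &[f32], right: &[f32]) -> Vec<(f32, f32)> {
    left.iter().copied().zip(right.iter().copied()).collect()
}

fn main() {
    let frames: [(f32, f32); 3] = [(0.1, -0.1), (0.2, -0.2), (0.3, -0.3)];
    let (left, right) = deinterleave(&frames);
    println!("left:  {:?}", left);
    println!("right: {:?}", right);

    // Round-trip back to interleaved frames.
    let restored = interleave(&left, &right);
    println!("restored: {:?}", restored);
}
```

Both directions copy the samples once, so this trades a single up-front copy for per-channel arrays that can be passed to array libraries directly.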

SolarLiner commented 4 months ago

It's easy to make your array library understand interleaved data, because it's the transpose. In numpy the transpose is a view (IIRC) and so is effectively free.

The bigger problem with using high-level languages to process audio is that they are not made to be "real-time safe". Processing audio in real-time needs careful programming and use of algorithms that are themselves real-time safe. Here is a good article about what it means to write real-time audio processing code.

One of the biggest sources of problems in the audio thread is allocations. Asking the OS for memory can mean making a system call, at which point the OS may pause the thread (for an unknown amount of time) and do something else, then search for an appropriate region of memory to allocate to the process, before finally returning control to the audio thread. Needless to say, this can make the audio thread miss its deadline for processing audio, which results in audio glitches.

Python allocates memory for every object created, which means you wouldn't be able to do anything really without triggering an allocation; it's therefore not a good candidate for writing real-time audio processing code.

The same argument can be made for every other language with automatic memory management, whether it uses stop-the-world garbage collection or even reference counting. Especially with games, subpar audio thread performance is definitely going to be noticeable, even with only a few effects active.

This is not a "this is what happens in the worst-case scenario" either; even if allocations trigger a pop in only 0.1% of audio callbacks, at around 100 callbacks per second that's still once every 10 seconds on average.
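The arithmetic behind that estimate can be sketched as follows. The sample rate (48 kHz) and buffer size (480 frames, giving 100 callbacks per second) are assumptions chosen so the numbers match the figure quoted above; the function name is hypothetical.

```rust
/// Average number of seconds between glitches, given the probability
/// that any single audio callback misses its deadline.
fn seconds_between_glitches(sample_rate: f64, buffer_frames: f64, glitch_probability: f64) -> f64 {
    // One callback is issued per buffer, so this is the callback rate in Hz.
    let callbacks_per_second = sample_rate / buffer_frames;
    // Expected glitches per second is rate * probability; invert for the interval.
    1.0 / (callbacks_per_second * glitch_probability)
}

fn main() {
    // Assumed setup: 48 kHz output, 480-frame buffers, 0.1% of callbacks glitch.
    let interval = seconds_between_glitches(48_000.0, 480.0, 0.001);
    println!("about one glitch every {interval:.1} s");
}
```

With smaller buffers the callback rate goes up, so the same per-callback probability produces audible pops more often.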

The only way to compromise on this is to ask for a big buffer size on the backend, so that each audio callback has more time to complete and the effects of the scheduler, allocations, and locks are diminished; however, this comes at the cost of latency.
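As a rough sketch of that trade-off, the latency contributed by one buffer is simply its length divided by the sample rate. The buffer sizes and the 48 kHz rate below are illustrative assumptions, and `buffer_latency_ms` is a hypothetical helper rather than anything provided by Kira or its backend.

```rust
/// Time covered by one audio buffer, in milliseconds. This is both the
/// deadline each callback must meet and the latency the buffer adds.
fn buffer_latency_ms(buffer_frames: u32, sample_rate: u32) -> f64 {
    f64::from(buffer_frames) / f64::from(sample_rate) * 1000.0
}

fn main() {
    let sample_rate = 48_000; // assumed output sample rate
    for buffer_frames in [128, 512, 2048, 8192] {
        println!(
            "{:>5} frames -> {:>6.2} ms per callback",
            buffer_frames,
            buffer_latency_ms(buffer_frames, sample_rate)
        );
    }
}
```

A larger buffer gives each callback more time to absorb a stall, but every sound is delayed by at least that much, which matters for effects that need to respond to player input.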

Python bindings make sense for writing games in higher-level languages, and using the raw power of Rust for the performance, but the reverse is sadly not feasible.

jpedrick commented 4 months ago

@SolarLiner my use case would be for transforming static assets, not realtime assets. Mostly, I think it would make sense to prototype transformations in Python, then for production code you translate that into Rust (or C, etc.)

I see your point about avoiding Python hooks in real-time audio processing, though. With that in mind, perhaps a better choice would be to provide some C-style callbacks for extending Kira, and let users shoot themselves in the foot with Python if they would like to.

SolarLiner commented 4 months ago

Kira is a real-time audio engine for games, so transforming static assets is strictly speaking outside the scope (though @tesselode is the final arbiter on this).

If you want to preprocess your audio files with Python there are many libraries available. You can then call the script in build.rs or in a Makefile (or Justfile, or Cargo Makefile, or xtask, etc., pick your poison). If you really want to do it at runtime, you can embed Python so you can run Python scripts from your binary.