Design issue: having playback handling and soundfont in same structure incentivizes people to write memory hogs(?)

ell1e commented 2 years ago

Okay, so I might be missing something. However, I feel like in some audio pipeline designs, TinySoundFont's current API has a potentially significant, negative impact on how I need to manage my memory. It seems to be written with the limiting assumption that there is one single global MIDI device (or only one song playing/rendering per soundfont, which amounts to the same thing in practice). Here is why it bothers me:

Hard to fit into a "mixer-first" approach: in a project where the central audio handling is a digital mixer, with a MIDI renderer just dropped in as a "child item" of any audio channel that wants to play MIDI, I run into a roadblock. If I share one tsf*, then two MIDI-playing audio channels can no longer render their audio independently, which goes completely against how clean audio mixers work. If I use one tsf* per MIDI-playing audio channel instead, then I have to load the entire soundfont multiple times, duplicating the whole thing in memory. The remaining option is to go back to one single "special" MIDI channel, but that seems alien in a modern mixer with otherwise high limits on parallel sounds and songs.

*Why multiple `tsf`s is bad:* even for a smallish 30 MB GM soundfont, loading it 2+ times can be significant. Sure, many heavier programs use 1 GB+ these days, but nice, well-integrated modern MIDI is particularly interesting for, e.g., low-spec retro-style games, ones that may target a Raspberry Pi 1 or 2! There, loading such a soundfont even once could easily be 10%+ of the app's entire memory allocation, let alone multiple times. That seems like quite a bummer. Of course, on modern desktops it's less of an issue. (Although Fluid 3 GM is 300 MB, right? So still far from ideal, depending on the use.)
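
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes that each tsf* holds a full private copy of the font and that the per-playback state (voices, channel settings) is small by comparison. The function names are hypothetical, and any state size you plug in is a placeholder, not a measured TinySoundFont value.

```c
#include <assert.h>
#include <stddef.h>

/* Memory used when every MIDI-playing mixer channel loads its own copy of the font. */
size_t bytes_if_duplicated(size_t font_bytes, size_t state_bytes, size_t channels)
{
    return channels * (font_bytes + state_bytes);
}

/* Memory used when all channels share one read-only copy of the font. */
size_t bytes_if_shared(size_t font_bytes, size_t state_bytes, size_t channels)
{
    assert(channels > 0); /* with zero channels nothing would be loaded at all */
    return font_bytes + channels * state_bytes;
}
```

With a 30 MB font, an assumed 64 KB of state per playback, and four MIDI-playing channels, this comes out to roughly 120 MB duplicated versus roughly 30 MB shared.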

Other impacted uses: I think it also impacts things like background rendering/baking of songs while others are playing. I wouldn't know how to do that efficiently, since the basic instrument data can't easily be shared as read-only memory: you either duplicate it all or add tons of locking. So either memory use or CPU use is inefficient. Also, cross-fading between two MIDI tracks would be easier to implement in a less memory-wasteful way without this limitation.

My deepest apologies if I got something wrong, but all in all this looks like a significant, and kind of unnecessary limitation to me.

Solution ideas, ranked from what I personally think could be best to worst:

1. A tsf_duplicate is added, where the soundfont is not copied but shared through an internal, separately allocated, reference-counted structure. Downsides: (a) it would be somewhat non-obvious that this new duplicate is very memory efficient, so people who need it may not realize it; (b) if the ref counting is handled internally, then using it from threads gets slightly harder, since e.g. multiple threads freeing their tsf* instances (if linked via tsf_duplicate) would no longer be safe without locks, while it is now.
2. Or tsf_voice_render gets a companion tsf_voice_render_channels_range which allows specifying a channel subrange. Only notes in that range would end up in the render. Downsides: it would be quite clumsy, and would not help with threading efficiency. It might, however, enable other cool uses I didn't think of, and solve the memory waste for people who don't strictly need lock-free threaded rendering but just "independent enough" rendering (by letting them manage channel subranges to use for parallel songs).
3. Or tsf*, holding the actual soundfont data, is separated from a new tsfplay*, which holds the voice tracking and render functionality, with multiple tsfplay* possibly referencing one tsf* for read-only instrument data. Downsides: while I think this would be the most natural approach, it might require existing users to do major code rewrites. That seems undesirable.
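
To illustrate what the first and third ideas have in common, here is a minimal sketch of read-only instrument data shared by reference count, with the per-playback state kept separate. All names (`shared_font`, `player`, and the functions) are hypothetical and this is not TinySoundFont's actual layout. The counter is deliberately a plain int, which is exactly the threading downside mentioned above: creating or destroying players from several threads would need a lock around these calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Immutable instrument data, loaded once and shared by reference count. */
typedef struct shared_font {
    int refcount;         /* not atomic: callers must lock if used across threads */
    size_t sample_count;
    float *samples;       /* treated as read-only after creation */
} shared_font;

/* Per-playback state: each MIDI-playing mixer channel would own one of these. */
typedef struct player {
    shared_font *font;    /* shared, never written through this pointer */
    int active_voices;    /* stands in for voice tracking, channel settings, etc. */
} player;

/* Allocate the font data once; the caller holds the first reference. */
shared_font *font_create(size_t sample_count)
{
    shared_font *f = malloc(sizeof *f);
    if (!f) return NULL;
    f->samples = calloc(sample_count, sizeof *f->samples);
    if (!f->samples) { free(f); return NULL; }
    f->sample_count = sample_count;
    f->refcount = 1;
    return f;
}

/* Drop one reference; the data is freed when the last holder lets go. */
void font_release(shared_font *f)
{
    assert(f != NULL && f->refcount > 0);
    if (--f->refcount == 0) {
        free(f->samples);
        free(f);
    }
}

/* Create an independent playback instance that only references the font. */
player *player_create(shared_font *font)
{
    player *p;
    assert(font != NULL);
    p = malloc(sizeof *p);
    if (!p) return NULL;
    font->refcount++;
    p->font = font;
    p->active_voices = 0;
    return p;
}

/* Destroy a playback instance and give up its reference to the font. */
void player_destroy(player *p)
{
    if (!p) return;
    font_release(p->font);
    free(p);
}
```

Each mixer channel would then call `player_create` on the same font, and the sample data stays in memory exactly once until the last player is destroyed and the loader has released its own reference.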

My deepest apologies if there already is a solution to all of this that I missed.

ell1e commented 2 years ago

Oops, seems like https://github.com/schellingb/TinySoundFont/pull/1 implements solution 1. What's the hold-up with that one?

schellingb commented 2 years ago

Hey there, thanks for the detailed post and sorry for never acting on #1!! I remember looking at it and thinking it was a very nice extension that didn't introduce too much overhead compared to the benefit it brings. It being non-API breaking certainly is a big plus, too.

In my own projects one tsf instance always suited me so I unfortunately never chased after it... But if you could try it out and see if it works as described and if it solves the issues for you then I want to go ahead and merge it.

ell1e commented 2 years ago

> if you could try it out

I had initially planned to, but it's based on a TSF version that predates some of the functions I use. Does GitHub maybe offer some easy way to get a version rebased onto the latest master?

Edit: I rebased it manually now, see #65
