libsdl-org / SDL

Simple DirectMedia Layer
https://libsdl.org
zlib License

Issues with SDL_SetAudioStreamFormat #8151

Closed: 0x1F9F1 closed this issue 1 year ago

0x1F9F1 commented 1 year ago

SDL_SetAudioStreamFormat currently has a few issues/annoyances:

icculus commented 1 year ago

Changing the source format does not clear the input queue, so there is likely to be stale data left in there.

So the idea was that one could change the format for two reasons: either the data being queued into the stream is changing format, or the destination the stream feeds is changing format.

The first one needs to have the queued audio converted so you don't get scrambled data at the edges of the buffer; the other shouldn't resample at all.

Clearly the intention was to prevent scrambled data, because it converts the history buffer, but it forgot to convert the existing queue. This could be very expensive, though, if someone dumps a gigantic .wav file to the audio stream in one shot and then moves on to another .wav in another format right away...it would be converting a big buffer that could probably be avoided.
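
For context, a minimal sketch of the sequence being described, using the SDL3 calls as they shipped; the specs and buffer sizes are invented. The open question is what happens to `old_chunk`, which is still sitting in the queue as S16 when the declared source format switches to F32:

```c
#include <SDL3/SDL.h>

int main(void)
{
    SDL_Init(SDL_INIT_AUDIO);

    SDL_AudioSpec src = { SDL_AUDIO_S16, 2, 44100 };
    SDL_AudioSpec dst = { SDL_AUDIO_F32, 2, 48000 };
    SDL_AudioStream *stream = SDL_CreateAudioStream(&src, &dst);

    Sint16 old_chunk[4096] = { 0 };      /* pretend this is real S16 audio */
    SDL_PutAudioStreamData(stream, old_chunk, sizeof(old_chunk));

    /* Change only the input side; NULL leaves the output format alone. */
    src.format = SDL_AUDIO_F32;
    SDL_SetAudioStreamFormat(stream, &src, NULL);

    float new_chunk[4096] = { 0.0f };    /* new data arrives as F32 */
    SDL_PutAudioStreamData(stream, new_chunk, sizeof(new_chunk));

    SDL_DestroyAudioStream(stream);
    SDL_Quit();
    return 0;
}
```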

Maybe we should keep track of where these conversions happen in the stream? That could be a giant pain in the butt.

We could also say "resampling is okay, but if you change the data format we'll clear the queue," but that seems like a hostile move, and the app is going to risk gaps in the audio, or lose a little of its output, if it has to wait for the stream to drain before changing the format.

Also, it's weird in any case that changing the sample rate works differently than changing the data format, so maybe this is the wrong way to think of things in general.

I don't have a solution decided yet, I'm just thinking out loud.

None of the internal stream state actually relies on the output format (only the output frequency, though even that isn't technically true now that the history buffer is a constant size). If we want to mix in F32 it may be worth adding a version of SDL_GetAudioStreamAvailable/SDL_GetAudioStreamData which specifies the output format.
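
For illustration only, the kind of variants being floated might look like the prototypes below. These names are invented for the sketch and are not real SDL functions; the shipped SDL_GetAudioStreamAvailable and SDL_GetAudioStreamData take no spec and use the stream's configured output format:

```c
#include <SDL3/SDL.h>

/* Hypothetical prototypes, NOT part of SDL's API: like the existing calls,
 * but letting the caller dictate the output format per request. */
extern int GetAudioStreamAvailableWithSpec(SDL_AudioStream *stream,
                                           const SDL_AudioSpec *dst_spec);
extern int GetAudioStreamDataWithSpec(SDL_AudioStream *stream,
                                      void *buf, int len,
                                      const SDL_AudioSpec *dst_spec);
```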

We might use that internally (although we dictate the output format when an audio stream is bound, so we can just set it to float), but I think there's a simplicity to the existing API such that it might not be worth changing, even though we could. Let me think on it, though.

slouken commented 1 year ago

Does it make sense that the format change is queued and applied only when new data is processed?

0x1F9F1 commented 1 year ago

Also, it's weird in any case that changing the sample rate works differently than changing the data format, so maybe this is the wrong way to think of things in general.

Yeah playback speed needs to be a separate function. If someone queues some S16/44100 audio, then some S16/48000, I doubt they would expect the first part to suddenly start playing faster. Plus, you can think of playback speed as src_rate * speed, or dst_rate / speed. You don't need to "change" the input to do it.
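
A tiny arithmetic sketch of that point (the numbers are arbitrary): a speed factor folds into either rate, so it never has to masquerade as a change to the queued data's format.

```c
#include <stdio.h>

int main(void)
{
    const double speed = 1.25;        /* 25% faster playback */
    const double src_rate = 44100.0;  /* rate the queued data was put in at */
    const double dst_rate = 48000.0;  /* rate the consumer wants out */

    /* Treat the input as faster... */
    double ratio_a = (src_rate * speed) / dst_rate;
    /* ...or treat the output as slower; the resampling ratio is identical. */
    double ratio_b = src_rate / (dst_rate / speed);

    printf("%f %f\n", ratio_a, ratio_b);  /* both print the same ratio */
    return 0;
}
```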

Does it make sense that the format change is queued and applied only when new data is processed?

I think something like that would be the preferred option. Keep track of the format(s) in the queue and just let GetAudioStreamDataInternal handle the switch when it reaches a different format.

Maybe we should keep track of where these conversions happen in the stream? That could be a giant pain in the butt.

I don't think it would actually add much extra complexity, just shift it from SetAudioStreamFormat into GetAudioStreamDataInternal.
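
As a rough shape of that bookkeeping (these structs and names are invented for illustration, not SDL's actual internals): each queued chunk keeps the spec it arrived with, so a format change is just a note for future Put calls rather than an up-front conversion of the whole queue.

```c
#include <SDL3/SDL.h>

typedef struct QueuedChunk
{
    SDL_AudioSpec spec;            /* format this chunk was queued in */
    Uint8 *data;
    int len;
    struct QueuedChunk *next;
} QueuedChunk;

typedef struct StreamSketch
{
    QueuedChunk *queue;            /* oldest first; formats may differ per chunk */
    SDL_AudioSpec input_spec;      /* applied to data queued from now on */
    SDL_AudioSpec output_spec;     /* what the read side hands back */
} StreamSketch;

/* Changing the source format becomes cheap bookkeeping: queued chunks keep
 * their own specs, and the read path converts chunk->spec to output_spec one
 * chunk at a time, hitting each switch exactly where it sits in the queue. */
static void SketchSetSourceFormat(StreamSketch *s, const SDL_AudioSpec *spec)
{
    s->input_spec = *spec;
}
```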

icculus commented 1 year ago

Okay, I've been thinking about this, and honestly being able to just input and output in whatever format on demand is too irresistible to pass up.

For input, we track where the format changes for internal use, and since output no longer has internal state, we just spit out whatever is requested at the moment.

The question is what to do about things that don't change. Apps aren't going to want to keep an AudioSpec around just to feed the stream more bytes, so we're probably going to have to let them set a default that is honored if they pass a NULL spec. Not sure if this is awkward or not yet.

Internally, we'll always explicitly use a spec, solving the question of "what happens if the app changes format without the device knowing?" Because now they couldn't!
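
A purely hypothetical sketch of the NULL-default idea; PutAudioStreamDataWithSpec is an invented name, not a real SDL function (and, later in the thread, a public per-chunk API is argued against):

```c
#include <SDL3/SDL.h>

static int PutAudioStreamDataWithSpec(SDL_AudioStream *stream,
                                      const void *buf, int len,
                                      const SDL_AudioSpec *spec)
{
    SDL_AudioSpec fallback;
    if (spec == NULL) {
        /* NULL means "use whatever the app already set as the input format". */
        SDL_GetAudioStreamFormat(stream, &fallback, NULL);
        spec = &fallback;
    }
    /* ...queue buf/len tagged with *spec... (elided) */
    (void)buf; (void)len; (void)spec;
    return 0;
}
```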

slouken commented 1 year ago

Just add APIs to set the stream input format and output format (two functions? one?) and call it a day. Now you'll be able to set a NULL destination format for streams that are going to be bound to a logical device, which solves @sezero's question about what to pass as the destination parameter for SDL_mixer.
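
A small sketch against that shape of API (the SDL3 signature; the specs here are invented). Pass NULL for whichever side you are not changing; per the comment above, a NULL destination suits a stream that will simply take on the format of the logical device it gets bound to.

```c
#include <SDL3/SDL.h>

static void reconfigure(SDL_AudioStream *stream)
{
    /* Change only the input side: new source material in a new format. */
    const SDL_AudioSpec new_src = { SDL_AUDIO_F32, 2, 48000 };
    SDL_SetAudioStreamFormat(stream, &new_src, NULL);

    /* Change only the output side: whatever consumes the stream changed. */
    const SDL_AudioSpec new_dst = { SDL_AUDIO_S16, 2, 44100 };
    SDL_SetAudioStreamFormat(stream, NULL, &new_dst);
}
```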

icculus commented 1 year ago

We already have those, so that's good to go.

So I guess what we're talking about is:

slouken commented 1 year ago

Let's not expose a public API for setting the format per chunk. That's going to be confusing for most users, when they can just make the call to set the format separately.

ChadTheNinja commented 1 year ago

My suggestion would be to keep it as simple as possible and set the format once when a stream is created. Things like 3D positional audio and other effects are going to require incoming data to be modified on the fly, so for anything more complex than basic playback and mixing, the user is going to have to handle these conversions before they enter the audio stream.

0x1F9F1 commented 1 year ago

My suggestion would be to keep it as simple as possible and set the format once when a stream is created. Things like 3D positional audio and other effects are going to require incoming data to be modified on the fly, so for anything more complex than basic playback and mixing, the user is going to have to handle these conversions before they enter the audio stream.

It's kinda necessary to support changing formats on the fly, as SDL2 also supported it. The broken support was more of just an oversight after the audio system was rewritten.

Either way, I'll close the issue now since it's been fixed.

icculus commented 1 year ago

My suggestion would be to keep it as simple as possible and set the format once when a stream is created

That is how it worked for SDL2, where audio streams were mostly a means to buffer audio and convert variable-length data between two formats, but our needs are greater in SDL3.

It's now extremely necessary to allow format changes on the fly: since SDL_AudioStreams are how we send audio to the device in SDL3, a media player with a playlist will want to change formats as it starts the next file, without a gap in playback. SDL_mixer will use one stream per "channel" where arbitrary data will be used and swapped in.
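
A rough sketch of that playlist flow; the decode_* helpers are hypothetical stand-ins for the app's decoder. The stream stays bound to the device the whole time; only the declared input format changes between files, so playback never has to stop or drain.

```c
#include <SDL3/SDL.h>

extern SDL_AudioSpec decode_next_file_spec(void);     /* hypothetical */
extern int decode_next_file(void *buf, int maxlen);   /* hypothetical */

static void queue_next_track(SDL_AudioStream *stream)
{
    static Uint8 buf[64 * 1024];
    int len;

    SDL_AudioSpec spec = decode_next_file_spec();
    SDL_SetAudioStreamFormat(stream, &spec, NULL);    /* NULL: keep output as-is */

    while ((len = decode_next_file(buf, sizeof(buf))) > 0) {
        SDL_PutAudioStreamData(stream, buf, len);
    }
}
```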

On SDL's end, it might migrate between devices behind the scenes (user plugs in headphones, chooses a new system default device, etc), and being able to just change the stream's output format and keep going without the app even knowing is super-awesome.