mackron / miniaudio

Audio playback and capture library written in C, in a single source file.
https://miniaud.io

emscripten threading #855

Open digitalsignalperson opened 5 months ago

digitalsignalperson commented 5 months ago

Hi, I'm curious what the challenges are to moving forward with Emscripten threading.

As of today:

Emscripten has support for multithreading using SharedArrayBuffer in browsers. That API allows sharing memory between the main thread and web workers as well as atomic operations for synchronization, which enables Emscripten to implement support for the Pthreads (POSIX threads) API. This support is considered stable in Emscripten.

from https://emscripten.org/docs/porting/pthreads.html

mackron commented 5 months ago

That first note at the top of that article isn't something I find appealing. I'm not much of a web person and I don't know anything about COOP or COEP so not entirely sure what the implications are on that front, but miniaudio needs to "just work", so certainly enabling pthreads wholesale without an option to disable it sounds bad considering that note.

When using pthreads with Emscripten, is it using actual real threads, or is it just emulating it? If it's just emulating it, what are the tangible real-world benefits you'd get out of it? Looking at that article they make it sound like it's real threads?

digitalsignalperson commented 5 months ago

Thanks for those questions. Looking into it a bit, this is what I understand:

So for miniaudio, I think if the emscripten builds used pthreads, everything "just works". And if anyone wants to do the extra work to compile with -pthread and serve their site with COOP/COEP headers, then it doesn't actually change any code on the miniaudio side.

This blog was helpful https://unlimited3d.wordpress.com/2021/12/21/webassembly-and-multi-threading/ including the sections on "Cross-origin isolation headers" / "Isolating multi-threaded WebAssembly – what for?" to motivate why COOP/COEP are involved.

mackron commented 5 months ago

If I'm reading the Emscripten documentation correctly, it looks like __EMSCRIPTEN_PTHREADS__ will be defined if -pthread is being used. That, combined with it using actual real threads, probably makes it a reasonable thing to support in miniaudio. I'm assuming if __EMSCRIPTEN_PTHREADS__ is defined, we just use pthreads like any other platform, and otherwise just leave it like it is now. I don't expect there to be too much additional code maintenance. I'll leave this ticket open and investigate when I get a chance. No time frame. Thanks for making me aware of this.
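A minimal sketch of that idea, assuming __EMSCRIPTEN_PTHREADS__ behaves as described above. This is not miniaudio code: EXAMPLE_HAS_THREADS, example_job and example_run_job are hypothetical names used only to illustrate running work on a pthread when one is available and inline otherwise.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical feature macro: threads are assumed available everywhere
   except an Emscripten build compiled without -pthread. */
#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_PTHREADS__)
    #define EXAMPLE_HAS_THREADS 0
#else
    #define EXAMPLE_HAS_THREADS 1
#endif

/* Hypothetical job: increments the counter passed in as user data. */
static void* example_job(void* pUserData)
{
    *(int*)pUserData += 1;
    return NULL;
}

/* Runs the job on a worker thread when threads are available, otherwise
   inline on the calling thread. Returns 0 on success, or a pthread error code. */
static int example_run_job(int* pCounter)
{
#if EXAMPLE_HAS_THREADS
    pthread_t thread;
    int result = pthread_create(&thread, NULL, example_job, pCounter);
    if (result != 0) {
        return result;
    }
    return pthread_join(thread, NULL);
#else
    example_job(pCounter);
    return 0;
#endif
}
```

In a browser, joining on the main thread right after creating a thread may block unless workers have been pre-allocated, for example with the PTHREAD_POOL_SIZE setting, so a real integration would likely avoid the immediate join.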

teropa commented 3 months ago

So for miniaudio, I think if the emscripten builds used pthreads, everything "just works". And if anyone wants to do the extra work to compile with -pthread and serve their site with COOP/COEP headers, then it doesn't actually change any code on the miniaudio side.

This seems to be the case. Building against the Emscripten Audio Worklets API with pthreads enabled, I can see an ma_resource_manager_job_thread running in a separate Web Worker. Seems to work just fine.

digitalsignalperson commented 3 months ago

@teropa is the performance ok for you? My miniaudio dataCallback does some heavy lifting but curiously I didn't notice any performance difference with pthreads enabled.

I've been experimenting with this in a sokol project. I had to include -pthread -Wl,-u,_emscripten_run_callback_on_thread for it to compile.

Also, as a hack for the http-server the project uses, I modified .local/lib/node_modules/http-server/lib/http-server.js to add:

  this.headers['Cross-Origin-Embedder-Policy'] = 'require-corp';
  this.headers['Cross-Origin-Opener-Policy'] = 'same-origin';

I haven't explored all the considerations in https://emscripten.org/docs/porting/pthreads.html and there's other flags like PTHREAD_POOL_SIZE https://emscripten.org/docs/tools_reference/settings_reference.html#pthread-pool-size

I'm also curious about AudioWorklets per https://emscripten.org/docs/api_reference/wasm_audio_worklets.html

Audio Worklets API is based on the Wasm Workers feature. It is possible to also enable the -pthread option while targeting Audio Worklets, but the audio worklets will always run in a Wasm Worker, and not in a Pthread.

Which sounds like audio will be in another thread (without pthread support, and no change if including pthread support), but my program freezes at runtime when I try including the -DMA_ENABLE_AUDIO_WORKLETS -sAUDIO_WORKLET=1 -sWASM_WORKERS=1 -sASYNCIFY and I haven't debugged further.

mackron commented 3 months ago

@teropa It surprised me to read that you have a ma_resource_manager_job_thread instance running because I thought I explicitly disabled threading on the Emscripten build:

/* The Emscripten build cannot use threads. */
#if defined(MA_EMSCRIPTEN)
{
    resourceManagerConfig.jobThreadCount = 0;
    resourceManagerConfig.flags |= MA_RESOURCE_MANAGER_FLAG_NO_THREADING;
}
#endif

Are you using ma_engine? Or are you using a self-managed ma_resource_manager? I'm wondering if that might be working for you by coincidence rather than by design.

digitalsignalperson commented 3 months ago

For me, I'm using:

#define MINIAUDIO_IMPLEMENTATION
#define MA_ENABLE_ONLY_SPECIFIC_BACKENDS
#if defined(__EMSCRIPTEN__)
    #define MA_ENABLE_WEBAUDIO
    #define MA_NO_RESOURCE_MANAGER
#endif

teropa commented 3 months ago

@digitalsignalperson Performance seems good, though I've yet to measure it systematically. My audio processing is fairly light, and this translates to the audio thread being about 98% idle most of the time. The job pthread where I'm doing opus decoding looks much busier though.

Compilation flags: -pthread
Linker flags: -sASYNCIFY -sAUDIO_WORKLET=1 -sWASM_WORKERS=1 -pthread -sPTHREAD_POOL_SIZE=2 -sALLOW_MEMORY_GROWTH
miniaudio flags: -DMA_ENABLE_AUDIO_WORKLETS -DMA_AUDIO_WORKLETS_THREAD_STACK_SIZE=524288

The pthread pool size could probably be just 1, but I'm using an additional one for my own purposes. The trickiest bit was finding out I had to increase MA_AUDIO_WORKLETS_THREAD_STACK_SIZE as there was an obscure Emscripten error from the worklet thread otherwise. Running with the clang address sanitizer uncovered that problem as running out of stack space.

And yeah, we do also have COOP/COEP headers enabled. I assume the shared memory via SharedArrayBuffer just would not work otherwise.

Which sounds like audio will be in another thread (without pthread support, and no change if including pthread support), but my program freezes at runtime when I try including the -DMA_ENABLE_AUDIO_WORKLETS -sAUDIO_WORKLET=1 -sWASM_WORKERS=1 -sASYNCIFY and I haven't debugged further.

Right, with worklets enabled audio will always be on the Web Audio thread created by the browser, not a pthread created by emscripten. I was happy to find that's all pretty transparent with the Emscripten Audio Worklet support though. It creates the audio context, thread, and worklet, and I didn't really have to think about it. I haven't experienced any freezes either. I'm not doing any capture, so I assume that also simplifies things somewhat.

teropa commented 3 months ago

@mackron Right, yes, I'm wiring up my own ma_resource_manager with an Opus decoder backend. So I assume that's why I'm not hitting the code path where you disable threading.

mackron commented 3 months ago

Looking at the code, it looks like I disable threading in ma_engine, but I don't at the ma_resource_manager level. This was unintentional. When I first added Emscripten support, pthreads was experimental and my intention was to just not do any threading at all. With the exception of that code snippet I posted earlier, is there anything I need to do to allow you to use -pthread as miniaudio stands right now in your particular cases?
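Purely as an illustration of how such a guard could be narrowed so that threading is only disabled when pthreads are unavailable, here is a sketch with stand-in types. It is not miniaudio's actual code or its eventual fix: example_rm_config, EXAMPLE_FLAG_NO_THREADING and example_apply_threading_policy are hypothetical and only mirror the shape of the snippet posted earlier.

```c
/* Hypothetical stand-in for a resource manager config. */
typedef struct
{
    unsigned int jobThreadCount;
    unsigned int flags;
} example_rm_config;

/* Hypothetical flag, standing in for a "no threading" option. */
#define EXAMPLE_FLAG_NO_THREADING 0x1u

/* Disables threading only for an Emscripten build compiled without -pthread.
   On every other build the config is left exactly as the caller set it. */
static void example_apply_threading_policy(example_rm_config* pConfig)
{
#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_PTHREADS__)
    pConfig->jobThreadCount = 0;
    pConfig->flags |= EXAMPLE_FLAG_NO_THREADING;
#else
    (void)pConfig; /* Threads are assumed usable, so nothing to change. */
#endif
}
```

Applying the same check in one shared helper would keep the engine-level and resource-manager-level behaviour consistent, which is the gap described above.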

digitalsignalperson commented 3 months ago

I don't think I have it working, but I'm still learning the ropes of how to actually debug things in the browser.

Is the worklet part required? With just -pthread -Wl,-u,_emscripten_run_callback_on_thread -sPTHREAD_POOL_SIZE=1 I see the extra thread created in the devtools debugger in Firefox.


If I try to pause execution in this thread, the button greys out and says "Waiting for next execution".


In the debug build without -pthread my demo is showing 90fps initially, then when I click "Allow" to use my microphone, it drops to 54fps. When I enable -pthread it's exactly the same.

If I additionally try to enable audio worklets with -DMA_ENABLE_AUDIO_WORKLETS -sAUDIO_WORKLET=1 -sWASM_WORKERS=1 -sASYNCIFY, I get an assertion failure during ma_device_init() with this traceback:

printErr
abort
___assert_fail
x
ma_device__on_notification
ma_device__on_notification_unlocked
x
createExportWrapper
unlock
(Async: promise callback)
unlock
(Async: EventListener.handleEvent)
881419
881419
runEmAsmFunction
_emscripten_asm_const_int
x
ma_context_init__webaudio
ma_context_init
ma_device_init_ex
ma_device_init

For other things on the miniaudio side, I saw one #if !defined(__EMSCRIPTEN__) that seems like it can be removed. All the pthread functions are implemented, and while pthread_attr_setschedpolicy() and pthread_attr_setschedparam() are no-ops, pthread_attr_setstacksize() does set the stack size and pthread_create() uses it. The Emscripten implementations are here: https://github.com/emscripten-core/emscripten/tree/main/system/lib/libc/musl/src/thread
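For reference, the standard POSIX pattern being described looks roughly like this. It is a generic sketch rather than miniaudio's thread creation code; example_worker and example_spawn_with_stack are hypothetical names, and the stack size is whatever the caller passes in.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: writes a marker value so the caller can tell it ran. */
static void* example_worker(void* pUserData)
{
    *(int*)pUserData = 42;
    return NULL;
}

/* Creates a thread with an explicit stack size and waits for it to finish.
   Returns 0 on success, or the first non-zero pthread error code. */
static int example_spawn_with_stack(size_t stackSize, int* pOut)
{
    pthread_attr_t attr;
    pthread_t thread;
    int result;

    result = pthread_attr_init(&attr);
    if (result != 0) {
        return result;
    }

    /* The size set here is what pthread_create() picks up from the attributes. */
    result = pthread_attr_setstacksize(&attr, stackSize);
    if (result == 0) {
        result = pthread_create(&thread, &attr, example_worker, pOut);
    }

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    if (result != 0) {
        return result;
    }

    return pthread_join(thread, NULL);
}
```

A call such as example_spawn_with_stack(512 * 1024, &value) requests a 512 KiB stack for the worker.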

mackron commented 3 months ago

@digitalsignalperson Try doing a fresh sync of the dev branch and try again. It might be fixed with this PR https://github.com/mackron/miniaudio/pull/888.

digitalsignalperson commented 3 months ago

@mackron On the dev branch, using the audio worklet flags does seem to resolve the assertion failure, but my app now renders one frame and then freezes, with no error messages and no response to input. I can seemingly only get the devtools debugger to pause inside a registerOrRemoveHandler() JavaScript function. Maybe it's something on my end, though it works as expected without the audio worklet flags. I'll have to figure out how to debug the wasm and step through it or something.

teropa commented 3 months ago

@digitalsignalperson Does your app work if you do pure playback (no microphone activation / capture)? I haven't tested that side of things and I know capture brings in a whole bunch of additional machinery on the web. Might help narrow things down.

teropa commented 3 months ago

With the exception of that code snippet I posted earlier, is there anything I need to do to allow you to use -pthread as miniaudio stands right now in your particular cases?

For our case (playback only, self-managed resource manager, audio worklets enabled) everything seems to be running smoothly with pthreads with the latest from the dev branch. I have tested on current versions of Chrome, Firefox, and Safari (Mac + iOS).

On Safari, especially on iOS, I'm seeing some memory issues triggered by pthreads and shared memory but I don't believe that's a miniaudio problem: emscripten-core/emscripten#19374

digitalsignalperson commented 3 months ago

@teropa Great suggestion, thank you. My app is entirely audio capture.

To test this, I hacked my init function:

#ifndef NO_CAPTURE_TEST
    ma_device_config deviceConfig = ma_device_config_init(ma_device_type_capture);
    deviceConfig.capture.format = ma_format_f32;
    deviceConfig.capture.channels = 2;
#else
    ma_device_config deviceConfig = ma_device_config_init(ma_device_type_playback);
    deviceConfig.playback.format = ma_format_f32;
    deviceConfig.playback.channels = 2;
#endif

and in my dataCallback

void dataCallback(ma_device* pDevice, void* pOutput, const void* pInput, ma_uint32 num_frames) {
    (void) pOutput; // unused
#ifndef NO_CAPTURE_TEST
    float* in_buffer = (float*)pInput;
    unsigned channels = pDevice->capture.channels;
#else
    (void) pInput;
    float in[2048];
    float* in_buffer = in;
    unsigned channels = 1;
    for (int i = 0; i < 2048; i++) {
        in[i] = float(i)/512 - 1.0f;
    }
#endif

So now I'm faking capture with a simple ramp signal while the device is actually operating in playback mode.
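Note that the loop in the snippet produces a rising ramp from -1.0 to just under 3.0 rather than a bounded waveform. If a triangle wave confined to [-1, 1] is wanted for the fake input, one possible sketch follows; fill_triangle and its period parameter are hypothetical and not part of the original code.

```c
#include <stddef.h>

/* Fills out[0..count) with a triangle wave in [-1, 1].
   period is the wave length in samples and is assumed to be at least 2. */
static void fill_triangle(float* out, size_t count, size_t period)
{
    size_t half = period / 2;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t phase = i % period;
        if (phase < half) {
            /* Rising half: -1 at phase 0, approaching +1 at phase == half. */
            out[i] = -1.0f + 2.0f * (float)phase / (float)half;
        } else {
            /* Falling half: +1 at phase == half, heading back toward -1. */
            out[i] = 1.0f - 2.0f * (float)(phase - half) / (float)(period - half);
        }
    }
}
```

In the test path of the callback this would replace the for loop, for example fill_triangle(in, 2048, 512).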

When I do this, I'm still seeing no performance improvement with or without -pthread. I'll have to figure out how to do proper wasm debugging when I have time.