i-c-b opened this issue 8 months ago
I am the one who noticed the app crashing.
The `download-throttled-events` branch has yet to crash the app, which is a good sign, but I imagine the crash possibility isn't eliminated, just postponed.
This crashing behavior happened on my Win10 installation, which was, admittedly, corrupt. After swapping out the SSD (for good measure) and installing Win11, the crashing still happened, though on this project (https://github.com/i-c-b/tauri-download-test) the chance of a crash was about 1 in 30 download calls, whereas it was usually 1 in 2 inside a React or Vue test project.
I don't know why the error only happened for me. My machine doesn't seem to be broken or worn out, as the PC itself is only 6-8 months old. After ruling out the SSD and/or OS as the cause, the crashing still happened, which led us to believe the issue is in the actual plugin code. For now, that is our best guess.
I would like to ask that the `download-throttled-events` plugin branch be saved and not deleted, since otherwise I can't run the download code at all. If the crashing is deemed to occur extremely rarely, then please add a flag that lowers the number of emit calls.
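To illustrate, such a flag could simply gate emits to every Nth chunk; here's a minimal sketch (the flag, event names, and download loop are hypothetical, not the plugin's actual API):

```rust
/// Hypothetical flag controlling how often progress is emitted; not an
/// existing plugin option.
const EMIT_EVERY_N_CHUNKS: usize = 10;

fn download_loop(window: &tauri::Window, chunks: impl Iterator<Item = Vec<u8>>) {
    let mut downloaded: u64 = 0;
    for (i, chunk) in chunks.enumerate() {
        downloaded += chunk.len() as u64;
        // Emit on every Nth chunk instead of every chunk, cutting the
        // number of events by a factor of N.
        if i % EMIT_EVERY_N_CHUNKS == 0 {
            let _ = window.emit("download://progress", downloaded);
        }
    }
    // Always emit a final event so listeners see the completed total.
    let _ = window.emit("download://finished", downloaded);
}
```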
I'd also like to mention that there's a chance the bug occurs at the OS level: when the frequency of the events is too high, the app's process becomes very heavy, so the OS terminates it. Here's some evidence for that: when I run the app in development, it manages to survive the heavy load of frequent emits without being terminated, because development builds are a bit slower than production builds, so the OS doesn't consider the process harmful. That means there's a chance the bug is unfixable on Tauri's side, but there might be a workaround.
Well, the plugin has yet to crash when the event calls are limited. I haven't tested it in production yet, so there's that.
But the default v1 branch crashed in both prod and dev modes.
I tested the throttled version in production on my main project, and it worked, even with two downloads happening at the same time.
Has any further work been done for this plugin?
No, this issue doesn't seem to be specific to that plugin, so I left the workaround branch untouched. Maybe I'll merge it into the main branches as a temporary solution, since it's typically the plugin with the most events.
Yes, that's why I wasn't sure what everyone was talking about: the issue always happens when events are emitted and listened for too quickly, so it isn't specific to this case.
Gotcha, good to know!
+1. I implemented my own downloader with blocking reqwest for use with std::thread and found that my app crashes silently on Windows 10. No unsafe operations, and my buffers (allocated at runtime) never exceed 100 MB. Removing events prevents the crash, tested on a 5 GB download (it previously crashed around 800 MB). Sending an event for every chunk is unnecessary overhead anyway, but I suppose this is an important bug that needs to be investigated further.
WinDbg reports some sort of stack overflow. Objects with source paths starting with `D:\a\_work\1\s\` are probably precompiled artifacts related to the WebView / MSVC. `mgws` is the name of my crate.
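For reference, my setup boils down to roughly this pattern (a sketch only; the URL handling, chunk size, and event name are illustrative):

```rust
use std::io::Read;
use std::thread;

// Rough sketch of the setup described above: a blocking reqwest download
// on a std::thread, emitting one event per chunk. The per-chunk emit is
// what floods the event loop.
fn download(window: tauri::Window, url: String) {
    thread::spawn(move || {
        let mut resp = reqwest::blocking::get(url.as_str()).expect("request failed");
        let mut buf = vec![0u8; 64 * 1024];
        let mut downloaded: u64 = 0;
        loop {
            let n = resp.read(&mut buf).expect("read failed");
            if n == 0 {
                break;
            }
            downloaded += n as u64;
            // Tens of thousands of emits for a multi-gigabyte file.
            let _ = window.emit("download://chunk", downloaded);
        }
    });
}
```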
@mhtmhn But you would need some number of event calls. What would be the limit? Currently the working branch of the plugin has 10x fewer event calls than the v1 branch.
@JJeris I limited the events per second. I didn't like the idea of throttling with an iterator, as the total number of events would still scale with file size and we currently don't know the root cause. I also contemplated using invoke instead to fetch a counter, but didn't in the end.
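Roughly what I mean, as a sketch (names are illustrative, not the plugin's actual code):

```rust
use std::time::{Duration, Instant};

/// Lets through at most `max_per_second` progress events, plus the final one.
struct Throttle {
    last: Option<Instant>,
    min_interval: Duration,
}

impl Throttle {
    fn new(max_per_second: u32) -> Self {
        Self {
            last: None,
            min_interval: Duration::from_secs(1) / max_per_second,
        }
    }

    /// True when enough time has passed since the last emitted event.
    fn ready(&mut self) -> bool {
        match self.last {
            Some(t) if t.elapsed() < self.min_interval => false,
            _ => {
                self.last = Some(Instant::now());
                true
            }
        }
    }
}

fn report_progress(window: &tauri::Window, throttle: &mut Throttle, downloaded: u64, total: u64) {
    // Always let the final event through so the UI can reach 100%.
    if downloaded == total || throttle.ready() {
        let _ = window.emit("download://progress", (downloaded, total));
    }
}
```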
I have created a minimal reproduction of this that reliably crashes on Mac; I have not tested on other OSes yet.
https://github.com/0rvar/tauri-sigabort-reproduction
Includes output from sanitizer=thread as well
Update: It seems that if `clipboard` and `globalShortcut` are both disabled in the tauri config allowlist, then the crashes don't happen. I still see data races with those disabled, but there's no crash in my reproduction app.
So that's a short-term solution if you do not depend on those features.
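For reference, disabling both in tauri.conf.json looks roughly like this (just a sketch; leaving the keys out entirely should have the same effect, since allowlist items are off by default):

```json
{
  "tauri": {
    "allowlist": {
      "clipboard": { "all": false },
      "globalShortcut": { "all": false }
    }
  }
}
```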
@0rvar What do the clipboard and globalShortcut features do, such that removing them stops the crashing?
Also, how did you figure that out?
@JJeris

> What do the clipboard and globalShortcut features do, such that removing them stops the crashing?

In the README.md of https://github.com/0rvar/tauri-sigabort-reproduction I have included the output of running the thread sanitizer on the reproduction. The output implies that more than one thread is poking around in Rc internals (the reference count) in some Rc inside DispatcherMainThreadContext at the same time. This is Bad News, and it means that the single-threaded assumptions in the runtime are invalid.
```rust
// https://github.com/tauri-apps/tauri/blob/327c7aec302cef64ee7b84dc43e2154907adf5df/core/tauri-runtime-wry/src/lib.rs#L273-L286
#[derive(Debug, Clone)]
pub struct DispatcherMainThreadContext<T: UserEvent> {
  pub window_target: EventLoopWindowTarget<Message<T>>,
  pub web_context: WebContextStore,
  #[cfg(all(desktop, feature = "global-shortcut"))]
  pub global_shortcut_manager: Rc<Mutex<WryShortcutManager>>,
  #[cfg(feature = "clipboard")]
  pub clipboard_manager: Arc<Mutex<Clipboard>>,
  pub windows: Rc<RefCell<HashMap<WebviewId, WindowWrapper>>>,
  #[cfg(all(desktop, feature = "system-tray"))]
  system_tray_manager: SystemTrayManager,
}
```
As you can see, `pub global_shortcut_manager: Rc<Mutex<WryShortcutManager>>` and `pub clipboard_manager: Arc<Mutex<Clipboard>>` are both included in that struct only when their corresponding feature is active. When you change allowlist items, the Tauri CLI updates your Cargo.toml to add or remove features. So, compiling with those items turned off in the allowlist causes the struct above to not have `global_shortcut_manager` and `clipboard_manager`.
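Concretely, assuming Tauri v1's feature naming, the CLI-managed dependency in src-tauri/Cargo.toml looks something like this sketch:

```toml
# With clipboard and globalShortcut enabled in the allowlist, the CLI
# produces something along these lines:
tauri = { version = "1", features = ["clipboard-all", "global-shortcut-all"] }

# With both allowlist items turned off, those features are removed, so the
# struct fields gated behind `feature = "clipboard"` and
# `feature = "global-shortcut"` are compiled out entirely.
```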
I am very curious whether this workaround works for others as well.
Just here to say I noticed this too when upgrading from tauri 1.4.1 to 1.5.2.
The workaround I'm using is basically reverting to 1.4.1.
Thank you @0rvar for the thread tracing and @goenning for the versions. Tauri v1.5.0 introduced `std::rc::Rc` in place of `std::sync::Arc` for some parts of the event system; it's worth noting that although Tauri v2.0.0-alpha.12 implemented the same changes, it doesn't seem to experience the crashing behaviour. As for the `clipboard` or `global-shortcut` feature flags, the event flooder doesn't use them but still demonstrates consistent crashing; it's possible that these features exacerbate the crashing by adding more events into the mix than the 10_000 created in the SIGABRT/SIGSEGV reproduction.
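To make the Rc-vs-Arc difference concrete, here's a tiny standalone illustration (not Tauri code): `Rc`'s reference count is non-atomic, which is exactly why the compiler refuses to move it across threads, while `Arc` is fine:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    let rc = Rc::new(0);
    // Does not compile: `Rc<i32>` is not `Send`, because its reference
    // count is non-atomic. If an `Rc` clone nonetheless ends up touched
    // from two threads (e.g. via an unsafe `Send` impl on a containing
    // type), the count updates race, which is the kind of corruption the
    // thread sanitizer reported here.
    // thread::spawn(move || drop(rc));

    // `Arc` uses atomic reference counting, so this is sound.
    let arc = Arc::new(0);
    thread::spawn(move || drop(arc)).join().unwrap();

    drop(rc);
}
```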
@0rvar I don't have the clipboard or global-shortcut features enabled. So it could be unrelated.
@mhtmhn what does your allowlist look like?
@0rvar Here you go...
Can someone who can reliably reproduce this on Windows or Linux try the event-crash branch? It has a small commit that fixes @0rvar's macOS issue on my MacBook, and I'd like to check whether it's really the same issue.
To test it, add this to your Cargo.toml file:

```toml
[patch.crates-io]
tauri = { git = "https://github.com/tauri-apps/tauri", branch = "event-crash" }
tauri-build = { git = "https://github.com/tauri-apps/tauri", branch = "event-crash" }
```

then run `cargo update` in the directory that contains that Cargo.toml before compiling your app again.
@FabianLars my reproduction, with `"allowlist": { "all": true }`, is still crashing with that patch on mac (but so far only when running with the thread sanitizer).
Yeah, it's definitely not a proper fix. The underlying issue is still there, but I don't think I will be able to find it, so I just added back the Arc to "hide" it (which, btw, doesn't make sense to me either). That said, if it only crashes with the thread sanitizer, this would be good enough for a hotfix until someone else finds the actual fix.
https://github.com/i-c-b/tauri-download-test is working now with the latest fixes, but I just saw https://github.com/i-c-b/tauri-event-flood and it needs a little more debugging. Stay tuned. (Actually, we should just go with Fabian-Lars' fix.)
I'm not sure if it actually fixes anything, because I'm having trouble reproducing this issue again, with or without that branch, so I wanted others to try. Either way, I don't think it's the actual, complete fix (at least according to clippy lol).
Well, I can reproduce the issue without your branch, and when I check it out, it doesn't happen anymore.
On Windows it still crashes even with #8402; GetLastError() returns error 1816 ("Not enough quota is available to process this command") when using send_event.
So basically we're hammering the event loop too hard.
Coming here from this comment, I have a feeling my application might be experiencing the same issue.
I had it before when upgrading to Tauri v1.5, after which I downgraded back to v1.4.1, which solved the issue. However, I recently tried upgrading to v1.5 again, and this update also started using the globalShortcut module. The same issue started occurring, so I downgraded to v1.4.1 but kept globalShortcut enabled. This time, the errors keep happening even on v1.4.1.
I'm seeing random crashes (mostly panics in `Rc`) in our app that I suspect are related to this issue. In one case, where I managed to reproduce it under gdb, the crash occurred in a worker (non-main) thread calling `tauri::window::Window::emit`. The backtrace contained a call to `<DispatcherMainThreadContext as Clone>::clone`, even though the safety comments suggest `DispatcherMainThreadContext` is only used on the main thread.
Copying my message from the other issue, as it seems to belong to this one instead.
After some testing, I noticed that the download plugin worked fine at first, even when downloading large files. But now it doesn't seem to be working properly: it chokes when sending updates to the channel and only updates properly once the download finishes, especially with large files. After this happens, lots of errors pop up in the console after reloading the page, which can be further detailed in the Network tab:
These errors keep popping up until I restart the program completely; reloading the page isn't enough.
This is how I'm updating the download progress state (I'm passing the handleDownloadProgress function down to where I call the plugin):
```tsx
function reducer(state, action) {
  switch (action.type) {
    case 'reset':
      return { total: 0, downloaded: 0 };
    case 'downloaded_chunk':
      return {
        downloaded:
          action.total !== state.total
            ? action.chunk
            : state.downloaded + action.chunk,
        total: action.total,
      };
    default:
      throw Error('Unknown action.');
  }
}

function Updater() {
  const [downloadState, dispatchDownloadState] = useReducer(reducer, {
    total: 0,
    downloaded: 0,
  });

  const handleDownloadProgress = ({
    progress,
    total,
  }: {
    progress: number;
    total: number;
  }) => {
    dispatchDownloadState({ type: 'downloaded_chunk', chunk: progress, total });
  };

  return (
    <Text>{`${downloadState.downloaded} bytes / ${downloadState.total} bytes`}</Text>
  );
}
```
Here's a small gif showing what happens. This was working fine a while ago, updating in real time:
Originally posted by @pxdl in https://github.com/tauri-apps/plugins-workspace/issues/1266#issuecomment-2089427355
I was just looking at the 1.6.0 release notes, and they mention a bugfix for an event loop crash (https://v2.tauri.app/blog/tauri-1-6/#event-loop-crash).
Is it related to this issue or something else?
It's something else, and it didn't even involve events, iirc.
I spent today looking into this and, unfortunately, we can't fix this easily: the Windows OS limits the event loop to handling only 10,000 messages at a time. I tried implementing a queue for failed events, but it has a limitation of:
The recommendation for now is: if you're spamming a lot of events, check whether emit failed, wait a few milliseconds (15, or probably lower, is good enough), then retry sending the same failed event; see https://github.com/tauri-apps/tauri/pull/9698 for an example.
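As a sketch of that recommendation (the helper, delay, and retry cap are illustrative, not an official API):

```rust
use std::{thread, time::Duration};

/// Retries a failed emit a few times with a short back-off, per the
/// recommendation above. Gives up after `retries` extra attempts.
fn emit_with_retry<S: serde::Serialize + Clone>(
    window: &tauri::Window,
    event: &str,
    payload: S,
    retries: usize,
) -> tauri::Result<()> {
    let mut result = window.emit(event, payload.clone());
    for _ in 0..retries {
        if result.is_ok() {
            break;
        }
        // Give the OS message queue a moment to drain before retrying.
        thread::sleep(Duration::from_millis(15));
        result = window.emit(event, payload.clone());
    }
    result
}
```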
If there is any internal API that causes lost events or crashes because it spams events (like the shell `Command` API bug described in #7684), please let me know and I will fix it.
@amrbashir are you able to reproduce this on Tauri 1.4.1 as well? Because it seems like it started in later versions.
@goenning I wasn't able to reproduce any crashes, neither with 1.4.1 nor with 1.5, so I'd appreciate it if you could point me to a reproduction. I was using https://github.com/i-c-b/tauri-event-flood for my tests.
Also, which API causes crashes for you, and how often are you calling it to cause the crash?
Describe the bug

Too many calls to `tauri::Window::emit` in a short amount of time cause the app to crash.

Reproduction
Initial testing was conducted using a minimal reproduction with the `upload` plugin and a download of the latest daily Blender build. It was later narrowed down to the event system, and an event flooder was created which exhibits the same behaviour more consistently with 100,000 or more calls.
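The flooder boils down to something like this (a sketch of the general shape only; the command and event names are illustrative):

```rust
// Sketch of the general shape of an event flooder; command and event
// names are illustrative, see the linked reproductions for the real code.
#[tauri::command]
fn flood(window: tauri::Window) {
    for i in 0..100_000u32 {
        // Each emit posts a message to the main event loop; enough of
        // them in quick succession reproduces the crash.
        let _ = window.emit("flood://event", i);
    }
}
```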
Expected behavior

No response
Platform and versions
Stack trace
No response
Additional context
This issue was previously discussed on Discord. @FabianLars demonstrated a mitigation for the `upload` plugin by reducing the frequency of calls to `emit` on the `download-throttled-events` branch, which led to reduced crashing but didn't eliminate it entirely. The impact this issue has on the `upload` plugin would be reduced in environments with slow drives and fast networks, as there is more time between messages and fewer messages overall.