i-c-b opened this issue 8 months ago
I am the one who noticed the app crashing.
The `download-throttled-events` branch has yet to crash the app, which is a good sign, but I imagine the crash possibility isn't eliminated, just postponed.
This crashing behavior happened on my Win10 installation, which was, admittedly, corrupt. After swapping out the SSD (for good measure) and installing Win11, the crashing still happened, though on this project (https://github.com/i-c-b/tauri-download-test) the chance of a crash was about 1 in 30 download calls, whereas it was usually 1 in 2 inside a React or Vue test project.
I don't know why the error only happened for me. My machine doesn't seem to be broken or worn out, as the PC itself is only 6-8 months old. After ruling out the SSD and/or OS as the cause, the crashing still happened, which led us to believe the issue is in the actual plugin code. For now, that is our best guess.
I would like to ask that the `download-throttled-events` plugin branch be saved and not deleted, since otherwise I can't run the download code at all. If the crashing is deemed to occur extremely rarely, then please add a flag that lowers the number of emit calls.
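To illustrate, such a flag could simply gate emits to every Nth chunk; here's a minimal sketch (the flag, event names, and download loop are hypothetical, not the plugin's actual API):

```rust
/// Hypothetical flag controlling how often progress is emitted; not an
/// existing plugin option.
const EMIT_EVERY_N_CHUNKS: usize = 10;

fn download_loop(window: &tauri::Window, chunks: impl Iterator<Item = Vec<u8>>) {
    let mut downloaded: u64 = 0;
    for (i, chunk) in chunks.enumerate() {
        downloaded += chunk.len() as u64;
        // Emit on every Nth chunk instead of every chunk, cutting the
        // number of events by a factor of N.
        if i % EMIT_EVERY_N_CHUNKS == 0 {
            let _ = window.emit("download://progress", downloaded);
        }
    }
    // Always emit a final event so listeners see the completed total.
    let _ = window.emit("download://finished", downloaded);
}
```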
I'd also like to mention that there's a chance the bug occurs at the OS level: when the frequency of the events is too high, the app's process becomes very heavy, so the OS terminates it. Here's some evidence for that: when I run the app in development, it manages to survive the heavy load of frequent emits without being terminated, because development builds are a bit slower than production builds, so the OS doesn't consider the process harmful. That means there's a chance the bug is unfixable on Tauri's side, but there might be a workaround.
Well, the plugin has yet to crash when the event calls are limited. I haven't tested it in production yet, so there's that.
But the default v1 branch crashed in both prod and dev modes.
I tested the throttled version in production on my main project, and it worked, even with two downloads happening at the same time.
Has any further work been done for this plugin?
No, this issue doesn't seem to be specific to that plugin, so I left the workaround branch untouched. Maybe I'll merge it into the main branches as a temporary solution, since it's typically the plugin with the most events.
Yes, that's why I wasn't sure what everyone was talking about: the issue always happens when events are emitted and listened for too quickly, so it isn't specific to this case.
Gotcha, good to know!
+1. I implemented my own downloader with blocking reqwest for use with std::thread and found that my app crashes silently on Windows 10. No unsafe operations, and my buffers (allocated at runtime) never exceed 100 MB. Removing events prevents the crash, tested on a 5 GB download (it previously crashed around 800 MB). Sending an event for every chunk is unnecessary overhead anyway, but I suppose this is an important bug that needs to be investigated further.
WinDbg reports some sort of stack overflow. Objects with source paths starting with `D:\a\_work\1\s\` are probably precompiled artifacts related to the WebView / MSVC. `mgws` is the name of my crate.
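For reference, my setup boils down to roughly this pattern (a sketch only; the URL handling, chunk size, and event name are illustrative):

```rust
use std::io::Read;
use std::thread;

// Rough sketch of the setup described above: a blocking reqwest download
// on a std::thread, emitting one event per chunk. The per-chunk emit is
// what floods the event loop.
fn download(window: tauri::Window, url: String) {
    thread::spawn(move || {
        let mut resp = reqwest::blocking::get(url.as_str()).expect("request failed");
        let mut buf = vec![0u8; 64 * 1024];
        let mut downloaded: u64 = 0;
        loop {
            let n = resp.read(&mut buf).expect("read failed");
            if n == 0 {
                break;
            }
            downloaded += n as u64;
            // Tens of thousands of emits for a multi-gigabyte file.
            let _ = window.emit("download://chunk", downloaded);
        }
    });
}
```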
@mhtmhn But you would need some number of event calls. What would be the limit? Currently the working branch of the plugin has 10x fewer event calls than the v1 branch.
@JJeris I limited the events per second. I didn't like the idea of throttling with an iterator, as the total number of events would still scale with file size and we currently don't know the root cause. I also contemplated using invoke instead to fetch a counter, but didn't in the end.
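Roughly what I mean, as a sketch (names are illustrative, not the plugin's actual code):

```rust
use std::time::{Duration, Instant};

/// Lets through at most `max_per_second` progress events, plus the final one.
struct Throttle {
    last: Option<Instant>,
    min_interval: Duration,
}

impl Throttle {
    fn new(max_per_second: u32) -> Self {
        Self {
            last: None,
            min_interval: Duration::from_secs(1) / max_per_second,
        }
    }

    /// True when enough time has passed since the last emitted event.
    fn ready(&mut self) -> bool {
        match self.last {
            Some(t) if t.elapsed() < self.min_interval => false,
            _ => {
                self.last = Some(Instant::now());
                true
            }
        }
    }
}

fn report_progress(window: &tauri::Window, throttle: &mut Throttle, downloaded: u64, total: u64) {
    // Always let the final event through so the UI can reach 100%.
    if downloaded == total || throttle.ready() {
        let _ = window.emit("download://progress", (downloaded, total));
    }
}
```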
I have created a minimal reproduction of this that reliably crashes on Mac; I have not tested on other OSes yet.
https://github.com/0rvar/tauri-sigabort-reproduction
Includes output from sanitizer=thread as well
Update: It seems that if `clipboard` and `globalShortcut` are both disabled in the tauri config allowlist, then the crashes don't happen. I still see data races with those disabled, but there's no crash in my reproduction app.
So that's a short-term solution if you do not depend on those features.
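For reference, disabling both in tauri.conf.json looks roughly like this (just a sketch; leaving the keys out entirely should have the same effect, since allowlist items are off by default):

```json
{
  "tauri": {
    "allowlist": {
      "clipboard": { "all": false },
      "globalShortcut": { "all": false }
    }
  }
}
```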
@0rvar What do the clipboard and globalShortcut features do, such that removing them stops the crashing?
Also, how did you figure that out?
@JJeris

> What do the clipboard and globalShortcut features do, such that removing them stops the crashing?

In the README.md of https://github.com/0rvar/tauri-sigabort-reproduction I have included the output of running the thread sanitizer on the reproduction. The output implies that more than one thread is poking around in Rc internals (the reference count) in some Rc inside DispatcherMainThreadContext at the same time. This is Bad News, and it means that the single-threaded assumptions in the runtime are invalid.
```rust
// https://github.com/tauri-apps/tauri/blob/327c7aec302cef64ee7b84dc43e2154907adf5df/core/tauri-runtime-wry/src/lib.rs#L273-L286
#[derive(Debug, Clone)]
pub struct DispatcherMainThreadContext<T: UserEvent> {
  pub window_target: EventLoopWindowTarget<Message<T>>,
  pub web_context: WebContextStore,
  #[cfg(all(desktop, feature = "global-shortcut"))]
  pub global_shortcut_manager: Rc<Mutex<WryShortcutManager>>,
  #[cfg(feature = "clipboard")]
  pub clipboard_manager: Arc<Mutex<Clipboard>>,
  pub windows: Rc<RefCell<HashMap<WebviewId, WindowWrapper>>>,
  #[cfg(all(desktop, feature = "system-tray"))]
  system_tray_manager: SystemTrayManager,
}
```
As you can see, `pub global_shortcut_manager: Rc<Mutex<WryShortcutManager>>` and `pub clipboard_manager: Arc<Mutex<Clipboard>>` are both included in that struct only when their corresponding feature is active. When you change allowlist items, the Tauri CLI updates your Cargo.toml to add or remove features. So, compiling with those items turned off in the allowlist causes the struct above to not have `global_shortcut_manager` and `clipboard_manager`.
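Concretely, assuming Tauri v1's feature naming, the CLI-managed dependency in src-tauri/Cargo.toml looks something like this sketch:

```toml
# With clipboard and globalShortcut enabled in the allowlist, the CLI
# produces something along these lines:
tauri = { version = "1", features = ["clipboard-all", "global-shortcut-all"] }

# With both allowlist items turned off, those features are removed, so the
# struct fields gated behind `feature = "clipboard"` and
# `feature = "global-shortcut"` are compiled out entirely.
```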
I am very curious whether this workaround works for others as well.
Just here to say I noticed this too when upgrading from tauri 1.4.1 to 1.5.2.
The workaround I'm using is basically reverting to 1.4.1.
Thank you @0rvar for the thread tracing and @goenning for the versions. Tauri v1.5.0 introduced `std::rc::Rc` in place of `std::sync::Arc` for some parts of the event system; it's worth noting that although Tauri v2.0.0-alpha.12 implemented the same changes, it doesn't seem to experience the crashing behaviour. As for the `clipboard` or `global-shortcut` feature flags, the event flooder doesn't use them but still demonstrates consistent crashing; it's possible that these features exacerbate the crashing by adding more events into the mix than the 10_000 created in the SIGABRT/SIGSEGV reproduction.
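To make the Rc-vs-Arc difference concrete, here's a tiny standalone illustration (not Tauri code): `Rc`'s reference count is non-atomic, which is exactly why the compiler refuses to move it across threads, while `Arc` is fine:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    let rc = Rc::new(0);
    // Does not compile: `Rc<i32>` is not `Send`, because its reference
    // count is non-atomic. If an `Rc` clone nonetheless ends up touched
    // from two threads (e.g. via an unsafe `Send` impl on a containing
    // type), the count updates race, which is the kind of corruption the
    // thread sanitizer reported here.
    // thread::spawn(move || drop(rc));

    // `Arc` uses atomic reference counting, so this is sound.
    let arc = Arc::new(0);
    thread::spawn(move || drop(arc)).join().unwrap();

    drop(rc);
}
```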
@0rvar I don't have the clipboard or global-shortcut features enabled. So it could be unrelated.
@mhtmhn what does your allowlist look like?
@0rvar Here you go...
Can someone who can reliably reproduce this on Windows or Linux try the event-crash branch? It has a small commit that fixes @0rvar's macOS issue on my MacBook, and I'd like to check whether it's really the same issue.
To test it, add this to your Cargo.toml file:

```toml
[patch.crates-io]
tauri = { git = "https://github.com/tauri-apps/tauri", branch = "event-crash" }
tauri-build = { git = "https://github.com/tauri-apps/tauri", branch = "event-crash" }
```

then run `cargo update` in the directory that contains that Cargo.toml before compiling your app again.
@FabianLars my reproduction, with `"allowlist": { "all": true }`, is still crashing with that patch on mac (but so far only when running with the thread sanitizer).
Yeah, it's definitely not a proper fix. The underlying issue is still there, but I don't think I will be able to find it, so I just added back the Arc to "hide" it (which, btw, doesn't make sense to me either). That said, if it only crashes with the thread sanitizer, this would be good enough for a hotfix until someone else finds the actual fix.
https://github.com/i-c-b/tauri-download-test is working now with the latest fixes, but I just saw https://github.com/i-c-b/tauri-event-flood and it needs a little more debugging. Stay tuned. (Actually, we should just go with Fabian-Lars' fix.)
I'm not sure if it actually fixes anything, because I'm having trouble reproducing this issue again, with or without that branch, so I wanted others to try. Either way, I don't think it's the actual, complete fix (at least according to clippy lol).
Well, I can reproduce the issue without your branch, and when I check it out, it doesn't happen anymore.
On Windows it still crashes even with #8402; GetLastError() returns error 1816 ("Not enough quota is available to process this command") when using send_event.
So basically we're hammering the event loop too hard.
Coming here from this comment, I have a feeling my application might be experiencing the same issue.
I had it before when upgrading to Tauri v1.5, after which I downgraded back to v1.4.1, which solved the issue. However, I recently tried upgrading to v1.5 again, and this update also started using the globalShortcut module. The same issue started occurring, so I downgraded to v1.4.1 but kept globalShortcut enabled. This time, the errors keep happening even on v1.4.1.
I'm seeing random crashes (mostly panics in `Rc`) in our app that I suspect are related to this issue. In one case, where I managed to reproduce it under gdb, the crash occurred in a worker (non-main) thread calling `tauri::window::Window::emit`. The backtrace contained a call to `<DispatcherMainThreadContext as Clone>::clone`, even though the safety comments suggest `DispatcherMainThreadContext` is only used on the main thread.
Copying my message from the other issue, as it seems to belong to this one instead.
After some testing, I noticed that the download plugin worked fine at first, even when downloading large files. But now it doesn't seem to be working properly: it chokes when sending updates to the channel and only updates properly once the download finishes, especially with large files. After this happens, lots of errors pop up in the console after reloading the page, which can be further detailed in the Network tab:
These errors keep popping up until I restart the program completely; reloading the page isn't enough.
This is how I'm updating the download progress state (I'm passing the handleDownloadProgress function down to where I call the plugin):
```tsx
function reducer(state, action) {
  switch (action.type) {
    case 'reset':
      return { total: 0, downloaded: 0 };
    case 'downloaded_chunk':
      return {
        downloaded:
          action.total !== state.total
            ? action.chunk
            : state.downloaded + action.chunk,
        total: action.total,
      };
    default:
      throw Error('Unknown action.');
  }
}

function Updater() {
  const [downloadState, dispatchDownloadState] = useReducer(reducer, {
    total: 0,
    downloaded: 0,
  });

  const handleDownloadProgress = ({
    progress,
    total,
  }: {
    progress: number;
    total: number;
  }) => {
    dispatchDownloadState({ type: 'downloaded_chunk', chunk: progress, total });
  };

  return (
    <Text>{`${downloadState.downloaded} bytes / ${downloadState.total} bytes`}</Text>
  );
}
```
Here's a small gif showing what happens. This was working fine a while ago, updating in real time:
Originally posted by @pxdl in https://github.com/tauri-apps/plugins-workspace/issues/1266#issuecomment-2089427355
I was just looking at the 1.6.0 release notes, and they mention a bugfix for an event loop crash (https://v2.tauri.app/blog/tauri-1-6/#event-loop-crash).
Is it related to this issue or something else?
It's something else, and it didn't even involve events, iirc.
I spent today looking into this and, unfortunately, we can't fix this easily: the Windows OS limits the event loop to handling only 10,000 messages at a time. I tried implementing a queue for failed events, but it has a limitation of:
The recommendation for now is: if you're spamming a lot of events, check whether emit failed, wait a few milliseconds (15, or probably lower, is good enough), then retry sending the same failed event; see https://github.com/tauri-apps/tauri/pull/9698 for an example.
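As a sketch of that recommendation (the helper, delay, and retry cap are illustrative, not an official API):

```rust
use std::{thread, time::Duration};

/// Retries a failed emit a few times with a short back-off, per the
/// recommendation above. Gives up after `retries` extra attempts.
fn emit_with_retry<S: serde::Serialize + Clone>(
    window: &tauri::Window,
    event: &str,
    payload: S,
    retries: usize,
) -> tauri::Result<()> {
    let mut result = window.emit(event, payload.clone());
    for _ in 0..retries {
        if result.is_ok() {
            break;
        }
        // Give the OS message queue a moment to drain before retrying.
        thread::sleep(Duration::from_millis(15));
        result = window.emit(event, payload.clone());
    }
    result
}
```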
If there is any internal API that causes lost events or crashes because it spams events (like the shell `Command` API bug described in #7684), please let me know and I will fix it.
@amrbashir are you able to reproduce this on Tauri 1.4.1 as well? Because it seems like it started in later versions.
@goenning I wasn't able to reproduce any crashes, neither with 1.4.1 nor with 1.5, so I'd appreciate it if you could point me to a reproduction. I was using https://github.com/i-c-b/tauri-event-flood for my tests.
Also, which API causes crashes for you, and how often are you calling it to cause the crash?
Describe the bug

Too many calls to `tauri::Window::emit` in a short amount of time cause the app to crash.

Reproduction
Initial testing was conducted using a minimal reproduction with the `upload` plugin and a download of the latest daily Blender build. It was later narrowed down to the event system, and an event flooder was created which exhibits the same behaviour more consistently with 100,000 or more calls.
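The flooder boils down to something like this (a sketch of the general shape only; the command and event names are illustrative):

```rust
// Sketch of the general shape of an event flooder; command and event
// names are illustrative, see the linked reproductions for the real code.
#[tauri::command]
fn flood(window: tauri::Window) {
    for i in 0..100_000u32 {
        // Each emit posts a message to the main event loop; enough of
        // them in quick succession reproduces the crash.
        let _ = window.emit("flood://event", i);
    }
}
```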
Expected behavior

No response
Platform and versions
Stack trace
No response
Additional context
This issue was previously discussed on Discord. @FabianLars demonstrated a mitigation for the `upload` plugin by reducing the frequency of calls to `emit` on the `download-throttled-events` branch, which led to reduced crashing but didn't eliminate it entirely. The impact this issue has on the `upload` plugin would be reduced in environments with slow drives and fast networks, as there is more time between messages and fewer messages overall.