[bug] `run_iteration` example busy-loops on Windows

ReactorScram commented 5 months ago

Describe the bug

I'm getting this output indicating a busy loop:

Ran Tauri iteration at 4.7094867s
Ran Tauri iteration at 4.7097679s
Ran Tauri iteration at 4.7099947s
Ran Tauri iteration at 4.710248s
Ran Tauri iteration at 4.7104644s
Ran Tauri iteration at 4.7106774s
Ran Tauri iteration at 4.7108921s
Ran Tauri iteration at 4.7111063s
Ran Tauri iteration at 4.7113324s
Ran Tauri iteration at 4.7115689s
Ran Tauri iteration at 4.7118283s
Ran Tauri iteration at 4.7120621s
Ran Tauri iteration at 4.7122877s
Ran Tauri iteration at 4.712554s
Ran Tauri iteration at 4.8543264s

Reproduction

https://github.com/ReactorScram/tauri_repro

Run cargo tauri dev or cd src-tauri; cargo run. stdout will be filled with constant print statements.

fn main() {
    let mut app = tauri::Builder::default()
        .invoke_handler(tauri::generate_handler![greet])
        .build(tauri::generate_context!())
        .expect("should be able to build the Tauri app");

    let start_time = std::time::Instant::now();

    loop {
        let iteration = app.run_iteration();
        println!("Ran Tauri iteration at {:?}", start_time.elapsed());
        if iteration.window_count == 0 {
            break;
        }
    }
}

Expected behavior

I expected from #5532 that run_iteration would block until the main event loop receives an event, then process it, then return. Instead it doesn't seem to block, at least on Windows.

This is part of a larger yak shave for me. I'm trying to join a worker thread after Tauri exits, but all these methods have failed:

app.exit doesn't trigger RunEvent::Exit (intentional behavior per #4662), so I can't clean up in there
run_iteration busy loops
No way to inject my own RunEvent into the main loop to wake up run and check whether we need to clean up
No known way to just stop the Tauri event loop and let things Drop naturally, per #7358
No way to register something like on_exit with Builder https://docs.rs/tauri/latest/tauri/struct.Builder.html

My fallback plan is to write my own cleanup function and call that, and have it call app.exit at the end. It'll work, but it'll require some comments explaining it.

Full `tauri info` output

(I'm on Windows 11 Home. I'm not sure why it says Windows 10.)

WARNING: no lock files found, defaulting to npm

[✔] Environment
    - OS: Windows 10.0.22621 X64
    ✔ WebView2: 120.0.2210.133
    ✔ MSVC:
        - Visual Studio Build Tools 2019
        - Visual Studio Community 2022
    ✔ rustc: 1.75.0 (82e1608df 2023-12-21)
    ✔ cargo: 1.75.0 (1d8b05cdd 2023-11-20)
    ✔ rustup: 1.26.0 (5af9b9484 2023-04-05)
    ✔ Rust toolchain: stable-x86_64-pc-windows-msvc (environment override by RUSTUP_TOOLCHAIN)
    - node: 20.10.0
    - npm: 10.2.3

[-] Packages
    - tauri [RUST]: 1.5.4
    - tauri-build [RUST]: 1.5.1
    - wry [RUST]: 0.24.7
    - tao [RUST]: 0.16.6
    - tauri-cli [RUST]: 1.5.6
    - @tauri-apps/api : not installed!
    - @tauri-apps/cli [NPM]: 1.5.6 (outdated, latest: 1.5.9)

[-] App
    - build-type: bundle
    - CSP: unset
    - distDir: ../src
    - devPath: ../src

Stack trace

N/A

Additional context

~~If this is a platform limitation, like maybe it's just impossible to gracefully stop a GUI event loop on Windows, I would understand.~~

~~But if it's not... can I just have an app.stop that stops the event loop, so everything can Drop?~~

Seems to be a tao thing, see later comment

ReactorScram commented 5 months ago

Okay I guess it's a limitation inherited from tao? https://docs.rs/tao/latest/tao/event_loop/struct.EventLoop.html#method.run

https://docs.rs/tao/latest/x86_64-pc-windows-msvc/src/tao/platform_impl/windows/event_loop.rs.html#223

ReactorScram commented 5 months ago

Will try https://docs.rs/tauri/latest/tauri/plugin/struct.Builder.html#method.on_drop. I haven't used a plugin yet but maybe that's the more idiomatic way to do it.

FabianLars commented 5 months ago

iirc on_drop only runs if you use remove_plugin, it's not part of the cleanup that app.exit() runs (it only cleans up the tray and Commands i think)

ReactorScram commented 5 months ago

Yeah I checked the code and the only cleanup that really happens is something like stopping child processes. I don't have any other reason to use a plugin, so if I have to do cleanup in a plugin I may as well do my own cleanup manually outside the plugin.

For curious readers I ended up finding where tao explains it

I don't fully understand it, but I guess it is some kind of platform limitation. :( I would have thought after more than 20 years of GUI paradigms, the major OSes would have made GUIs as nice to program as CLIs. Oh well.

I'll put a mutex or something and a cleanup_if_not_already_cleaned_up function and see if I can make a lint to forbid direct calls to app.exit in my own code.

Edit: Just a little more opinion-posting - I can see why webdev took off, if the GUI is just a TCP listener, the OS is much nicer to you than if you actually use most OS' GUI primitives

FabianLars commented 5 months ago

I would have thought after more than 20 years of GUI paradigms, the major OSes would have made GUIs as nice to program as CLIs. Oh well.

I thought so too at one point, until i got into working on (instead of "with") a gui framework. Now i cry myself to sleep every night 🤷

That said, not having an easier way to do custom clean up is on us and our api design. (No idea about the event loop stuff thi issue was about though)

arialpew commented 5 months ago

Tauri never call destructors and never stack unwind because it's calling std::process::exit. This is in my opinion a failure in design, I don't see any good reason why you can't return control flow when application exit on Windows for example. It's maybe more an issues on some platform, but on Windows it should not. Most people will face this at some point if they have to start an infinite background thread running alongside Tauri. How to join properly and how to exit the thread gracefully ?

My current workaround about dropping resources is using OnceCell/OnceLock with a static variable and a signal.

A cell can be initialized only once. To set the content of the cell, you use set(). If the cell is already initialized, this is a no-op. To drop the cell content, you use take(). This require mutable reference to the cell, so obviously you have to make your static mut.

Yes, this require unsafe and a carefull design, but if you use the correct implementation it's safe (the compiler is simply unable to reason about it).

Note :

If the cell is empty when you take() it, it's a logical error from your side.
If the cell is full when you set() it, it's a logical error from your side.
You have to make the static mut to drop the cell content because it require a &mut. This is a bit annoying, because this bypass some Rust compiler safety and increase potential of runtime crash. Maybe we can do better and get ride of the &mut ? There's also UnsafeSyncCell that will be stabilized and could replace static mut (it's possible that static mut is deprecated in later edition, it's still in discussion).
To signal thread to stop, you can use Arc<AtomicBool> or a channel.

Cell errors can be handled gracefully with the cell API, at the cost of useless check if you know you don't have errors in your code (but who care, it's about initialization and destruction, performance does not really matter here and if you really want you can write unchecked method with unreachable hint for dead code optimization).

pub struct ThreadManager {
  should_exit: Arc<AtomicBool>,
  handle: std::thread::JoinHandle<()>,
}

impl ThreadManager {
  fn new(should_exit: Arc<AtomicBool>) -> Self {
    // Start thread and return a new ThreadManager with the associated signal and thread handle.
    // Move "should_exit" (Arc<AtomicBool>) into the thread.
    // ThreadManager::new is private, can not be constructed outside of the start() method.
  }

  /// Start the ThreadManager.
  ///
  /// Calling start multiple time has no effect, unless you empty the cell with stop(). There's only a single ThreadManager.
  pub fn start(should_exit: Arc<AtomicBool>) {
      unsafe {
          let _ = THREAD_MANAGER.set(Self::new(should_exit));
      }
  }

  /// Stop the ThreadManager. Usefull for exiting gracefully.
  /// Calling stop multiple time has no effect. There's only a single ThreadManager.
  /// Note that you should signal the thread to stop since it's running an infinite loop.
  /// If you don't set "should_exit = false" (from the Arc<AtomicBool> you passed-in originally), we gonna wait the join forever.
  pub fn stop() {
      unsafe {
          // Just in case someone call stop() multiple time, we want to be sure the Cell is not empty.
          if let Some(thread_manager) = THREAD_MANAGER.take() {
              thread_manager
                  .handle
                  .join()
                  .expect("thread should join");
          }
      };
  }
}

/// ThreadManager is thread safe. We use a OnceLock to ensure it's initialized only once.
/// Note : it's not public, the only way to interact with it is with start() and stop().
/// Since the ThreadManager is a static global, it's not possible to drop it without making it a static mutable.
static mut THREAD_MANAGER: OnceLock<ThreadManager> = OnceLock::new();

In main function, you create an Arc<AtomicBool> with false as default value. Then you call ThreadManager::start and clone the Arc<AtomicBool> to pass it as argument.

You clone and move the Arc<AtomicBool> into the run FnMut callback. Inside the callback, you match on RunEvent::ExitRequested and set Arc<AtomicBool> to true and then you call ThreadManager::stop.

The single storage that will leak in this process is the Arc<AtomicBool> that live in main and the cell itself (not the content). Your thread will be joined properly before Tauri exit, and the static storage will be cleaned up.

The same principle can be applied if you want to share data globally in Tauri and need to destroy it properly. You can pass tauri::State<&YourGlobalData> to any commands by using manage(MyGlobalData::get()) assuming MyGlobalData::get() return the content of OnceCell::get / OnceLock::get that come from a static. It's your responsability to set the cell content and take it's content when it's appropriated : that's why it's not unsafe if you hide it behind a safe API, because all the unsafety about cell usage are logical errors and theses doesn't cause undefined behavior if you put the appropriated runtime check that transform theses errors into no-op.

Outside of ugly static storage duration, you could also use Arc<Mutex<Option<T>>> directly inside main. I don't know if it's possible, certainly. This would get ride of the static mut requirement, but then you are stuck in a hole because Tauri will keep an Arc strong reference due to manage call (tauri::State<Arc<Mutex<Option<T>>>).

End result is it will remain 1 strong reference at the end and things will never get dropped - Arc::try_unwrap will fail even if you destroyed all strong reference, Tauri still keep one and will never drop it :) .

Pick your poison :

Rust doesn't drop static storage duration because it's undefined behavior.
Tauri doesn't drop it's managed state nor it do stack unwinding of main due to std::process::exit call.
Memory leak is not unsafe and doesn't cause undefined behavior (but can be an issues if you manage resources that flush their content when closed).
OnceCell::take / OnceLock::take require &mut. In case of static that mean static mut and it's unsafe. In a ideal world, we could get ride of the mut. Maybe you can achieve better with RefCell and it's Sync counterpart RwLock, instead of OnceCell / OnceLock.

ReactorScram commented 5 months ago

@arialpew I have a similar case, I have a worker thread for COM that I'd like to join gracefully. I intercept the exit events and do my own graceful shutdown within a controller task, before allowing Tauri to crash the process: https://github.com/firezone/firezone/blob/main/rust/windows-client/src-tauri/src/client/gui.rs#L608

I didn't end up needing unsafe for this.

Edit: I dug this up - It's a limitation of tao, not Tauri itself, and tao has some justification for it that I don't fully understand :/ https://docs.rs/tao/latest/x86_64-pc-windows-msvc/tao/event_loop/struct.EventLoop.html#method.run

monax3 commented 4 months ago

Having just spent a couple days trying to gracefully clean up a background future, this whole situation needs to be better documented if nothing else. Exiting with std::process::exit means a lot of normal Rust assumptions don't hold and the reason why is not immediately obvious.

ReactorScram commented 4 months ago

Agreed on docs. I always look at examples first, so if it was me I'd put an unreachable! right here at the end of main in Hello World, with a link to the relevant docs

https://github.com/tauri-apps/tauri/blob/f5e4b672031923ce21dbd98675ddc22610e3b8a7/examples/helloworld/main.rs#L13

Also... The docs say run returns Result<()>, I guess since ! is still in nightly: https://github.com/rust-lang/rust/issues/35121

But you could have a Result<Impossible> where struct Impossible {} has a docstring /// This is for run, which can bail out but never returns if the app successfully starts up

tauri-apps / tauri