rust-windowing / winit

Window handling library in pure Rust
https://docs.rs/winit/
Apache License 2.0
4.84k stars 903 forks source link

Winit `run_return` and Android does not get `Resumed` and `Suspended` #2706

Closed Hoodad closed 1 year ago

Hoodad commented 1 year ago

Hi!

I have noticed there are some inconsistencies (or at least it appears so) when running an event loop using run_return compared to run. I have tried creating as small as possible repro example Android Activity Winit Glutin where I have taken a working example and replaced the run with run_return and I cannot get the messages Resumed or Suspended to arrive.

More specifically see this commit diff

I can get the Resumed event to arrive if I don't exit the loop when I get Event::MainEventsCleared, however that is a bit unexpected.

Please let me know if I have missed something important that will make the events appear.

rib commented 1 year ago

Curious - at least from an initial inspection it wasn't obvious to me why that wouldn't work.

My initial thought was that I'd be cautious about moving the handling code for any specific Winit events (such as Resumed) outside of the winit event loop closure, because the backend sometimes needs to synchronize with Java when an event is being dispatched.

Technically this is currently only needed for the android-activity SaveState event (not exposed via Winit) and TerminateWindow events (which map to Winit Suspended events) so I think it might be possible to get away with deferring the Resume event like this but not sure atm if that might be risky to rely on.

Looking at tracing what happens with your example then what I see is that the Java thread is being blocked very early on when setting the app state to Start and repeated calls to run_return weren't leading to android_activity::AndroidApp::poll_events being called to really progress the event handling and unblock the Java thread.

I think the issue comes from how Winit has a special case for the first iteration of the mainloop with a cause of StartCause::Init.

I don't think run_return is currently well suited to this model of trying to run a single event polling iteration due to how (most?) backends will run this special-case (StartCause::Init) iteration of the mainloop as the first thing they do before really starting a loop that polls for OS/window system events.

In this respect the Android backend currently copies what is also done in the Windows and Linux backends.

(I couldn't see on a quick inspection how the MacOS backend handles StartCause::Init.)

For this example I think this special case iteration of the mainloop at the start of run_return is basically blocking ever reaching the real event polling / handling because each time it reaches MainEventsCleared in the Init iteration it will then bail out with the exit code before starting the real loop.

As an experiment I made this hacky change to the Android backend and got the example working:

diff --git a/src/platform_impl/android/mod.rs b/src/platform_impl/android/mod.rs
index 68e4cf66..563a1c31 100644
--- a/src/platform_impl/android/mod.rs
+++ b/src/platform_impl/android/mod.rs
@@ -694,6 +694,7 @@ impl<T: 'static> EventLoop<T> {
         let mut pending_redraw = false;

         // run the initial loop iteration
+        /*
         let mut iter_result = self.single_iteration(
             &mut control_flow,
             None,
@@ -701,9 +702,12 @@ impl<T: 'static> EventLoop<T> {
             &mut cause,
             &mut callback,
         );
+        */
+        let mut iter_result = IterationResult { deadline: None, timeout: None, wait_start: Instant::now() };

         let exit_code = loop {
             if let ControlFlow::ExitWithCode(code) = control_flow {
+                log::debug!("run_return: early exit with code");
                 break code;
             }

I guess it should be possible to handle this more neatly with some extra care around when the StartCause::Init iteration runs, though it maybe needs some agreement across backends on what the "correct" behaviour is supposed to be here.

Currently I think some other backends work in a similar way to the Android backend and I'm not sure atm but maybe it's useful for other use cases that the StartCause::Init iteration should consistently run each time run_return is called (perhaps for use cases that only expect the run_return loop to eventually exit instead of immediately)

rib commented 1 year ago

For reference, here's a related issue on macos, for a different use case: https://github.com/rust-windowing/winit/issues/2431

rib commented 1 year ago

It could be good to understand the use case a bit more here.

Can you maybe say why you're trying to move some event loop handling outside of the winit callback?

I have a feeling that the reason for wanting this pattern might instead imply a different shortcoming with Winit that can potentially be addressed, so that you can keep your event loop handing inside Winit's event loop.

Needing to support using run_return purely as a function for polling events probably can't be done portably. At least I guess that model wont be easy to support consistently on macos, ios and web.

Maybe a different event polling API could be made available on platforms where that is possible though, for situations like this where you're effectively embedding Winit into an external event loop.

kchibisov commented 1 year ago

For reference, here's a related issue on macos, for a different use case: #2431

that's not related issue though. The mentioned issue was asking "how to create event loop ondemand to perform window requests".

rib commented 1 year ago

Yeah, I figured it was related enough to mention here, since it represented a different use case for run_return and it's also talking about issues with calling it multiple times.

I'm not suggesting it's a duplicate, just potentially related enough for added context here.

It's an example of wanting to call run_return in a situation where you expect it to return eventually, but not treating it like a poll function like the example above.

Hoodad commented 1 year ago

Thanks @rib for looking into to this issue!

I wasn't aware that calling run_return would call StartCause::Init before doing things, that's interesting. I would have expected it to pump the received events since last call and return back.

Regarding the use-case of why to use run_return compared to encapsulating the app inside the event loop callback like done with run. I would say that it boils down to principals, we use run_return becaues really don't want our entire callstack go get deeper and more complex and give up ownership of the main loop. we and our app is the boss, not winit. If we could use winit similar to GetMesssage that would be preferred. Too me it feels like winit is too opinionated about how to run the event loop. Please note that this reason is not only limited to Android.

The documentation about run_return states

You are strongly encouraged to use run, unless the use of this is absolutely necessary.

But does not expand on why not to use it or what it's purpose is.

Either way I'm looking for some kind of solution to work around this limitation, one alternative could be to switch to run but that would not feel too great 😢

rib commented 1 year ago

I wasn't aware that calling run_return would call StartCause::Init before doing things, that's interesting. I would have expected it to pump the received events since last call and return back.

Yeah the docs aren't particularly clear about it's behaviour.

run is generally a tiny shim over run_return so run_return generally behaves the same as run except that it may be able to return to the caller - whereas run is declared with -> ! and can't return back to the caller.

In addition to running a StartCause::Init iteration of the loop run_return will currently also emit LoopDestroyed each time it returns.

I would say that it boils down to principals, we use run_return becaues really don't want our entire callstack go get deeper and more complex and give up ownership of the main loop. we and our app is the boss, not winit. If we could use winit similar to GetMesssage that would be preferred. Too me it feels like winit is too opinionated about how to run the event loop. Please note that this reason is not only limited to Android.

I can sort of see the appeal in not having to bend to how Winit defines its event loop but at the same time I think Winit's design generally comes from trying to provide an abstraction that can work across OSs/window systems and some window systems aren't compatible with that kind of GetMessage model. It's a tricky juggling act.

Take for example web. On web the "loop" is an illusion because its fundamentally not possible to create a long running loop that would block the browser. In this case the loop is really just an abstraction over the browser's internal event loop and it's not possible to explicitly poll for events in a browser, like you can on Linux / Android.

Its been a while since I've looked a macos / ios closely but I think it could be tricky to support that model there too. Although they use kevents under the hood similar to epoll on Linux / Android, applications use Run Loops that hide a lot of details and I don't think there's really a way to manage the loop yourself and poll/dispatch events yourself - it's just not how the macos window system is designed to be used.

If you only need to be portable across Android, Linux and Windows then I think it might be possible to integrate Winit into an external event loop, similar to the example above - but yeah, it would at least require a change similar to above.

one alternative could be to switch to run but that would not feel too great 😢

Incidentally when I first started having a look at this the other day, I realized that the Android backend's implementation of run is really not ideal currently and filed this issue as a reminder: https://github.com/rust-windowing/winit/issues/2709#issuecomment-1451609491 (just connecting the dots) So with the way run is implemented currently I'd avoid using it on Android for now.

Did you maybe hit some specific road block from being able drive your own event / rendering / frame processing from within Winit? It should ideally be possible to drive the same logic that you'd have in the outer loop, between run_return calls via a Winit event instead (such as via redraw callbacks) - but maybe you hit some technical issue with that?

At it's core, Winit is designed to provide a portable event loop for graphical apps, so if you're looking at side stepping that event loop that's not really a good sign :) The event loop is the crux of what Winit is.

If there's some technical limitation that means Winit isn't able to run your apps main loop then it would be good to understand that and hopefully address that.

I probably wouldn't worry too much about callstack depth - the backend abstractions for running the event loop in Winit are pretty shallow as far as I've seen.

Winit does add complexity though for sure. There is some inherent complexity with abstracting across window systems though and it might be hard to avoid unless you know you only care about a limited set of platforms.

Hoodad commented 1 year ago

In an attempt to keep the discussions more focused there seems be two main topics. 1) The issue of run_return not working as expected on the Android platform

2) Why use the run over run_return or vice versa.

Topic 1, run_return not working

With regards topic 1, I'm wondering if the hacky fix you suggested has a more mature variant? Seeing how the current issue is preventing the use of run_return on Android. I tried out the hacky fix and the Resume and Suspend events are now received as expected and the flow works much better 👍Maybe the event loop can keep track of if its the first time being called? But its also interesting that calling single_iteration has this side effect.

Topic 2, why use run_return

Regarding the second topic, why does run_return exist if you're never expected to use it "unless the use of this is absolutely necessary". If the API contains functions, that are not deprecated, I would assume they are meant to be used. Quickly skimming examples it appears to only be used in examples/window_run_return suggesting its not actually on the same level as run (also hinted by in the documentation). I don't know the history of winit API's well enough but I suspect that run_return might be a legacy thing that maybe its time to remove or deprecate if there is a strong preference to use run?

There are as you point out @rib hard to make a single interface for such a large variety of different ways of handling message loops and preferences. So I see it as a great strength that winit has the types of ways to run the event loops even though they may not have feature parity or even functions on all platforms.

And as last thing I want to highlight the usage of winit for me is developing a game. So maybe that in it self skews the perception of how the flow should work.

rib commented 1 year ago

Topic 1, run_return not working

I guess this is up in the air atm, because we aren't clear about what working means currently.

If we just think of run_return as an alternative to run that can return an exit status then it should work for that. This is what I would currently recommend doing on Android if possible.

The implementation of run on Windows, Macos, Linux and Android is currently a tiny shim around run_return that will just exit the process when run_return returns. This is how run is implemented across all platforms that support run_return:

pub fn run<F>(mut self, event_handler: F) -> !
    where
        F: 'static + FnMut(Event<'_, T>, &RootELW<T>, &mut ControlFlow),
    {
        let exit_code = self.run_return(event_handler);
        ::std::process::exit(exit_code);
    }

iOS and web are the two exceptions that don't provide a run_return implementation because there's no way to return from the event loop on those platforms.

Topic 2, why use run_return

Although I wasn't around when run_return was introduced, I've seen that it was part of the "Events Loop 2.0" reworking, e.g.: https://github.com/rust-windowing/winit/issues/459

also see: https://gist.github.com/Osspial/53efa87797899872650f6bece14736c0 for the original design proposal that gives some good background here.

Historically there were two APIs run_forever and poll_events and as part of the rework they got renamed to run and run_return.

So historically it looks like run_return was indeed designed to be used like a polling function similar to above - because prior to the 2.0 rework applications were responsible for the event loop itself and needed to poll for events.

Interestingly Windows was a key motivator at the time for switching over to the current callback design due to how Windows creates a nested loop when handling window resizing. (I had thought Windows was one of the OSs where you could actually get away with polling)

It looks like the older design was also very problematic for macos too: https://github.com/rust-windowing/winit/issues/231#issuecomment-359334884

I think your hunch is probably right when you say:

run_return might be a legacy thing that maybe its time to remove or deprecate if there is a strong preference to use run.

The documentation does still give hints that it was originally intended to be used for polling events but it can't really be used like that any more since that's not really a model that can be supported consistently across OSs (in addition to the specific issue caused by the StartCause::Init iteration of the mainloop).

At this point run_return is exactly the same as run except for the fact that run_return can return an integer exit status.

Current conclusion

Considering that Winit has already explored being based on a polling model in the past, and that didn't work out I think we probably shouldn't be pushing to re-introduce support for event polling via run_return.

E.g. on Android that model can't work for events where the handling needs to be synchronized with the Java thread (such as Suspend events or SaveState events if they get exposed) so it would also add complexity if each platform would have to document when events must be handled via the closure/callback and can't be polled.

Proposal

I think I'd currently advocate for removing run_return and updating run so that it can return a Result for any failure or exit status. This iteration of run would not be usable like an event polling function, but it would allow the event loop to be re-run more than once for use cases like: https://github.com/rust-windowing/winit/issues/2431

This was also what we started to discuss recently here: https://github.com/rust-windowing/winit/issues/2709

Unfortunately (since I see you aren't keen on not owning the loop itself) this would mean that you'd have to move your event handling inside the Winit event loop, but I think, practically, this is the only portable solution really.

Did you maybe hit other technical issues from already trying to drive your game engine logic from within the Winit event loop that need to be considered here? (apart from not being keen on the layering / lack of ownership)

Game engine requirements

I don't think I'd generally expect the callback model of Winit to be a problem for a game engine (e.g. this is inherently how it works on MacOS, iOS and Web) but I can think of things about Winit that are lacking that will be important for a game engine or UI toolkit (e.g. no good story yet for scheduling and throttling rendering).

Winit also doesn't give many ways to punch holes and do platform-specific things in situations where you want to be pragmatic and handle something directly while there isn't yet a platform-agnostic abstraction provided by Winit.

Hopefully these kinds of shortcomings can be addressed incrementally within Winit though and testing out solutions with the context of a specific game engine / use case should be able to help with that too if there are specific blocking problems being hit with the current design.

RFC

Maybe @kchibisov, @madsmtm and @msiglreith are more familiar with the historic usage of run_return and can also comment on whether run_return should support event polling or maybe we should look at consolidating run and run_return as discussed here: https://github.com/rust-windowing/winit/issues/2709

madsmtm commented 1 year ago

Not familiar with the history of run_return, I'd be fine with deprecating it in 0.28 and removing it in 0.29 (unless we get reports that we shouldn't).

Linking the lightning talk that @Osspial did three years ago about this problem. I think the issue with not having ! as the return type is that users kind of expect that they can do stuff after run, which they can't.

kchibisov commented 1 year ago

Although I wasn't around when run_return was introduced, I've seen that it was part of the "Events Loop 2.0" reworking, e.g.: #459

Because desktop clients don't want to terminate for example, they might want to do some cleanups, ect. For example alacritty is using run_return for a long time. From what I remember it's an alternative to poll events inside the closure in some vector and then run again. This stuff is actually used and this stuff makes it possible to use winit together with async https://github.com/notgull/winit-block-on.

Interestingly Windows was a key motivator at the time for switching over to the current callback design due to how Windows creates a nested loop when handling window resizing. (I had thought Windows was one of the OSs where you could actually get away with polling)

It looks like the older design was also very problematic for macos too: #231 (comment)

Yes, the winit design is driven by platforms like that, because they can't do much because how tie they are to the windowing system. On X11/Wayland it's not an issue for example.

At this point run_return is exactly the same as run except for the fact that run_return can return an integer exit status.

It doesn't terminate the entire app, that's the point why it exits and used. How you're going to use(like normal run or like before event loop v2 to do polling is up to you.

I think I'd currently advocate for removing run_return and updating run so that it can return a Result for any failure or exit status. This iteration of run would not be usable like an event polling function, but it would allow the event loop to be re-run more than once for use cases like: #2431

As long as it doesn't terminate the entire program I'm fine with it. Be aware that if you consider that it'll never return, but we won't tell you is a solution, it's not, I swear it'll break the destructors or user cleanup logic, so the current spilt is necessary.

If run will return on all platforms, but you won't be able to run again that's perfectly fine, it must just return.

Unfortunately (since I see you aren't keen on not owning the loop itself) this would mean that you'd have to move your event handling inside the Winit event loop, but I think, practically, this is the only portable solution really.

polling based on run_return is used by projects, so run should support it. Yes they can't move, + see async link above.

(e.g. no good story yet for scheduling and throttling rendering).

Winit can provide such stuff on the application level only, e.g. wayland frame callbacks, request animation frame and so on. Game engines should simply use the underlying OpenGL api's for such stuff and simply run a timer, they could build a timer on top of winit if they don't want high precision, because for example Windows sucks a timers.

An example of timer and frame scheduling along the winit is done in alacritty. We use both frame callbacks, timer based rendering (to not use Vsync because it's bugged), and drive all of that with winit. The FPS is sustainable to monitor even with multiple windows drawing at the same time.

Winit also doesn't give many ways to punch holes and do platform-specific things in situations where you want to be pragmatic and handle something directly while there isn't yet a platform-agnostic abstraction provided by Winit.

You might want to look into the glutin current design wrt such things for inspiration. It provides you with a way to do things you want directly. If more people will be interested in such design we could adopt to it, I do like current glutin design wrt handling different APIs, but I'm not sure how it'll suit winit.

madsmtm commented 1 year ago

Re alacritty: Is there a reason that the stuff that is done after run_return (only FreeConsole as far as I can see) can't just be done inside Event::LoopDestroyed, or perhaps rather inside Drop?

kchibisov commented 1 year ago

Re alacritty: Is there a reason that the stuff that is done after run_return (only FreeConsole as far as I can see) can't just be done inside Event::LoopDestroyed, or perhaps rather inside Drop?

I guess we can now.

Though, we don't really want to, unless run can propagate the errors and such.

rib commented 1 year ago

I think the issue with not having ! as the return type is that users kind of expect that they can do stuff after run, which they can't.

yeah, I can see the trade off here but overall I tend to think that it's even worse that run explicitly exits the process and doesn't let you handle any exit status and return gracefully from the main function.

Calling exit short circuits all of Rust's normal cleanup via Drop and in the case of Android is also short circuiting app lifecycle management in Java and making a potentially incorrect assumption that there aren't multiple activities running in the same process.

In some situations there can be meaningful code that goes after the run but yeah applications do need to be aware that that code will never be reached on some platforms - such as iOS and Web, but it can still be useful for the other platforms.

If the code after the run is essentially just relaying the exit status somewhere gracefully then it's OK that such code wouldn't be reached for platforms that can't exit their main loop.

It seems like these caveats could be explained quite easily in the docs for run.

kchibisov commented 1 year ago

It seems like these caveats could be explained quite easily in the docs for run.

I mean the destructors are being executed from inside. I wonder how you close on ios/web then, I think you got a notification that you're about to exit on ios and you can surely exit?

Though, I have no clue about such platforms, but I have a feeling that they let you know that you're returning and execting, you just not allowed to run anything again or something like that, which should be fine.

rib commented 1 year ago

How you're going to use(like normal run or like before event loop v2 to do polling is up to you.

This is the key question here really; do you think run_return should support event polling?

It looks like run_return no longer really supports the old style of polling. (That's effectively why @Hoodad's example doesn't work)

This isn't just an issue with Android - the same polling approach also would not work on Linux or Windows as far as I can tell, with the current way run_return is implemented.

Event polling is also fundamentally incompatible with MacOS, iOS and Web so it would seem surprising that you'd want to support that usage model for run_return?

If we want to try and support event polling via run_return then it raises some tricky questions:

rib commented 1 year ago

Re alacritty: Is there a reason that the stuff that is done after run_return (only FreeConsole as far as I can see) can't just be done inside Event::LoopDestroyed, or perhaps rather inside Drop?

I guess we can now.

Though, we don't really want to, unless run can propagate the errors and such.

This also relates to the question of whether run_return is supposed to be usable for event polling because currently you will get an Event::LoopDestroyed every time you poll which doesn't seem to make sense for that use case. This was also noted in https://github.com/rust-windowing/winit/issues/2431

kchibisov commented 1 year ago

Just so you know, event pooling with run return is used in the wild and works. One such project is https://github.com/smithay/smithay. They can't run their stuff inside winit, so they need a way to poll ondemand.

This also relates to the question of whether run_return is supposed to be usable for event polling because currently you will get an Event::LoopDestroyed every time you poll which doesn't seem to make sense for that use case. This was also noted in #2431

yeah, the way it communicates the events right now is unfortunate.

On Windows how should we explain that if you use run_return for polling events then you may hit problems with how resize events are handled due to the nested, modal loop that windows creates during resizing.

It's explained https://docs.rs/winit/0.28.2/winit/platform/run_return/trait.EventLoopExtRunReturn.html#caveats

In fact the original docs were written in a way that you poll via the run_return.

rib commented 1 year ago

I guess Smithay only supports Linux in this context though so it's maybe a bit of a special case that it can rely on the behaviour of the one platform that is most compatible with a polling based model.

I could imagine that if run_return were removed in favour of having a run that could return a Result then on Linux we could expose a platform-specific poll_events API that could be used by something like Smithay. This would also help make it clearer that it's not portable.

More generally for supporting cross-platform frameworks it doesn't look like polling can be supported in a portable way though.

rib commented 1 year ago

Looking a little bit closer at the Linux implementation of run_return - one other inconsistency I see is that for Wayland then at the start of run_return it actually only calls the callback as a special case with Event::NewEvents(StartCause::Init) as opposed to running a full iteration of the loop like the X11 and Android backends do.

I'm not quite clear what the behaviour is for other backends is atm.

I would expect that any wake up cause would be associated with a full iteration of the loop, including NewEvents, MainEventsCleared and RedrawEventsCleared - or if not then this should probably at least be consistent across all platforms.

TimonPost commented 1 year ago

Resume events are fired each timerun_return is called because they are always sent after Init. We had to work around this problem of getting Resume events each time we poll for events. And yes run is actually just run_return but exists the process on ExitEvent.

The question in this thread boils down to poll based vs event-based approach. There is likely not a one-fit-all answer on which system is better for all platforms and scenarios. And it also depends on the project motivation of winit. This is where I like @rib case "a different event polling API could be made available on platforms where that is possible".

For a game engine, it is essential to have control over update rates. And it is preferable to not depend on an external opensource library to control the main loop for us even being it by using Poll and WaitUntil. It might be possible to go in that direction but it will require quite some refactoring on our side as we have a system that is taking care of timing itself. An alternative system could be that game engines could run winit on the main thread and run our engine in a separate thread and communicate events via channels. This can not be done the other way around where winit runs on a non-main thread. But countering that it might not be ideal to create crossthread applications as it implies other problems.

In my view, I would like to see winit to be a black box that provides events when I poll for it. Just like mio does or I did in crossterm. As a developer, I like control over my application without a black box taking care of the execution of it.

/// crossterm
pub fn poll(timeout: Duration) -> Result<bool>; // returns readiness within the duration
pub fn read() -> Result<Event>; // block reads next event

// mio
pub fn poll(
    &mut self,
    events: &mut Events,
    timeout: Option<Duration>
) -> Result<()>

A poll-based API is very flexible and a closure system could be built on top of it. It is proven to work when interfacing with system io like with mio. You kinda invert the flow direction which is counter-intuitive for rust but if done well it opens up different designs and could remain to support the closure approach.

Perhaps winit should consider removing the Poll and WaitUntil control flows and run_return and instead create a dedicated API to support proper event-polling. Then the closure way of doing it can remain to exist but it is fully dedicated to be event-driven-winit-owned. I don't know much about winit-internals and the achievability of such design tho. One thing that is clear from this thread is that it seems like the run_return nature is not clear for users and developers alike.

kchibisov commented 1 year ago

Resume events are fired each time run_return is called because they are always sent after Init. We had to work around this problem of getting Resume events each time we poll for events. And yes run is actually just run_return but exists the process on ExitEvent.

Yeah, that's a bit unfortunate.

The question in this thread boils down to poll based vs event-based approach. There is likely not a one-fit-all answer on which system is better for all platforms and scenarios. And it also depends on the project motivation of winit. This is where I like @rib case "a different event polling API could be made available on platforms where that is possible".

That's impossible though, since some events are SYNC if you don't do what they tell you todo in sync manner you'll get bugged behavior. Like resize on macOS when it bounces if you don't do it sync.

The main issue with the polling API is that it doesn't really work anywhere except X11/Wayland. Though, for game engine I doubt you care about the resizes at all. You can't also block polling for the events like you can on let's say, Linux on macOS.

Perhaps winit should consider removing the Poll and WaitUntil control flows and run_return and instead create a dedicated API to support proper event-polling. Then the closure way of doing it can remain to exist but it is fully dedicated to be event-driven-winit-owned. I don't know much about winit-internals and the achievability of such design tho. One thing that is clear from this thread is that it seems like the run_return nature is not clear for users and developers alike.

We could probably modal polling API with the caveat that it's broken on Windows/macOS because they want SYNC behavior and that GUI toolkits shouldn't use it. For games you likely don't care about such things.

TimonPost commented 1 year ago

We do process resize and need to rebind our window to the backend, save new window dimensions, update egui DPI, and some other things on resize. I am not sure if I follow why a polling system breaks synchronous order in events?

kchibisov commented 1 year ago

@TimonPost once macOS sends you a resize and asks to RedrawRequested you must do that right away, if you don't it'll bounce. An example of missbehaving client would be alacritty since we don't do that on RedrawRequested right away, but simply do polling.

I mean for game it doesn't matter, because you don't care about how smooth your resizes are in the end, but for UI gui clients it does matter.

You can test yourself by resizing alacritty on macOS and see how viewport glitches because we do poll-like events processing.

rib commented 1 year ago

Resume events are fired each time run_return is called because they are always sent after Init.

Yeah, I pushed for this because there was a separate difficulty before with being able to write portable code that could initialize render state in a way that was properly synchronized with Android's lifecycle.

Tbh though I didn't appreciate that applications were trying to use run_return for polling at the time I did this.

I think there are currently quite a few oddities / inconsistencies atm with what events get dispatched if run_return is used for polling including:

If Winit wants to really try and support polling it would be good to re-evaluate those kinds of events that are more related to a full app/event loop lifecycle.

notgull commented 1 year ago

I think we could maybe expose a EventLoopExtPolling for platforms that can support polling, but I don't think we should. It would only be available for free Unix, and MacOS if you're willing to commit a code crime. In addition, it can be emulated by running the event loop in the main thread and then sending events to another thread over a channel (this is how poll_events worked prior to the rework, if memory serves).

Also, winit-block-on doesn't actually need run_return, it can use run as long as the future never returns.

rib commented 1 year ago

I would like to see winit to be a black box that provides events when I poll for it. Just like mio does or I did in crossterm. As a developer, I like control over my application without a black box taking care of the execution of it.

I can see some of the appeal but still tend to think that pushing for a polling model is not the ideal solution (though it could be a short-term pragmatic solution for a game engine that's currently based on that model). I don't really see it as the lower common denominator as described because some window systems are callback based and non-pollable.

The polling model mainly just works nicely on Linux since it's the only OS that is very consistent with driving all window system events via file descriptors that can be efficiently polled. For all other OSs (including Android that's based on Linux) there is some amount of impedance mismatch with this model - up to the point of being incompatible with iOS and Web.

I like the idea of an MIO based event loop (I recently started a side experiment to look at implementing just this) but in the context of something like Winit it's not going to be as simple as just polling via a standalone MIO selector as the one source of events because on some platforms we're forced to integrate with a loop that's provided by the system. On Android we have to integrate with a Looper and on MacOS/iOS we have to integrate with a RunLoop.

In the case of RunLoop integration on MacOS / iOS there's no underlying, low-level kqueue file descriptor for the loop like you'd have on Linux, and so you have to rely on the RunLoop implementation (as a black box) to handle efficiently waiting when there are no events. It would be possible for us to attach our own kqueue file descriptor for additional events to a RunLoop but not the reverse. The upshot here is that there is no efficient way of polling for events. On MacOS (not iOS) you can repeatedly "stop" the app to exit the RunLoop and then re-run it to approximate polling but it's still not really the same as being able to poll a file descriptor. E.g. I guess you'd have to hack in a special RunLoop source to try and emulate a timeout of zero and I'd expect this would use a disproportionate amount of CPU compared to polling a file descriptor.

Overall there are specific issues with polling across Windows, Android, MacOS, iOS and Web that all tend to suggest that the callback approach is the more portable model for any kind of cross-platform graphical application, including for a game engine.

This wouldn't necessarily preclude having some fairly monolithic callbacks for driving complex game engine updates.

Without looking at the specifics it's hard to really help evaluate what the technical / practical challenges are with moving code that currently resides in an app-owned loop inside a Winit closure but maybe if there are some specific reasons why that's tricky currently then it might separately be possible to make changes to Winit that would make that easier, so it's easier to adopt a more portable callback model.

E.g. I wonder if one thing that could help would be some accurate timer APIs to help schedule work in sync with rendering deadlines. E.g. one common strategy for minimizing latency in a game engine is to try and delay updates and rendering as late as possible before your presentation deadline. Right now there's no clear way to do that via a callback in Winit and I could imagine finding the polling model more convenient, basically as a workaround for other features that are missing from Winit.

Some of the portability hazards with the polling model...

Windows

I'm not very familiar with this one but my rough understanding is that polling can become blocked indefinitely during resizing due to how a nested, modal event loop gets run to handle the resize events. (not sure atm if this is an architectural issue with Winit or if it's a fundamental issue with how Windows is designed)

Android

Any event, such as a lifecycle event, that originates from the Java thread will involve synchronization between the Java thread and the native main() thread (i.e. that Java callback is blocked from returning until the native code has handled the event).

For some of those event the Java callback that is delivering that event will assume that the event has been handled before it returns.

Examples of this include notifications of the window/surface being destroyed or a request to SaveState. The code that responds to these events can't be queued to be processed later which makes those events incompatible with polling.

MacOS / iOS

The main one I'm aware of is that UIView draw events are supposed to be handled inline within the callback from the OS and can't be deferred.

MacOS / iOS is all delegate / callback based though and I would tend to assume there are numerous callbacks where the system expects the event to be handled in the callback itself.

Generally speaking lifecycle events should almost certainly be handled with their callback because it just doesn't conceptually make sense to buffer a lifecycle event and handle after the next poll, because the lifecycle state might then be out of date. For instance if you're being moved into the background or get some clue that your about to be closed then you may need to save state asap otherwise it might be too late if you buffer and handle via polling.

iOS can't be polled because UIApplicationMain is documented to never return.

Web

Web-based event loop support is basically all callback based because the event loop is managed within the browser. An application can't run it's own loop with polling since that would block the browser.

Summary

It might be possible to find tricky workarounds for some of these to stick with a polling model on some platforms but I think the model is mainly just compatible with Linux and Windows. It looks like it could become more complex over time to be sticking with a polling model on MacOS and Android and impossible on iOS and Web.

What should we do here?

Personally I think at this point that the current run_return is no longer a well-suited API for polling - even though it historically evolved from a polling API.

I'd still advocate for removing run_return and instead letting run return a Result and being clear in the documentation that the API is not designed / intended for polling.

For the polling use case we could add a separate set of event loop APIs like start(), poll_events(), finish() for Linux, Android, Windows and maaybe MacOS which would be similar to run_return but with various caveats including:

It's not obvious how to define when poll_events() would return. Would it take a timeout if that can be somehow supported on each platform, or maybe it would return after a single event loop iteration (which could take an indefinite amount of time if no events are delivered).

With these APIs then StartCause::Init and Resumed dispatching would only happen once within start() and LoopDestroyed would only be dispatched in finish().

I'd still be a bit concerned that it's confusing / misleading for a cross-platform window system API to be supporting a completely distinct event loop model that isn't portable, but at the same time I do sympathize with wanting to support an existing code based that's currently based around a polling model and doesn't support the full set of platforms that Winit does.

notgull commented 1 year ago

In the case of RunLoop integration on MacOS / iOS there's no underlying, low-level kqueue file descriptor for the loop like you'd have on Linux, and so you have to rely on the RunLoop implementation (as a black box) to handle efficiently waiting when there are no events.

If you dive deep enough down, the MacOS event loop is based on Mach ports, which can be inserted into a kqueue and polled just like a TcpStream. The main issue with doing that is that the API required to access the Mach port is unstable, and Apple generally doesn't keep their internal API's consistent, as anyone who works on the Go standard library can attest.

However, for all of the other OSes, it's pretty much impossible to setup a polling loop without threading and channel hacks.

I would actually support keeping run_return around, since I like having an API difference between "this platform can return from their event loop" or not.

rib commented 1 year ago

f you dive deep enough down, the MacOS event loop is based on Mach ports, which can be inserted into a kqueue and polled just like a TcpStream.

Yeah ,that makes sense. It's been ages since I went digging but recall that each time I went looking I didn't find any stable/public APIs that would let me consolidate a RunLoop into a file descriptor.

In terms of separate run vs run_return entry points, the main trade off I see is that it would force you to have conditional code for application entrypoints, between web/ios and other platforms which would quite likely be redundant. The fewer things that applications have to change per-platform the better ideally I think.

There's a separate Android issue open here that also advocates for consolidating run + run_return: https://github.com/rust-windowing/winit/issues/2709. My thinking has generally been that there's a small trade off here but overall I think it's reasonable that a run method that can return a Result doesn't necessarily have to actually ever return on iOS and Web. It's comparable to how UIApplicationMain is technically a function that returns an int32 even though it's also documented to never return on iOS.

At least if there's a preference to keep run and run_return then on Android I think we should remove run since there's no real way we can not return without just entering an infinite loop which doesn't have the same result of effectively terminating the application. The current behaviour of killing the whole process isn't appropriate on Android.

rib commented 1 year ago

Being able to return a Result does also leave wiggle room for being able to return an error before reaching the point of no return on platforms like Web / iOS

notgull commented 1 year ago

In terms of separate run vs run_return entry points, the main trade off I see is that it would force you to have conditional code for application entrypoints

The idea is that there shouldn't be any conditional code between run and run_return. If your code chooses between run and run_return, it's likely that you can just take your run_return branch and turn it into a run statement. The strategy behind run_return is that, if you use it, you're intrinsically doing something that's not possible on a platform where the event loop runs forever, so you should just compile-error on that platform anyhow.

I think it's reasonable that a run method that can return a Result doesn't necessarily have to actually ever return on iOS and Web. It's comparable to how UIApplicationMain is technically a function that returns an int32 even though it's also documented to never return on iOS.

The main advantage of Rust as a whole is that you can enforce API invariants at compile time using the type system. exit() never returns, so its return type is !. C/ObjectiveC can’t express this on a type level, which is why it returns an int. This is one of Rust’s strengths; it is nearly impossible to run into the footgun where someone has code after the run() function. Consolidating run_return() into run() removes this advantage.

rib commented 1 year ago

It's kind of interesting seeing that SDL is wrestling with figuring out how to go the other way and better support the callback model instead of the polling model: https://github.com/libsdl-org/SDL/issues/6785

SDL has a lot more baggage than Winit so it's not really surprising that SDL still generally revolves around their old polling model but this has required them to add various hybrid workarounds for handling callbacks where needed, such as https://wiki.libsdl.org/SDL2/SDL_iPhoneSetAnimationCallback

SDL also has the problem on Windows that the event loop gets blocked by moving / resizing windows, which they have also seen could be fixed by moving to a callback model: https://github.com/libsdl-org/SDL/issues/1059

Looking at how SDL polls on MacOS it feels pretty unpleasant seeing how they have a hand-wavy kludge timeout of 0.000002 seconds like this:

    /* Let the run loop run for a short amount of time: long enough for
       touch events to get processed (which is important to get certain
       elements of Game Center's GKLeaderboardViewController to respond
       to touch input), but not long enough to introduce a significant
       delay in the rest of the app.
    */
    const CFTimeInterval seconds = 0.000002;

    /* Pump most event types. */
    SInt32 result;
    do {
        result = CFRunLoopRunInMode(kCFRunLoopDefaultMode, seconds, TRUE);
    } while (result == kCFRunLoopRunHandledSource);

but Winit could potentially do something similar too :/

rib commented 1 year ago

In terms of separate run vs run_return entry points, the main trade off I see is that it would force you to have conditional code for application entrypoints

The idea is that there shouldn't be any conditional code between run and run_return. If your code chooses between run and run_return, it's likely that you can just take your run_return branch and turn it into a run statement. The strategy behind run_return is that, if you use it, you're intrinsically doing something that's not possible on a platform where the event loop runs forever, so you should just compile-error on that platform anyhow.

Please consider that Android can't really support run() -> ! properly and iOS + Web can't support run_return() so neither is currently usable in all cases.

Additionally most platforms can return an exit status and if your application would ideally like to get that exit status where possible then you're forced to have conditional code even if you're perfectly happy for the the code that looks at the result to never be reached on iOS + Web.

If we were to consolidate and just have a run function that could return a Result then we could consistently use that on all platforms.

If run is considered as a cross-platform, portable API (since that's essentially the purpose of Winit) its reasonable to say that it can return a Result. It can be seen as a platform-specific, implementation detail that on iOS and Web the function actually won't return.

It's nice that Rust supports the notion that function doesn't return via -> ! for something like process::exit() but it would also be OK to not use that in a situation like this where the API only sometimes doesn't return, depending on the backend. It's a trade off that offers cross-platform portability which seems worthwhile to me.

rib commented 1 year ago

it is nearly impossible to run into the footgun where someone has code after the run() function. Consolidating run_return() into run() removes this advantage.

unfortunately there are footguns hidden in many places though :)

This design has led to us being shot in the foot with the Android backend instead.

rib commented 1 year ago

Oh, I also just noticed that run() -> ! is problematic for the web backend too because it's awkward to strictly avoid returning, see: https://github.com/rust-windowing/winit/pull/2208

The web backend actually has yet another spawn() alternative to run() that avoids the -> !

Even more so it seems like it could be good to consolidate run, run_return and spawn too.

kchibisov commented 1 year ago

Please consider that Android can't really support run() -> ! properly and iOS + Web can't support run_return() so neither is currently usable in all cases.

I question the IOS here, do you really can't back to the app at some point of time at all, maybe? I'd assume on web you just pass some sort of closure and get 'fed` with events.

I think what we can do is to separate the web and ios platforms, in their own park as in not having run on them, and having the separate call to do so. The thing is that you must have some way to stuff to de dropped and you have way less chances of missing drop with !, then with a function that does never return.

So I tend to think that we might want to change the way run works in a way that it behaves similar to the current run return and have some special platforms which transfer control flow.

Likely such platforms are the minority, and you need special handling for them as well to even start working with them (You can't just write stuff for ios, that's not how it works, and you don't have everything in place on web anyway, in fact, we already have special methods just for the web backend).

What I think we should do is for run to take the entire event loop consuming it and returning on finish, meaning that you won't be able to rerun and get your exit status/process the exit.

This is not ideal suggestion, but it'll work around some of the issues, and I really believe that you don't need to store the event loop at all anywhere in your program. You get the handle to the loop anyway in each callback.

If run is considered as a cross-platform, portable API (since that's essentially the purpose of Winit) its reasonable to say that it can return a Result. It can be seen as a platform-specific, implementation detail that on iOS and Web the function actually won't return.

I don't like that, it'll screw the destructors assumption, you might consider that stuff will run and try to use them after the run, because typesystem doesn't tell you that the function is actually no return, so I'd suggest to have separate method for the ios and the web is to have a separate method, probably even named the same, but on some trait, so you could still theoretically write cross platform code, but on special platforms you'll be warned that your code is dead due to reason.

rib commented 1 year ago

I question the IOS here, do you really can't back to the app at some point of time at all, maybe? I'd assume on web you just pass some sort of closure and get 'fed` with events.

yeah it's hard to be 100% sure that there's no corner case where UIApplicationMain can technically return. At least the docs seem to state quite clearly that the function will never return.

considering my latest realization that even for web the run() ->! interface is problematic then it doesn't seem great that we impose the -> ! across all platforms - basically just because that's how iOS works.

kchibisov commented 1 year ago

Yeah, ios can live in its shenanigans with its own trait to invoke the -> ! function, I don't really care about it, because it's a locked out platform you can't write software without paying a buck to apple. It's not easy at testing and sometimes I have a feeling that it's barely tested except some "people" who know how to cook it right.

Given that winit dominates on desktop, I'd rather tune for the desktop API then try to use the worst possible API because of how the IOS works.

rib commented 1 year ago

I don't like that, it'll screw the destructors assumption, you might consider that stuff will run and try to use them after the run, because typesystem doesn't tell you that the function is actually no return, so I'd suggest to have separate method for the ios and the web is to have a separate method, probably even named the same, but on some trait, so you could still theoretically write cross platform code, but on special platforms you'll be warned that your code is dead due to reason

I tend to think this fear is being over stated, but having a separate function just for ios also sounds kinda fine.

like you say, you're probably going to have a completely separate entry point for supporting ios anyway - like you would for Android and Web too.

Given that winit dominates on desktop, I'd rather tune for the desktop API

haha, that's not the right attitude ;-p

Mobile should be a first class citizen imho, but it's reasonable to account for the fact that there will be bespoke entrypoints for some platforms.

rib commented 1 year ago

(though tbh I do agree with the apple tax complaint as a reason to care less about ios when that makes it harder to test + maintain the backend)

At least that's not the case with Android.

rib commented 1 year ago

What I think we should do is for run to take the entire event loop consuming it and returning on finish, meaning that you won't be able to rerun and get your exit status/process the exit.

This would break the use case described in https://github.com/rust-windowing/winit/issues/2431 though? or maybe I'm confused and you're only talking about what to do for iOS here?

Ignoring iOS / Web, I think we want one function that's more or less like run_return as it is now (except it will be clear that it's not for polling).

It's potentially desirable to be able to call run_return more than once for the #2431 use case, but that's not the same as using it for polling.

kchibisov commented 1 year ago

Mobile should be a first class citizen imho, but it's reasonable to account for the fact that there will be bespoke entrypoints for some platforms.

I can run my winit app on my pinephone via Wayland, sure. Run alacritty like that will everything including virtual keyboards and such working, no extra glue.

The mobile you're talking about is Android, and for Android I'm fine with the stuff around it, for ios I don't care. So your mobile is one platform, while desktop is like 4 and the primary audience of winit is either Linux guis, game engines, and might be some stuff.

We are at the point, where we have no clue whether the iOS backend even works, no one even bothered to contribute ios backend into glutin, and in fact it was broken way before I've removed it, so things were never tested in the first.

My attitude is like that towards desktops is that it's our reality and users, besides Android there's no good backend inside the winit, and that's likely before someone has interest in Android backend.

(though tbh I do agree with the apple tax complaint as a reason to care less about ios when that makes it harder to test + maintain the backend)

Be aware that I for example can't even pay for such thing if I wanted to, so the platform don't even exists for me. You must also have a macOS to even write for it.

This would break the use case described in #2431 though? or maybe I'm confused and you're only talking about what to do for iOS here?

The mentioned issue is a 'polling' issue, not? And we want to rip polling all together. macOS is very restrictive on where you create and drop window, you must do that inside the callback.

For that use case if we agree on the polling API we could simply have a separate call with &mut and an interface for such polling strategy as you suggest?

I might be wrong here, maybe you're allowed to pause and run event loop ondemand cross platform, but it all sounds just like polling.

rib commented 1 year ago

To my mind the #2431 case isn't conceptually polling.

Polling implies two notable things that make it problematic:

  1. you're trying to handle events outside of winit (like in the example linked in this issue) and therefore your event handling can't be automatically synchronized with the system when needed and the backend has to buffer callback events.
  2. The OS/window system needs some efficient mechanism for pumping/polling/checking events in a way that doesn't indefinitely block an outer/external loop.

Neither of these apply to the #2431 usecase. Event handling remains within winit callbacks as normal and there's nothing unusual with how the loop runs while active and the loop will continue running as long as it's needed. For #2431 it's just a matter of being able to re-run the loop. There's also no expectation that windows need to survive between separate runs. The persistent loop would juat maintain a display server connection between runs. That seems like it could Just Work on Windows, Linux and MacOS without much fuss. It wouldn't work on Android but that usecase also just doesn't apply to a mobile OS.

Suggesting polling for this usecase would actually be problematic because it will be tricky to avoid burning 100% of the cpu, esp on MacOS if they have nothing to put within an external event loop that will naturally throttle it. (That's also what they see with their current workaround)

kchibisov commented 1 year ago

Well, the polling that run_return does is that you call to it with Poll and it'll instantly return back to you the control. So some apps check it in demand, collect the events into some sort of vector and process outside of the winit. It's not efficient in a way you mean, but it's also called polling, since you poll the events from time to time.

The case mentioned is that it runs event loop from time to time to do the things needed, which is how the polling right now is done.

So the polling we're talking is a traditional polling definition as in actively sampling the source, where the source is run_return closure user runs ondemand, that 'ondemand` is a user poll interval.

The thing you mean as in 'epoll' like efficient, kernel aided polling, which is not possible on all platforms, and winit never aimed to support it.

Neither of these apply to the #2431 usecase. Event handling remains within winit callbacks as normal and there's nothing unusual with how the loop runs while active and the loop will continue running as long as it's needed.

Once you're out of run_return nothing is being run, the event loop was simply stopped.

There's also no expectation that windows need to survive between separate runs. The persistent loop would juat maintain a display server connection between runs. That seems like it could Just Work on Windows, Linux and MacOS without much fuss. It wouldn't work on Android but that usecase also just doesn't apply to a mobile OS.

Yes, that's why I'm not even sure why run_return even exists on Android, it wasn't there in the first place. The goal for run_return originally was to do what I've described as Polling, yet most parts of it where forgotten and now the run_return flow of events looks a bit weird.

kchibisov commented 1 year ago

Suggesting polling for this usecase would actually be problematic because it will be tricky to avoid burning 100% of the cpu, esp on MacOS if they have nothing to put within an external event loop that will naturally throttle it. (That's also what they see with their current workaround)

I'm not suggesting, it's actually a polling what they show in both approaches, they sample winit from time to time and said that it doesn't work for them, so they must actually run the thing all the time in a busy loop.

The issue is that when they have e.g. poll interval defined as "I'll poll once I want to create a window" it doesn't work.

Be aware that when nothing is being run no callbacks dispatched.

rib commented 1 year ago

For the #2431 use case I dont think they want to poll by any definition. They want to run a temporary event loop for a temporary UI. The first window might be opened and used for 1hr or however long the user wants it and then close before again requesting to again run the event loop for a new UI that could be used for another hour.

My interpretation is that they may be embedding the use of Winit within a larger application that may not be Winit based, e.g. for showing a profiling tool that visualizes application-specific stats.

Via some action they want to spin up an event loop to show a window and keep that running like a normal event loop until the window is destroyed. In the future the user might repeat the action to show that UI and they want to re-run the event loop.

For that use case it's OK, and expected that there will be no callbacks or events etc from the event loop, while it's not running, while there is no UI showing - the event loop is effectively idle and isn't needed.

kchibisov commented 1 year ago

For that use case it's OK, and expected that there will be no callbacks or events etc from the event loop, while it's not running, while there is no UI showing - the event loop is effectively idle and isn't needed.

Yeah, but you don't start fresh each time, you carry the state you've gathered and you will get things enqueued into the queue even when it's being idle on resume.

For the #2431 use case I dont think they want to poll by any definition. They want to run a temporary event loop for a temporary UI.

They keep the event loop around though, they don't drop it, so it's the same loop, you just interact with it from time to time, as in ondemand. If they dropped the loop and then created a new one, then it won't be polling.

rib commented 1 year ago

Yes, that's why I'm not even sure why run_return even exists on Android, it wasn't there in the first place. The goal for run_return originally was to do what I've described as Polling, yet most parts of it where forgotten and now the run_return flow of events looks a bit weird.

Based on the historic use case for run_return as a polling APi, yeah I agree it probably shouldn't have been added to the Android backend, but at the same time run() ->! also shouldnt have been implemented either and a separate API that wasnt for polling but could return should have been added ideally. Instead the scope of run_rerurn has got watered down and run_return has gained a dual purpose. run_return is currently trying to be an alternative to run that let's you return an exit status, and it's also trying to support polling.

Just to clarify here, when I said this wouldn't work on Android, I was only talking about the #2431 use case. And to reiterate here, Android can support returning an exit status from an api like run_return and it can't properly support a function like run() -> ! that doesn't return.

kchibisov commented 1 year ago

Just to clarify here, when I said this wouldn't work on Android, I was only talking about the #2431 use case. And to reiterate here, Android can support returning an exit status from an api like run_return and it can't properly support a function like run() -> ! that doesn't return.

Yeah, I get that. I hate when lib does -> ! without a reason.

rib commented 1 year ago

For the #2431 use case I dont think they want to poll by any definition. They want to run a temporary event loop for a temporary UI.

They keep the event loop around though, they don't drop it, so it's the same loop, you just interact with it from time to time, as in ondemand. If they dropped the loop and then created a new one, then it won't be polling.

Yeah, but in this case the persistent event loop is basically just acting as a persistent display server connection - they don't want to maintain windows between runs.

We'll just be talking semantics if we worry about whether to call this "polling" or not. I think it's helpful for this discussion though that we use a different term for this to be clear that this use case has significantly different requirements from the polling model that we've been discussing above. So whatever we call this, the #2431 issue wouldn't need to use the .start(), .poll_events() and .finish() APIs as proposed above. These wouldn't be a good solution to #2431. #2431 should be solvable with an API like run() that is able to return (not designed to be used for event polling in the way discussed earlier)