Linux: scm_event strange delay

zwieblum commented 10 months ago

In x11_microgl.c same event sequences cause (hook:event t 0 0) in ecentloop.scm to add a strange delay. This can be observed with the DemoCube Application: Moving the mouse inside the window renders each frame at ~ 0.05s, not moving the mouse renders each frame at 0.5 seconds (10 times slower).

Details: If the Sequence of XEvents from here: https://github.com/part-cw/lambdanative/blob/98e806bddaf1e6e9ae2e6a99bdabdd1dc10c986c/loaders/x11/x11_microgl.c#L202 consists of a series of MotionNototify followed by a Expose events, then the call of (hook:event ...)
https://github.com/part-cw/lambdanative/blob/98e806bddaf1e6e9ae2e6a99bdabdd1dc10c986c/loaders/x11/x11_microgl.c#L276 takes ~ 0.05 seconds to finish. When the motion events stop and only Expose events arrive, then the calls to (hook:event ...) take successivly longer to complete till they settle at 0.5 seconds.

I am stuck at debugging this. Where does p2 come from, that is set here in this line: https://github.com/part-cw/lambdanative/blob/98e806bddaf1e6e9ae2e6a99bdabdd1dc10c986c/modules/eventloop/eventloop.scm#L284

Pleas note: this is not an issue of optimizing the event loop. This is an issue inside the procedure that is called as hook:event

mgorges commented 10 months ago

Please look if this relates to https://github.com/part-cw/lambdanative/pull/330 and modules/ln_glgui/glgui.scm#L338, which you can disable by calling (glgui-timings-at-10msec!) which we do in a lot of apps as the inconsistent timing thing broke a bunch of things for us, but people wanted the default changed?

zwieblum commented 10 months ago

Ok ... by some obscure reson there are delays built into the render loop. If that should help preventing long running renderings to pile up: nop, this wont. In fact it makes things worse.

So the quick fix is to remove/comment out these two lines (this makes the GUI quite responsive): https://github.com/part-cw/lambdanative/blob/98e806bddaf1e6e9ae2e6a99bdabdd1dc10c986c/modules/ln_glgui/glgui.scm#L320 https://github.com/part-cw/lambdanative/blob/98e806bddaf1e6e9ae2e6a99bdabdd1dc10c986c/modules/ln_glgui/glgui.scm#L322

This removes some variables from their right to exist. I'll post a patch to clean up the code later.

mgorges commented 10 months ago

This is all part of the pull request we merged from #330, so please read that long discussion of how we got to what I eventually merged. I believe that calling (glgui-timings-at-10msec!) once on initialization disables all of this? Or try (glgui-timings-at-sec! 0.0)

zwieblum commented 10 months ago

I did. (glgui-timings-at-sec! 0.0) works, too. But it still leaves the problem that a delay inside a function that should complete as fast as possible is a bad idea to begin with. Last time I checked the code (2018?) the delay was not there and the render loop worked as expected on X11.

It's not clear to me which problem the delay should solve - if it solves any at all. If it should prevent extensive redraw events, then this should be handled in platform specific code, as it it different for each platform.

e.g on X11 the correct way to handle long running redraw events would be to read all events and just call redraw once - just like it's done with Expose

mgorges commented 10 months ago

@zwieblum maybe you should discuss this with @0-8-15 who authored all those changes as I assume there were good intentions for it?

zwieblum commented 10 months ago

@0-8-15: could do, but how? Do you have an email address of him?

zwieblum commented 10 months ago

Ok, this is kind of funny. IMO the delay problem is limited to X11/Win32 and console. #330 with commit f0b69b1 removes the delay from the main loop for microgl and console. The delay is reinstalled in the scheme part of the redraw event, so basicly it's moved to a different place. I can see why it's done (delay and moving it), but it cures the symptom, not the desease.

The reason the event loop getting flooded with `Expose events (and consumes 100% of one core) are these lines: https://github.com/part-cw/lambdanative/blob/9d0830bb23d87f9142e1c2f3f132f59cf16df415/loaders/common/main.c#L172 https://github.com/part-cw/lambdanative/blob/9d0830bb23d87f9142e1c2f3f132f59cf16df415/loaders/common/main.c#L174 non-GUI - I have not figured out why it's there at all: https://github.com/part-cw/lambdanative/blob/9d0830bb23d87f9142e1c2f3f132f59cf16df415/loaders/common/main.c#L179 Most of these events are then removed in x11_microgl.c and the delay in the scheme part reduces the cpu usage to a sane level.

IMO this was introduced to make the GUI program as responsive as possible, but as a sideffect one core ran on 100%. The usual way to work around this problem is to use a timer procedure that injects Expose events at a fixed rate e.g. 60Hz. so the queue is not flooded - or let the user choose a sane strategy. The non-GUI part of the program would run in a seperate thread unaffected by the GUI -but that's up to the user/application programmer.

Did I miss a crutial part? Does the 100% CPU usage occure on android/ios, too? IMO the scheme code of #330 does not use the throtteling on android/ios, and the android template fires a timer event every 250ms.

mgorges commented 10 months ago

As far as I know, this doesn't happen on mobile platforms or macOS; I am personally quite happy with the high CPU load, but others had significant concerns.

zwieblum commented 10 months ago

Ok, this makes the framerate stabel and predictable at ~ 50Hz - just like you would need for games. Set (glgui-timings-at-sec! 0.0) or any value smaller 16ms to disable #330. If you don't, then the behaviour is widely unchanged, but the eventqueue is not saturated with Expose events. Note: this only works on X11, win32 microgl is not changed. x11_microgl.c.patch.txt

mgorges commented 10 months ago

Thanks, I'll try the patch on the weekend as I don't fully understand the select piece by simply reading the patch. Regarding the tv.tv_usec = (last_time+10000-now) % 10000; approach, https://github.com/part-cw/lambdanative/commit/2815ec57a61835520909647103e97850d96a06a1 switched the scheduler from having an offset-based approach to a difference-based one, which improved the accuracy of the timing. Also, isn't that 10 msec, not 20 msec?

zwieblum commented 10 months ago

I refined the patch a bit. When you activate line 218 after applying the patch you get the time in us between frame redraws. Timeout ignores other X11 events now, so the framerate has less jitter.

I set the timout to 10ms, but the actual timer timeout resolution is lower --> you end up at ~ 16.6 ms, even for 20ms timeout (the jitter just gets worse on 20ms). This corresponds to you X11 framerate. Realtime accuracy with jitter < 10us is a different kind of beast (look at linuxcnc).

Framerate test with DemoCube: activate line 218, add (glgui-timings-at-sec! 0.0) to DemoCube/main.scm, then run ~/.cache/lambdanative/linux/DemoCube/DemoCube | awk '{print int($0/1000);}'|sort|uniq -c to get a histogram with ms resolution.

@ select: you can treat the X11 Display/Event queue as a file and therefor wait for events comming down the stream - and timeout when none come. @ sheduler: I might be wrong, but the sheduler runs on a different time scale than the X11 eventloop. x11_microgl.c.patch.txt

0-8-15 commented 9 months ago

@zwieblum maybe you should discuss this with @0-8-15 who authored all those changes as I assume there were good intentions for it?

As the saying goes: The worst harm may often result from the best intentions.

For use cases as mine (desktop app and mobile) the 100% CPU load was a show stopper.

The cure was to reduce the frame frequency to "reasonable", which meant some kind of configurable.

In order to keep the changes small enough and not rewrite too much, it became the delay.

At the moment I have only so much recall of the topic. Android was somewhat different, I believe. Though I might confuse things here as I tried to heavily modify the android way here.

zwieblum commented 9 months ago

As far as I have seen android was never affected. But the idle timer runs at 250ms intervals https://github.com/part-cw/lambdanative/blob/0ec5fa6cc8f43a9b7735733cb8962e722baca415/loaders/android/bootstrap.java.in#L88

0-8-15 commented 9 months ago

In order to keep the changes small enough and not rewrite too much, it became the delay.

BTW: The delay was actually not conceived as a delay, but rather as (an approximation of) a suspension until the application logic would require a redraw by calling glgui-wakeup!.

However the master branch of the fork I'm using it is already gone.

0-8-15 commented 9 months ago

As far as I have seen android was never affected. But the idle timer runs at 250ms intervals

https://github.com/part-cw/lambdanative/blob/0ec5fa6cc8f43a9b7735733cb8962e722baca415/loaders/android/bootstrap.java.in#L88

When it comes to Android (but elswhere too), there are quite a few more things I changed over there. Though I recall this never became good enough for a pull request.

0-8-15 commented 9 months ago

On Oct 30 2023, zwieblum wrote:

As far as I have seen android was never affected. But the idle timer runs at 250ms intervals https://github.com/part-cw/lambdanative/blob/0ec5fa6cc8f43a9b7735733cb8962e722baca415/loaders/android/bootstrap.java.in#L88

Which is easily disalbled (via https://github.com/part-cw/lambdanative/blob/0ec5fa6cc8f43a9b7735733cb8962e722baca415/loaders/android/bootstrap.java.in#L211 ) by adding idle_tmSchedulRate = 0; in the apps ANDROID_java_oncreate file, if you don't have a special need for the timer.

zwieblum commented 9 months ago

@mgorges : did you have time to look into the patch? @0-8-15 : I'm still stuck at compiling for android (#440) and iOS (#443) :/

part-cw / lambdanative

Linux: scm_event strange delay #444