lem-project / lem

Common Lisp editor/IDE with high expansibility
http://lem-project.github.io/
MIT License
2.4k stars 180 forks source link

Slow SDL2 GUI (found the cause) #1092

Closed vindarel closed 4 months ago

vindarel commented 1 year ago

Since this commit, I have a severe lag in the GUI:

https://github.com/lem-project/lem/commit/f97c2482

with this, ALL Lem commands have a severe lag, for example typing anything in M-x and displaying the directory listing of the Lem project root takes 2 seconds. Without it, it's instantaneous. In all cases it works nice as always in the ncurses version.

This doesn't affect the Lem 2.1 release.

@timmydo What was this commit fixing?

sdl2: Run paint operations on SDL2 main thread

1 file changed, 15 insertions(+), 17 deletions(-)
frontends/sdl2/main.lisp | 32 +++++++++++++++-----------------

modified   frontends/sdl2/main.lisp
@@ -144,8 +144,9 @@
 (defun current-renderer () (display-renderer *display*))

 (defun call-with-renderer (function)
-  (bt:with-recursive-lock-held ((display-mutex *display*))
-    (funcall function)))
+  (sdl2:in-main-thread ()
+    (bt:with-recursive-lock-held ((display-mutex *display*))
+      (funcall function))))

 (defmacro with-renderer (() &body body)
   `(call-with-renderer (lambda () ,@body)))
@@ -868,21 +869,18 @@
                               (sdl2:push-quit-event)))))
              (declare (ignore editor-thread))
              nil)))
-    (if (sbcl-on-darwin-p)
-        (progn
-          ;; called *before* any sdl windows are created
-          (sdl2:set-hint :video-mac-fullscreen-spaces
-                         ;; the sdl2 library expects zero or one NOTE since this
-                         ;; is a preference let's not change the default here
-                         ;; because it's easy enough to change it via a user's
-                         ;; config
-                         (if (lem:config :darwin-use-native-fullscreen) 1 0))
-          (sdl2:make-this-thread-main (lambda ()
-                                        (create-display #'thunk)
-                                        (cffi:foreign-funcall "_exit"))))
-        (let ((thread (bt:make-thread (lambda ()
-                                        (create-display #'thunk)))))
-          (bt:join-thread thread)))))
+    (progn
+      ;; called *before* any sdl windows are created
+      (sdl2:set-hint :video-mac-fullscreen-spaces
+            ;; the sdl2 library expects zero or one NOTE since this
+            ;; is a preference let's not change the default here
+            ;; because it's easy enough to change it via a user's
+            ;; config
+            (if (lem:config :darwin-use-native-fullscreen) 1 0))
+      (sdl2:make-this-thread-main (lambda ()
+                   (create-display #'thunk)
+                   (when (sbcl-on-darwin-p)
+                     (cffi:foreign-funcall "_exit")))))))

 (defmethod lem-if:get-background-color ((implementation sdl2))
   (with-debug ("lem-if:get-background-color")

I am on LinuxMint (edit: Mate desktop), SBCL 2.1.4, running Lem from sources, SDL 2.0.10.

Best,

timmydo commented 1 year ago

@vindarel it was fixing SDL2 for a lot of people including myself. Before the change Lem failed to draw anything. SDL2 drawing operations should only be coming from a single thread. I don't think Lem has this concept which is one of the reasons why the SDL2 implementation is buggier than the curses implementation. I don't know if there is a cleaner way to fix it given the way the drawing code is written.

vindarel commented 1 year ago

Thx. People impacted were on different platforms, either on Linux than macOS or W$ ?

timmydo commented 1 year ago

I think #630 was the main thread for it. There might have been one or two other issues where people reported problems. I think everyone that posted in that thread was running a Linux variant. IIRC, Lem was using at least two threads for drawing operations. One thread has the SDL event loop and then Lem has its own event loop in another thread.

https://wiki.libsdl.org/SDL2/FAQDevelopment#can_i_call_sdl_video_functions_from_multiple_threads

vindarel commented 1 year ago

more precisely, in my case, this is slow:

;; the commit change
(sdl2:in-main-thread ()                                   
  (bt:with-recursive-lock-held ((display-mutex *display*))
    (funcall function)))                                  

this is fast:

;; before change
(defun call-with-renderer (function)
  (bt:with-recursive-lock-held ((display-mutex *display*))
    (funcall function)))
seanfarley commented 1 year ago

This is so, so difficult. Daniel Kochmański wrote a great summary of using SDL2 as a backend for McCLIM which I'll try my best to summarize the important bits here.

The main limitation of the library is that it is not thread safe - all SDL2 functions are expected to be called from a single thread

This right here is the biggest source of problems.

lem tries to follow McCLIM's approach of "have a singular loop and communicate with it using a thread-safe channel" but there is another caveat and that's:

Some window managers require all drawing operations to be on the main thread (usually this is macOS but it happens on some linux desktop environments, too; no idea about Windows). Combining all these clues, I'm guessing that @vindarel is experiencing some deadlock / timeout / software rendering bug.

Ultimately, I think we'll need to really hammer out the locking mechanism in lem to coordinate all the drawing operations on the main thread. (so, backing out https://github.com/lem-project/lem/commit/f97c2482 isn't really a solution here ☹️)

timmydo commented 1 year ago

Yeah, in an ideal world, there'd be a debug assertion where anything using SDL2 rendering could check if it's the main thread, and if not, fail fast. That would help weed out the bugs. When I was looking at it originally, it was a pretty big refactor because I think there were two different event loops and I wasn't comfortable making architectural changes that changed other UI frameworks.

Sasanidas commented 11 months ago

Umm given that the cause what found, should we close this issue? or the problem continues? (ping @vindarel )