Closed: vindarel closed this 4 months ago
@vindarel it was fixing SDL2 for a lot of people including myself. Before the change Lem failed to draw anything. SDL2 drawing operations should only be coming from a single thread. I don't think Lem has this concept which is one of the reasons why the SDL2 implementation is buggier than the curses implementation. I don't know if there is a cleaner way to fix it given the way the drawing code is written.
Thx. Were the people impacted on different platforms (Linux, macOS, or Windows)?
I think #630 was the main thread for it. There might have been one or two other issues where people reported problems. I think everyone that posted in that thread was running a Linux variant. IIRC, Lem was using at least two threads for drawing operations. One thread has the SDL event loop and then Lem has its own event loop in another thread.
https://wiki.libsdl.org/SDL2/FAQDevelopment#can_i_call_sdl_video_functions_from_multiple_threads
More precisely, in my case, this is slow:

    ;; the commit change
    (sdl2:in-main-thread ()
      (bt:with-recursive-lock-held ((display-mutex *display*))
        (funcall function)))

this is fast:

    ;; before change
    (defun call-with-renderer (function)
      (bt:with-recursive-lock-held ((display-mutex *display*))
        (funcall function)))
This is so, so difficult. Daniel Kochmański wrote a great summary of using SDL2 as a backend for McCLIM; I'll try my best to summarize the important bits here.
"The main limitation of the library is that it is not thread safe - all SDL2 functions are expected to be called from a single thread"
This right here is the biggest source of problems.
Lem tries to follow McCLIM's approach of "have a singular loop and communicate with it using a thread-safe channel", but there is another caveat, and that's:
Some window managers require all drawing operations to be on the main thread (usually this is macOS, but it happens on some Linux desktop environments, too; no idea about Windows). Combining all these clues, I'm guessing that @vindarel is experiencing some deadlock / timeout / software rendering bug.
Ultimately, I think we'll need to really hammer out the locking mechanism in lem to coordinate all the drawing operations on the main thread. (so, backing out https://github.com/lem-project/lem/commit/f97c2482 isn't really a solution here ☹️)
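For what it's worth, here is a minimal sketch of the "single loop plus thread-safe channel" idea mentioned above, using only bordeaux-threads. The names (`*draw-queue*`, `enqueue-draw`, `drain-draw-queue`) are made up for illustration; this is not Lem's or McCLIM's actual code, and it assumes bordeaux-threads is already loaded, as it would be in a running Lem image.

```lisp
;; Hypothetical sketch, not Lem's actual code.
;; Assumes bordeaux-threads (package nickname BT) is already loaded.

(defvar *draw-queue-lock* (bt:make-lock "draw-queue")
  "Protects *DRAW-QUEUE*.")

(defvar *draw-queue* '()
  "Pending drawing closures, newest first.")

(defun enqueue-draw (function)
  "Safe to call from any thread: only records FUNCTION, never calls it."
  (bt:with-lock-held (*draw-queue-lock*)
    (push function *draw-queue*))
  (values))

(defun drain-draw-queue ()
  "Call from the one thread that owns SDL2, once per loop iteration.
Runs the queued closures in submission order and returns how many ran."
  (let ((pending (bt:with-lock-held (*draw-queue-lock*)
                   (prog1 (reverse *draw-queue*)
                     (setf *draw-queue* '())))))
    ;; The lock is released before the closures run, so a closure
    ;; may itself call ENQUEUE-DRAW without deadlocking.
    (mapc #'funcall pending)
    (length pending)))
```

In this sketch the caller does not wait for its closure to run. That keeps other threads from stalling on the render loop, but anything that needs a result back would need a reply mechanism on top of it.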
Yeah, in an ideal world, there'd be a debug assertion where anything using SDL2 rendering could check if it's the main thread, and if not, fail fast. That would help weed out the bugs. When I was looking at it originally, it was a pretty big refactor because I think there were two different event loops and I wasn't comfortable making architectural changes that changed other UI frameworks.
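A rough sketch of what such a fail-fast check could look like, again with bordeaux-threads only. `*render-thread*`, `claim-render-thread` and `assert-render-thread` are hypothetical names, not existing Lem or cl-sdl2 functions, and the sketch assumes bordeaux-threads is already loaded.

```lisp
;; Hypothetical sketch, not Lem's actual code.
;; Assumes bordeaux-threads (package nickname BT) is already loaded.

(defvar *render-thread* nil
  "The one thread allowed to make SDL2 rendering calls, or NIL if unset.")

(defun claim-render-thread ()
  "Record the calling thread as the only one allowed to render."
  (setf *render-thread* (bt:current-thread)))

(defun assert-render-thread ()
  "Signal an error immediately when called from any other thread."
  (unless (eq (bt:current-thread) *render-thread*)
    (error "SDL2 rendering called from thread ~S, expected ~S"
           (bt:thread-name (bt:current-thread))
           (and *render-thread* (bt:thread-name *render-thread*)))))
```

The idea would be to call `claim-render-thread` once at the top of the SDL2 loop and `assert-render-thread` at the start of every drawing entry point, ideally only in debug builds.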
Umm, given that the cause was found, should we close this issue? Or does the problem continue? (ping @vindarel)
Since this commit, I have a severe lag in the GUI:
https://github.com/lem-project/lem/commit/f97c2482
With this, ALL Lem commands have a severe lag: for example, typing anything in M-x and displaying the directory listing of the Lem project root takes 2 seconds. Without it, it's instantaneous. In all cases, the ncurses version works as nicely as always.
This doesn't affect the Lem 2.1 release.
@timmydo What was this commit fixing?
I am on Linux Mint (edit: MATE desktop), SBCL 2.1.4, running Lem from sources, SDL 2.0.10.
Best,