joaotavora / sly

Sylvester the Cat's Common Lisp IDE

Floating-point errors with GTK #589

Closed bradrn closed 1 year ago

bradrn commented 1 year ago

Today I (as a relative newcomer to Common Lisp) decided to play around a bit with cl-gtk4, but quickly ran into a bunch of FLOATING-POINT-INVALID-OPERATION and DIVISION-BY-ZERO errors. In fact, these conditions were triggered on nearly every program I tested, including those distributed as examples with cl-gtk4. A bit of poking around revealed that those errors only occurred when the program was compiled and loaded into SLY and cl-gtk4 had been loaded using one of the following methods:

It did not occur, however, when I evaluated (ql:quickload :cl-gtk4) within the SLY REPL before compiling and loading my program. Furthermore, it did not occur when I loaded my program into SBCL directly using sbcl --load. And it appeared that cl-gtk4 was not the only affected system: cl-cffi-gtk showed the same problem. In fact, I wouldn’t be surprised if this issue was common to all systems using the FFI, though aside from those two I’m not sure what I could test.

For reference, I’m on Arch Linux (well, EndeavourOS, but they’re basically the same), using SLY 1.0.43 with Emacs 28.2 and SBCL 2.3.3. The following program is sufficient to trigger the bug for me:

(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :cl-gtk4))

(gtk4:define-application (:name my-app)
  (gtk4:define-main-window (window (gtk4:make-application-window :application gtk4:*application*))
    (setf (gtk4:window-title window) "Test")
    (let ((box (gtk4:make-box :orientation gtk4:+orientation-vertical+
                              :spacing 4)))
      (let ((view (gtk4:make-button :label "add")))
        (gtk4:box-append box view))
      (setf (gtk4:window-child window) box))
    (unless (gtk4:widget-visible-p window)
      (gtk4:window-present window))))

(my-app)

A window does appear, but it triggers FLOATING-POINT-INVALID-OPERATION when focussed, and ,restart lisp is required to close it.

joaotavora commented 1 year ago

This could be some kind of problem related to threads, because it doesn't seem to occur when you quickload from the repl. When you say that minimal program triggers the error, exactly how are you loading it and what is the backtrace of the error? You didn't specify if the problem happens if you type that in a file and eval it, or merely compile etc. I think we need an unambiguous recipe including every action you take.
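One rough way to check the thread hypothesis, assuming SBCL: print the current thread's name and its enabled floating-point traps, once from the REPL and once as a top-level form in the file being compiled and loaded. `report-float-traps` is a hypothetical helper name; `sb-int:get-floating-point-modes` is SBCL-internal and not portable.

```lisp
;; SBCL-only diagnostic sketch. REPORT-FLOAT-TRAPS is a made-up helper name.
(defun report-float-traps (&optional (stream *standard-output*))
  "Print the current thread's name and the floating-point traps enabled in it.
Returns the list of enabled traps."
  (let ((traps (getf (sb-int:get-floating-point-modes) :traps)))
    (format stream "~&thread ~S: enabled traps ~S~%"
            (sb-thread:thread-name sb-thread:*current-thread*)
            traps)
    traps))

;; Evaluate this in the REPL, then put the same form at the top level of the
;; file and compile+load it, and compare the two lines of output.
(report-float-traps)
```

If the two runs report different threads or different trap sets, that would at least narrow things down.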

bradrn commented 1 year ago

That was impressively quick!

> exactly how are you loading it

Just sly-compile-and-load-file.

> what is the backtrace of the error?

I can’t seem to figure out how to copy from the SLY error buffer, so I hope this screenshot is acceptable:

[Screenshot: backtrace shown in the SLY error buffer]

> I think we need an unambiguous recipe including every action you take.

  1. Make a file with the code given earlier
  2. M-x sly-compile-and-load-file RET

That’s it, as far as I’m aware.

EDIT: And I forgot that (of course) you need to start SLY too, which I’m doing with M-x sly RET after navigating to the file in Emacs.

joaotavora commented 1 year ago

A screenshot is OK, but copying from the error buffer is just like copying from any other text buffer in Emacs: activate the region and press M-w.

What happens if you evaluate my-app from the repl, instead of having it as a top level form?

What happens if you load the resulting SLY-compiled fasl file separately using load (but now again with the top-level form)? From within SLY (the repl)? From outside SLY (sbcl --load)?

bradrn commented 1 year ago

> copying from the error buffer is just copying from any text buffer in Emacs, activate region and M-w.

I use evil-mode though, which makes these things a bit trickier…

> What happens if you evaluate my-app from the repl, instead of having it as a top level form?

Same thing. (In fact this is what I’ve been doing all along; I only tried it at the top level when writing this issue.) [EDIT: turns out this was incorrect, see below]

> What happens if you load the resulting SLY-compiled fasl file separately using load From within SLY (the repl)? From outside SLY (sbcl --load)?

From outside SLY, sbcl --load works without a problem, as I said in my initial post. I’ll try the others a bit later when I get some more time.

joaotavora commented 1 year ago

> From outside SLY, sbcl --load works without a problem, as I said in my initial post. I’ll try the others a bit later when I get some more time.

Please post exactly the sbcl invocation. I am talking about loading a SLY-compiled FASL file, not a lisp file.

bradrn commented 1 year ago

Ah, I see, I was using sbcl --load my-program.lisp, rather than loading the FASL file. I’ll try that later.

bradrn commented 1 year ago

OK, so loading the FASL file works without problems, no matter how I do it: whether I run sbcl --load file.fasl directly, or do (load "cl-test.fasl") within SLY. (And for good measure, (load "cl-test.fasl") from SBCL works too!)

joaotavora commented 1 year ago

Ok, that's interesting. Thanks.

Coming back to the original recipe, what if you modify the first form to:

(eval-when (:compile-toplevel)
  (ql:quickload :cl-gtk4))

bradrn commented 1 year ago

Seems like that gives basically the same result.

Another interesting data point: I tried using SLIME instead of SLY, and it behaved even worse — it gave a FLOATING-POINT-INVALID-OPERATION in all cases, even when I ran (ql:quickload :cl-gtk4) in the REPL before loading, whereas as already mentioned SLY handles this case correctly. SLIME coped fine with loading the FASL file, though.

bradrn commented 1 year ago

After further investigation prompted by my previous post, I discovered I was wrong about something — contradicting what I said earlier, running (my-app) at the top level does give different results to running it in the REPL! Specifically, when I run (ql:quickload :cl-gtk4) in the REPL, then compile+load my file, then run (my-app) at the top level, it does not work. However, when I do the same thing but run (my-app) in the REPL, it does work. Thus, I suspect the problem with SLIME could well have been because I was running (my-app) at the top level, not because SLIME behaves differently to SLY.

(Oh, and for good measure, doing (eval-when …) still prevents it from working correctly in all cases.)

joaotavora commented 1 year ago

> running (my-app) at the top level does give different results

You've now confused me. I don't know what is what. It's probably not your fault, I'm easily confused by these kinds of incremental reports, I don't know what to map to {failure|success} anymore. I need a summary of all your (most relevant) findings so far, in the most unambiguous way possible.

bradrn commented 1 year ago

That’s very fair; I’m getting a bit confused myself. Let me attempt to summarise:

In SLY, the various combinations produce the following results:

| Package \ Function | evaluate in SLY REPL | top-level form |
|---|---|---|
| ql:quickload in REPL | ✔ | FLOATING-POINT-INVALID-OPERATION |
| ql:quickload at top-level | READ error during COMPILE-FILE: Package GTK4 does not exist | READ error during COMPILE-FILE: Package GTK4 does not exist |
| eval-when wrapping ql:quickload at top-level | FLOATING-POINT-INVALID-OPERATION | FLOATING-POINT-INVALID-OPERATION |

In SBCL, they instead produce the following results:

| Package \ Function | evaluate in SBCL REPL | top-level form |
|---|---|---|
| ql:quickload in REPL | ✔ | ✔ |
| ql:quickload at top-level | hangs | ✔ |
| eval-when wrapping ql:quickload at top-level | ✔ | ✔ |

Meanwhile, it seems that directly loading the FASL works everywhere, though I haven’t tested it nearly as exhaustively.
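As an aside, the READ error row in the SLY table is consistent with how COMPILE-FILE generally works: an ordinary top-level call is compiled but not executed at compile time, so a package it would create does not exist yet when the reader reaches later forms that use it. Here is a small self-contained sketch of the same effect without GTK or Quicklisp; `demo-pkg` and `hello` are hypothetical names used only for illustration.

```lisp
;; DEMO-PKG and HELLO are made-up names for illustration only.
;; Without the EVAL-WHEN wrapper, COMPILE-FILE would compile the package
;; creation form but not run it, and reading DEMO-PKG::HELLO below would
;; fail because the package does not exist at compile time.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (unless (find-package :demo-pkg)
    (make-package :demo-pkg :use '(:cl))))

;; By the time the reader sees this form, DEMO-PKG exists in all three
;; situations (compiling, loading the fasl, evaluating the source).
(defun demo-pkg::hello ()
  :hello)
```

This only accounts for the package error; it does not by itself explain the floating-point conditions in the other cells.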

bradrn commented 1 year ago

Incidentally, thanks for the edits @joaotavora! I’m still fairly new to Common Lisp and sometimes am unsure of the correct terminology to use.

joaotavora commented 1 year ago

No problem. I'm still unclear on what exactly some cells mean, though.

For example, in the second table, the topmost leftmost cell, I assume that's:

sbcl --load file-where-there-is-no-top-level-my-app-form.lisp
;;; wait for REPL to appear
;;; type (my-app) and ENTER

bradrn commented 1 year ago

Yep, that’s correct, although for consistency with the other cells I used sbcl followed by (load "my-file.lisp"). (Though either way it should have the same effect.)

bradrn commented 1 year ago

Just found that this issue has already been reported as https://github.com/crategus/cl-cffi-gtk/issues/85. The explanation there makes sense, and it appears the recommendation is to mask floating-point traps.
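For anyone landing here, a minimal sketch of what masking the traps could look like on SBCL. It assumes `sb-int:with-float-traps-masked` (an SBCL-internal macro, so not portable), and `call-with-float-traps-masked` and `divide` are hypothetical names of my own. I mask only the two conditions seen in this issue; the linked report may recommend a different set.

```lisp
;; SBCL-only sketch. CALL-WITH-FLOAT-TRAPS-MASKED and DIVIDE are made-up names.
(defun call-with-float-traps-masked (thunk)
  "Call THUNK with the :INVALID and :DIVIDE-BY-ZERO float traps masked."
  (sb-int:with-float-traps-masked (:invalid :divide-by-zero)
    (funcall thunk)))

(defun divide (x y)
  "Plain division, used below only to provoke a floating-point exception."
  (/ x y))

;; With the trap masked, dividing by 0.0 should yield an infinity instead of
;; signalling DIVISION-BY-ZERO (on platforms where SBCL supports masking).
(call-with-float-traps-masked (lambda () (divide 1.0 0.0)))
```

With the program from the first post, the last top-level form would then become something like `(call-with-float-traps-masked #'my-app)`, on the assumption that the GTK main loop runs in the calling thread.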

joaotavora commented 1 year ago

Brilliant, and I'm patting myself on the back that my first guess wasn't far off:

> The issue isn't directly related to SLIME. What happens is that if you use C-c C-k in Emacs to compile a file that loads the GTK library, the compilation happens in a different thread