joaotavora / sly

Sylvester the Cat's Common Lisp IDE
1.23k stars 139 forks source link

Undefined behavior due to modifying shared constants #626

Closed mfiano closed 6 months ago

mfiano commented 6 months ago

Over the last few days, I have been trying to find the source of a very strange bug, that at first seemed like memory corruption, or a related bug in the compiler; SBCL. It turns out, the mrepl class initialization is mutating a cons cell that is a constant -- and not just a constant; one that is common enough to be shared by the SBCL compiler itself: '(*). The body of code responsible for this bug is located here: https://github.com/joaotavora/sly/blob/9c43bf65b967e12cef1996f1af5f0671d8aecbf4/contrib/slynk-mrepl.lisp#L32-L40

It turns out, this bug goes un-noticed because it mutates some internal constant cons cell, and doesn't cause any mishaps that I have seen (even if it is scary that the whole Lisp image could possibly be tainted), however, upon compiling code containing calls to specific functions, such as, but not limited to, slynk-mrepl:send-prompt, the SBCL compiler is actually reading REPL history evaluation results as part of its codegen, and getting confused, because the standard CL variable * is somehow getting read in the compilation of calls to said functions.

It was a nightmare trying to debug this, but I think the easy fix is to simply put a copy-tree around that back-quoted form. I'll leave it up to you to decide what is best, but currently, it is very bad that we are not only mutating a constant cons cell, but a shared one.

joaotavora commented 6 months ago

Hmm, where exactly is the mutation of the cons cell? Where it the value of its car or cdr being changed? Is it in this snippet or somewhere else?

Also, as far as I can tell, even though there are multiple immutable cons cells being created in the evaluation of the backquoted form you show in the snippet, these cells are all "mine". How could they be shared?

mfiano commented 6 months ago

You can thank stassats for doing most of the debug work. He stated it is in this call, wherever it may be: (setf (cdr (assoc special-sym (slot-value repl 'slynk::env))) (symbol-value special-sym))

mfiano commented 6 months ago

You can reproduce the bug by doing the following in a freshly started Lisp image/mrepl: https://gist.github.com/mfiano/b658619a26825ed383baae3779daaa1f

mfiano commented 6 months ago

(replacing 42 with anything obvious, and you will see the mess in the IR1 pass with this value).

mfiano commented 6 months ago

They aren't yours. They are constant data, because the were constructed with quote (well, backquote in this case). Igoring the surrounding noise, '(*) is equivalent to the literal cons cell '(cl:* . nil)

mfiano commented 6 months ago

Even if they were yours, the compiler could do strange things since it is undefined behavior to modify any literal data.

mfiano commented 6 months ago

Either build up the list functionally, or just throw a copy-tree around it.

joaotavora commented 6 months ago

What I wrote is that the ons cells being created are mine, and from what I can see the first two are and the last one is.

You can thank stassats for doing most of the debug work. He stated it is in this call, wherever it may be: (setf (cdr (assoc special-sym (slot-value repl 'slynk::env))) (symbol-value special-sym))

Well, this code shows up here.

(dolist (special-sym '(*package* *default-pathname-defaults*))
          (setf (cdr (assoc special-sym (slot-value repl 'slynk::env)))
                (symbol-value special-sym)))

So, I can't see how (*) is being modified. Only the cdr of the first two cons cells, which are "mine", are being modified.

So maybe stassats can shed light on that, maybe in multiple one-liners.

mfiano commented 6 months ago

We can try pinging @stassats .

mfiano commented 6 months ago

In the meantime, you can try my reproducible example in the gist above, and try tracing to figure out why it does. I couldn't get very far, but stassats said he could reproduce on SBCL HEAD today.

(I tried several older versions of SBCL to try to figure out the offender)

joaotavora commented 6 months ago

I don't have time for that sorry. But I believe you guys! I just don't understand the analysis/conclusion about the (*) yet, and I want to understand it before I put in a fix.

mfiano commented 6 months ago

I agree. I don't like committing anything without understanding everything. I'm sorry I don't have more information for you. This was pretty draining on me as it is.

mfiano commented 6 months ago

I might be able to clear up some confusion, with my current understanding of the CL standard. It states, and I can find this if you don't agree with it, that it is undefined behavior to mutate any literal data. Any cons cell constructed with the reader macro ' or ` is a literal cons cell. The undefined behavior gives compilers the opportunity to condense all these read-time constant cons cells down into a single object, and dish out references to the singleton. They are also free to place them in read-only memory! Undefined, is simply undefined.

joaotavora commented 6 months ago

Two notes:

stassats commented 6 months ago

https://github.com/joaotavora/sly/blob/9c43bf65b967e12cef1996f1af5f0671d8aecbf4/slynk/slynk.lisp#L579

mfiano commented 6 months ago

That makes more sense now.

joaotavora commented 6 months ago

He stated it is in this call, wherever it may be: (setf (cdr (assoc special-sym (slot-value repl 'slynk::env))) (symbol-value special-sym))

[incredibly polite non-greeting entity replies] do (setf (cdr binding) (symbol-value (car binding)))))))))

Ah, that's more like it. So copy-tree it is. Make a PR please @mfiano.

mfiano commented 6 months ago

I gotcha, no prob. Tomorrow though. Again, drained :)

joaotavora commented 6 months ago

I gotcha, no prob. Tomorrow though. Again, drained :)

Non, I gotch a, go to sleep :-)

I might be able to clear up some confusion, with my current understanding of the CL standard. It states, and I can find this if you don't agree with it, that it is undefined behavior to mutate any literal data. Any cons cell constructed with the reader macro ' or ` is a literal cons cell. The undefined behavior gives compilers the opportunity to condense all these read-time constant cons cells down into a single object, and dish out references to the singleton. They are also free to place them in read-only memory! Undefined, is simply undefined.

This is all correct, you can rest easy your understanding is correct.

mfiano commented 6 months ago

Thanks!