Losing connection to lisp sessions

rpgoldman commented 1 year ago

Here's an example of the problem: I have started interactively editing CL code with a repl that I call "m1-allegro" (it's running the beta of Allegro CL for Apple Silicon). Eventually, I get the image so messed up that I kill it and want to restart. At this point I start having troubles:

When I do "sayoonara" and then try to restart, I get an odd "unmatched right parenthesis error." This goes away if I kill the original mrepl buffer.
Instead of getting a new *sly-mrepl for m1-allegro* buffer, I get *sly-mrepl for m1-allegro*<2>.
When I try to send expressions to the lisp, edit definitions, etc. I get "Current connection sly-1 is closed."

I also get odd behaviors if I have two MREPL buffers open (e.g., I was trying to check the behavior of my code on both Allegro and SBCL).

I get the same error plus an empty window with a "Waiting for creation ack for channel ..." (for n a small number) if I try to use the "C-c C-z" keybinding to go to the current connection.

Setting the default connection using sly-list-connections and d does not change this behavior:

I note also that sayoonara on my initial connection (in the picture above, the first allegro line) did not remove it from the list in sly-list-connections, which probably has something to do with why this is all getting so confused.

Sorry to keep up with the questions. I will try to fix these issues myself.

rpgoldman commented 1 year ago

OK, I see the proximate cause: the buffer-local value of sly-buffer-connection is still set to sly-1, which no longer exists.

If I set that variable to nil, and I have set the default connection per the screenshot in my original issue write-up, then things go back to working normally -- we fall through the case for the bad connection and get to the default one.

So I guess what I am looking for is some more graceful way of reconnecting a source buffer to a lisp session. And I'm wondering why these buffer variables persist over the "sayoonara" command.

It looks like (sly-connection) errors out when it sees a bad connection, instead of ignoring it and moving on to the next possible connection. So in sly-eval the test (eq (process-status (sly-connection)) 'open) doesn't work, because (sly-connection) raises an error instead of returning something for which process-status will return something like kaput.

Would it make sense to revise sly-connection? It seems like that is likely to be a very basic function and messing with it might have a lot of unintended consequences.
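
The manual workaround described in this comment (setting the buffer-local sly-buffer-connection to nil so that lookup falls through to the default connection) could be wrapped in a small command. The following is only a sketch under assumptions: the command name is hypothetical and not part of SLY, the defvar is there solely so the snippet loads without SLY (it does nothing once SLY has defined the variable), and a connection is treated as stale whenever process-live-p returns nil for it.

```elisp
;; Hypothetical helper, not part of SLY.
;; The defvar only lets this load standalone; it is a no-op under SLY.
(defvar sly-buffer-connection nil)

(defun my/sly-forget-dead-buffer-connections ()
  "Reset `sly-buffer-connection' in buffers where it names a dead process.
Return the number of buffers that were changed."
  (interactive)
  (let ((count 0))
    (dolist (buf (buffer-list))
      (with-current-buffer buf
        ;; Only touch buffers that carry their own non-nil value
        ;; which no longer refers to a live process.
        (when (and (local-variable-p 'sly-buffer-connection)
                   sly-buffer-connection
                   (not (process-live-p sly-buffer-connection)))
          (setq-local sly-buffer-connection nil)
          (setq count (1+ count)))))
    (when (called-interactively-p 'interactive)
      (message "Cleared stale connection in %d buffer(s)" count))
    count))
```

Running M-x my/sly-forget-dead-buffer-connections after a restart would then leave source buffers free to use whatever default connection has been selected. It does not address why the variable survives sayoonara in the first place.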

fstamour commented 1 year ago

I just stumbled on a similar situation.

I restarted the lisp process and then (sly-current-connection) was still returning the old connection.

(defun sly-current-connection ()
  "Return the connection to use for Lisp interaction.
Return nil if there's no connection."
  (or sly-dispatching-connection
      sly-buffer-connection
      sly-default-connection))

sly-dispatching-connection contained the old connection
sly-buffer-connection was nil
sly-default-connection had the right one

joaotavora commented 1 year ago

Until some kind of controlled and reproducible experiment demonstrates this problem, say at least 1 in 5 times, it's hard for me to comment on such abstract problems.

fstamour commented 1 year ago

np, I'll keep an eye out to see if it happens again. It's the first time I encounter this bug.

Also, in the end I restarted my whole emacs to "fix" it. :/

fstamour commented 1 year ago

Uh, it happened again.

I restarted the lisp process, (same as first time, by using , restart-lisp in the repl). And now I can't use sly at all.

Now it's in a weird state....

I wasn't able to quit-lisp or restart-lisp from the repl.
I closed all of sly's buffers, including the sbcl process
I was able to start another process with M-x sly
But I still can't evaluate anything in any buffers (even the freshly created repl)

Finally, I evaluated (setf sly-dispatching-connection nil) in the *scratch* buffer and now everything seems to work correctly.

I tried restart-lisp another bunch of times and I couldn't reproduce again...
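
The fix that worked here, evaluating (setf sly-dispatching-connection nil), can be written as a guarded command that only resets the variable when it holds something that is no longer a live process. This is a sketch, not SLY code: the command name is hypothetical, and the defvar exists only so the snippet loads without SLY.

```elisp
;; Hypothetical helper, not part of SLY.
;; The defvar only lets this load standalone; it is a no-op under SLY.
(defvar sly-dispatching-connection nil)

(defun my/sly-reset-dispatching-connection ()
  "Set `sly-dispatching-connection' to nil if it is non-nil but not live.
Return t when the variable was reset, nil otherwise."
  (interactive)
  (when (and sly-dispatching-connection
             (not (process-live-p sly-dispatching-connection)))
    ;; This changes whichever binding is currently in effect,
    ;; exactly as the manual `setf' in *scratch* does.
    (setq sly-dispatching-connection nil)
    t))
```

Like the manual workaround, this treats the symptom only. It says nothing about how the variable came to hold a dead connection.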

zellerin commented 7 months ago

Happens to me as well. I can reproduce this way:

M-x sly
wait for repl (maybe M-x sly-mrepl call needed)
in repl, run (error "E")
sly-db buffer appears
switch to inferior lisp for XXX buffer
kill the buffer, and confirm killing running process
here you end up with sly-dispatching-connection being invalid

Edit: This recipe does not reproduce the problem on another machine, so it cannot be relied on elsewhere. I will check my configuration to see why it leads to the described situation on this one.

joaotavora commented 7 months ago

switch to inferior lisp for XXX buffer kill the buffer, and confirm killing running process

Hmmm. Why do you do these things?? Is killing inferior lisp a part of your normal workflow??

Because I can "reproduce" horrible errors in even the stablest of computer programs if one of the steps of my reproduction recipe is "take out hammer and smash computer". ;-)

zellerin commented 7 months ago

Hmmm. Why do you do these things?? Is killing inferior lisp a part of your normal workflow??

No, this was how to reproduce. In real situations, sbcl dies or ends up in the low-level debugger (and yes, it happens occasionally), and then I kill the buffer.

joaotavora commented 7 months ago

No, this was how to reproduce.

A valid reproduction recipe for demonstrating a bug in software XYZ means showing legitimate interactions with XYZ and proving that XYZ misbehaves as a consequence. Forcibly tripping SLY by deleting one of its structural underpinnings, the inferior lisp process, is not a legitimate interaction.

sbcl dies or ends up in low-level debugger

I'd say that's a bug in SBCL right? There's very little SLY can do to recover from such a drastic situation.

fstamour commented 7 months ago

There's very little SLY can do to recover from such a drastic situation.

I really don't think it's a drastic situation, it can happen really easily if you make a mistake in a loop.
I think it's fair that SLY should be able to handle some bad state, especially bad state caused by a child process dying unexpectedly.

Maybe it could be "as simple" as adding validation to sly-current-connection so that the next sly commands are not completely broken.

Just imagine someone new to common lisp, trying to learn CL, emacs and sly... They would probably not understand any of this... It would be good for the "user experience" to try to make sly more robust.

What do you think?
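
The validation suggested in this comment might look roughly like the following. It is a sketch under assumptions, not a proposed patch: the function name is hypothetical, it reads the same three variables as the sly-current-connection definition quoted earlier in the thread, the defvars exist only so the snippet loads without SLY, and "valid" is taken to mean that process-live-p returns non-nil.

```elisp
(require 'cl-lib)

;; The defvars only let this load standalone; they are no-ops under SLY.
(defvar sly-dispatching-connection nil)
(defvar sly-buffer-connection nil)
(defvar sly-default-connection nil)

;; Hypothetical function, not part of SLY.
(defun my/sly-first-live-connection ()
  "Return the first live candidate connection, or nil if there is none.
Candidates are tried in the same order as in `sly-current-connection'."
  (cl-find-if #'process-live-p
              (list sly-dispatching-connection
                    sly-buffer-connection
                    sly-default-connection)))
```

A lookup like this cannot revive a dead Lisp. It only allows a stale entry to be skipped in favour of another connection that is still open, if one exists.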

joaotavora commented 7 months ago

I really don't think it's a drastic situation, it can happen really easily if you make a mistake in a loop.

Really? What mistake in a LOOP would lead SBCL to drop into the debugger? (EDIT: I meant SBCL's ldb, obviously.)

I think it's fair that SLY should be able to handle some bad state, especially bad state caused by a child process dying unexpectedly.

That already happens.

M-x sly

; Dedicated output stream setup (port 43855)
; Redirecting all output to this MREPL
; SLY 1.0.43 (#<MREPL mrepl-1-1>)
CL-USER> 

;; drop to shell
$ killall sbcl
;; back in Emacs
Process sly-pty-1-1 killed

; Lisp connection closed unexpectedly: connection broken by remote peer
; --------------------------------------------------------

Maybe it could be "as simple" as adding validation to sly-current-connection so that the next sly commands are not completely broken.

I don't know what "next sly commands" you are talking of, but if they include talking to the Lisp process, and the vast majority of them do, it's impossible, because the Lisp process has died.

It's fair to say that your problem, whatever it is that I haven't understood yet, is completely unrelated to the original problem described by the original poster. So please start a new issue or discussion explaining exactly what you want to recover from and how.

zellerin commented 7 months ago

Let me try again to explain with the context, and supplement a bit.

Story so far:

There is a problem described by the original poster that I also observe occasionally (can't start a new repl, current connection sly-1 is closed, etc.)
as fstamour posted, this can happen when sly-dispatching-connection is non-nil (and invalid), and the issue can be solved by setting it to nil again
I thought I had found a reproducible way to reach that state (it turns out that it reproduces on one machine but not on another, so quite possibly it did not trigger the problem for you; I will edit it to clarify). I did not mention the final problem explicitly, assuming it was clear that this causes the behaviour described at the beginning. This might have been a mistake.
you reply that you think that the killed/dying inferior lisp is not sly's problem.

Now I am not sure whether you meant that once I kill one instance of inferior lisp, sly can't be expected to work with other inferior lisp instances. If so, fair enough. I still think that this problem is closely related to the problem of the original poster, if not the same.

Anyway, I did some more digging, and apparently sly-db-setup under some circumstances (always on one of my machines, so I might have a different issue) enters recursive-edit, so the dynamic binding of the sly-dispatching-connection from its caller is never released.

I do not see how it can happen normally, except when code run in sly-eval ends up in the debugger. There is a warning against that (... that shouldn't trigger errors... in a comment block in sly.el), but personally I know I might have done it, so I would agree that this is an error on the user's side, just for a different reason. All the more so since sly-eval is not mentioned in the manuals.

Still, would you object in principle to a patch to fix this - admittedly user-induced - misbehaviour if I get to writing it?
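
The mechanism described in this comment, a dynamic binding that stays in effect for as long as the code inside it has not returned, can be illustrated without SLY. All names below are hypothetical stand-ins, and the callback takes the place of whatever would enter recursive-edit. This shows only the binding behaviour, not SLY's actual code path.

```elisp
;; Hypothetical stand-ins; these are not SLY's own names.
(defvar demo-dispatching-connection nil
  "Stand-in for a connection variable that callers bind with `let'.")

(defun demo-call-with-connection (connection body-fn)
  "Call BODY-FN with `demo-dispatching-connection' bound to CONNECTION."
  (let ((demo-dispatching-connection connection))
    ;; If BODY-FN entered `recursive-edit' here, this `let' would stay
    ;; active and every command run in the meantime would see
    ;; CONNECTION, even after the underlying process had died.
    (funcall body-fn)))

(defun demo-observe-binding ()
  "Return a list (INSIDE AFTER) of the values seen during and after the call."
  (let ((inside (demo-call-with-connection
                 'old-connection
                 (lambda () demo-dispatching-connection))))
    (list inside demo-dispatching-connection)))
```

Inside the call the variable holds the bound value, and it reverts only once the call returns. A recursive edit that is never exited would therefore keep the old value visible indefinitely, which matches the observation that setting the variable to nil by hand makes things work again.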

joaotavora commented 7 months ago

you reply that you think that the killed/dying inferior lisp is not sly's problem.

Exactly. Just like a house catching fire, dog eating homework, etc. Further I showed how SLY reacts reasonably when that happens. Even when it happens illegitimately (such as you killing it from the inferior buffer), SLY still reacts fine.

Still, would you object in principle to a patch to fix this - admittedly user-induced - misbehaviour if I get to writing it?

I'm fine if you want to show code. In fact, you should show some code because most of what you write is incomprehensible to me. If you show some code (and it is simple) maybe I will understand what you mean. And please, in a separate issue. A GitHub PR is fine.

joaotavora / sly

Losing connection to lisp sessions #549