joaotavora / sly

Sylvester the Cat's Common Lisp IDE

Helm + SLY hangs (workaround) #303

Closed pouar closed 4 years ago

pouar commented 4 years ago

In case you aren't aware, flex completion in Sly when using Helm was working again, at least until the commit below. I don't remember whether the fix was on Sly's side or Helm's side.

In d4e52fe7fd31ed408bc60608416f785949b95133, matches sometimes show up and sometimes don't. Reverting the commit seems to fix it.

EDIT BY SLY's AUTHOR: This problem has a solution in this comment in terms of a small ad-hoc fix to Helm's code. EDIT2: The problem now has a fix in SLY proper.

pouar commented 4 years ago

Not sure if reverting it is the proper fix though, but that seems to be where the bug was introduced, at least according to git bisect

joaotavora commented 4 years ago

Yeah, I'm going to really exhaust all possible alternatives before I'm reverting that one.

Are you sure you're really cleaning up everything between each step of the bisection?

It looks really unrelated to Helm completion interfaces. Though it could very well be related, that's for sure. Stranger things have happened...

Also, I'm really quite busy at the moment, so no time to analyse this, even less if Helm is a requirement.

João

On Fri, Jan 10, 2020 at 4:51 PM pouar wrote:

Not sure if reverting it is the proper fix though, but that seems to be where the bug was introduced, at least according to git bisect

-- João Távora

pouar commented 4 years ago

so far. Also tried it on master with and without the commit reverted and recompiled the elisp files and restarted emacs each time.

joaotavora commented 4 years ago

Emacs 27/26/master?

On Fri, Jan 10, 2020 at 5:08 PM pouar wrote:

so far. Also tried it on master with and without the commit reverted and recompiled the elisp files and restarted emacs each time.

-- João Távora

pouar commented 4 years ago

Emacs 27, haven't checked 26

pouar commented 4 years ago

or Emacs master

pouar commented 4 years ago

Just did some experimenting with the issue and changing

(while (sit-for 30))
(setq cancelled t)

to

(let ((inhibit-quit t))
  (while (sit-for 30))
  (setq cancelled t))

Seems to make the problem go away.

joaotavora commented 4 years ago

Good concise description of a possible solution.

I don't have time to analyse right now, i.e. to begin to (re)understand why inhibit-quit in the sit-for should be needed at all, so help me Obi-Wan-@monnier, you're my only hope.

monnier commented 4 years ago

Sorry, but there's some interference in the force right now.

I think I'd need some more detailed explanation of the way the control flows when things "work" and the way it flows when things "don't work".

The only description I see of the problem is "matches sometimes show up and sometimes don't", which is fairly vague, so I don't really know where to start.

BTW, while we're moving the chairs on the deck, have you tried using accept-process-output (passing a specific process object to it) wrapped inside a while-no-input, instead of sit-for? sit-for has various problems, so I generally try to stay away from it.

    Stefan
joaotavora commented 4 years ago

On Mon, Jan 20, 2020 at 6:25 PM monnier wrote:

BTW, while we're moving the chairs on the deck, have you tried instead of sit-for to use accept-process-output (passing a specific process object to it) and wrapping it inside a while-no-input. sit-for has various problems so I generally try to stay away from it.

That's funny: I have the exact same experience with while-no-input. I think I've been going back and forth between the two alternatives in that function, maybe not systematically, though. Anyway, I'll try again, but not anytime soon.

@pouar if you'd like to test this out, be my guest. But I think giving Stefan or Helm's author a killer, from-scratch reproduction recipe is also a good move. If you do try the change Monnier suggests, make sure to also give it plenty of non-Helm, company-heavy usage to increase confidence in the change.

Thanks, João

pouar commented 4 years ago

You mean like this?

(cancel-on-input
 (while-no-input
  (accept-process-output (sly-connection) 30)
  (setq cancelled t)
  (funcall check-conn)))
joaotavora commented 4 years ago

I think so, but can you give me context? A diff would be fine.

pouar commented 4 years ago
diff --git a/sly.el b/sly.el
index baf18aa4..d174139a 100644
--- a/sly.el
+++ b/sly.el
@@ -2404,9 +2404,10 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                      (throw catch-tag
                             (list #'error "Synchronous Lisp Evaluation aborted")))))
                 (cond (cancel-on-input
-                       (while (sit-for 30))
-                       (setq cancelled t)
-                       (funcall check-conn))
+                       (while-no-input
+                        (accept-process-output (sly-connection) 30)
+                        (setq cancelled t)
+                        (funcall check-conn)))
                       (t
                        (while t
                          (funcall check-conn)
joaotavora commented 4 years ago

Thanks. I think it has to loop forever in accept-process-output, too.

It has to be (while (not (accept-process-output ... 30))) or something to that effect, I think.

joaotavora commented 4 years ago

inside the while-no-input that is.

pouar commented 4 years ago

Like this?

diff --git a/sly.el b/sly.el
index baf18aa4..beb9a6dd 100644
--- a/sly.el
+++ b/sly.el
@@ -2404,9 +2404,10 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                      (throw catch-tag
                             (list #'error "Synchronous Lisp Evaluation aborted")))))
                 (cond (cancel-on-input
-                       (while (sit-for 30))
-                       (setq cancelled t)
-                       (funcall check-conn))
+                       (while-no-input
+                        (while (not (accept-process-output (sly-connection) 30)
+                                    (setq cancelled t)
+                                    (funcall check-conn)))))
                       (t
                        (while t
                          (funcall check-conn)
joaotavora commented 4 years ago

No, that doesn't make sense (not only accepts one argument). Don't worry, use this:

diff --git a/sly.el b/sly.el
index 0ff8c0e0..304fc7f3 100644
--- a/sly.el
+++ b/sly.el
@@ -2384,9 +2384,10 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                                          (sly-continuation-counter))))
          (sly--stack-eval-tags (cons catch-tag sly--stack-eval-tags))
          (cancelled nil)
+         (connection (sly-connection))
          (check-conn
           (lambda ()
-            (unless (eq (process-status (sly-connection)) 'open)
+            (unless (eq (process-status connection) 'open)
               (error "Lisp connection closed unexpectedly"))))
          (retval
           (unwind-protect
@@ -2404,7 +2405,8 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                      (throw catch-tag
                             (list #'error "Synchronous Lisp Evaluation aborted")))))
                 (cond (cancel-on-input
-                       (while (sit-for 30))
+                       (while-no-input
+                         (while (not (accept-process-output connection 30))))
                        (setq cancelled t)
                        (funcall check-conn))
                       (t

Now give this a good beating with Helm, company, whatever and report back here.

pouar commented 4 years ago

I kinda meant to wrap that argument in a progn

pouar commented 4 years ago

or whatever Emacs Lisp has as an equivalent

joaotavora commented 4 years ago

It has progn, but it doesn't make sense anyway. You want that inner loop to continue forever and ever until there is output from the Lisp process.

  1. When there is output from the SLY process, you'll experience a non-local exit to that catch-tag.
  2. When there is input from the user (they type a char, for example), while-no-input will break.

Notice that in 1, you could theoretically do it with a while t. The reason we don't is that I think I tried that before, and theory somehow doesn't match practice in Emacs, for C-level reasons. So we're basically just hunting in the dark here, following Stefan's heuristic.
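
A minimal, self-contained sketch of the two exits just described, assuming a plain subprocess stands in for SLY's connection. `my-sync-request` and its callback protocol are hypothetical, not SLY's API. It uses the theoretical `while t` form mentioned above, so read it as an illustration of the control flow rather than a tested fix for the Helm hang.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical helper (the `my-' names are not part of SLY): wait for
;; one reply from PROC, but give up as soon as the user types.

(defun my-sync-request (proc send-fn)
  "Call SEND-FN with a reply callback, then wait for PROC to answer.
SEND-FN receives one argument: a one-argument function that
delivers the reply.  Return (:reply VALUE) if the reply arrives
first, or :input if user input arrives first."
  (let* ((tag (make-symbol "my-reply-tag"))
         (result
          (catch tag
            ;; Exit 1: the callback throws to TAG.  It runs inside
            ;; PROC's filter, i.e. inside `accept-process-output'.
            (funcall send-fn (lambda (value) (throw tag (list :reply value))))
            ;; Exit 2: pending user input makes `while-no-input'
            ;; abandon its body and return t.
            (while-no-input
              (while t
                (accept-process-output proc 30)
                ;; Like SLY's CHECK-CONN: don't spin on a dead process.
                (unless (process-live-p proc)
                  ;; Give the filter a last chance at buffered output.
                  (accept-process-output proc 0)
                  (error "Process %s exited without replying"
                         (process-name proc))))))))
    (if (eq result t) :input result)))
```

In an interactive session, pressing a key while it waits makes it return `:input` right away, which is the CANCEL-ON-INPUT behaviour under discussion.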

pouar commented 4 years ago

Your patch seems to be working so far, although I'm not sure what d4e52fe7fd31ed408bc60608416f785949b95133 was supposed to fix: I didn't run into anything, so I'm not sure what I'm looking for. Is a "stale continuation" something like an infinite loop?

pouar commented 4 years ago

or did it drop the process on the Emacs side or something?

pouar commented 4 years ago

The only definition of continuation I'm aware of is the one from Scheme

joaotavora commented 4 years ago

although I'm not sure what d4e52fe was supposed to fix as I didn't run into anything

You probably didn't mean this as a criticism, but if you did, I would accept it. It wasn't fixing anything, it was just a "feel good" change that made SLY behave more like jsonrpc.el in Emacs. Until your Helm troubles, I hadn't experienced any problems with it.

A stale continuation is a request in Emacs that never seems to have gotten a reply from the server side, not even an error response or a timeout. It should never happen: an RPC request either succeeds or doesn't, by definition.

As for the nomenclature, these continuations are not as sophisticated as Scheme's, but they do point to the same effect: stop code execution and resume it later on, seemingly magically. So when you write

(sly-eval-async '(common-lisp-function-returning-foo-and-bar)
                (lambda (results) (cl-destructuring-bind (foo bar) ...)))

the lambda is called a continuation.
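
To make the terminology concrete, here is a toy continuation registry. The names `my-eval-async`, `my-dispatch-reply` and `my-stale-ids` are hypothetical and no network is involved. The point is only that each request stores its continuation under an id, a reply resumes and removes it, and anything still left in the table never got an answer, which is what a stale continuation means here.

```elisp
;;; -*- lexical-binding: t; -*-
;; Toy continuation registry; all `my-' names are hypothetical, not SLY's.

(defvar my-continuations (make-hash-table)
  "Pending requests: request id -> continuation (a one-argument function).")

(defvar my-next-id 0
  "Last request id handed out.")

(defun my-eval-async (form continuation)
  "Pretend to send FORM to a server and remember CONTINUATION.
Return the id under which CONTINUATION was stored."
  (ignore form) ; a real client would write FORM to its network process
  (let ((id (setq my-next-id (1+ my-next-id))))
    (puthash id continuation my-continuations)
    id))

(defun my-dispatch-reply (id result)
  "Resume the continuation waiting on request ID with RESULT."
  (let ((k (gethash id my-continuations)))
    (unless k
      (error "No continuation waiting for request %d" id))
    (remhash id my-continuations)
    (funcall k result)))

(defun my-stale-ids ()
  "Return the ids of requests that never got a reply, in ascending order."
  (let (ids)
    (maphash (lambda (id _k) (push id ids)) my-continuations)
    (sort ids #'<)))
```

Dispatching a reply for one request runs its lambda; a request that is never answered stays in the table indefinitely.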

Emacs Lisp does have something very close to Scheme continuations, I think: see generator.el. All of this is beside the issue, of course. Please keep testing for a few days, if possible, with the latest code.
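
For readers unfamiliar with generator.el, a minimal sketch of its suspend-and-resume behaviour. `my-count-up` and `my-collect` are hypothetical names; `iter-defun`, `iter-yield`, `iter-next` and `iter-do` are generator.el's own, and it needs lexical-binding.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'generator)

;; Hypothetical example names; the iter-* macros come from generator.el.
(iter-defun my-count-up (n)
  "Yield 0, 1, ..., N-1, suspending execution after each value."
  (let ((i 0))
    (while (< i n)
      ;; Execution stops here and resumes on the next `iter-next'.
      (iter-yield i)
      (setq i (1+ i)))))

(defun my-collect (n)
  "Drive the generator to completion and return its values as a list."
  (let ((acc nil))
    (iter-do (x (my-count-up n))
      (push x acc))
    (nreverse acc)))
```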

pouar commented 4 years ago

It wasn't really a criticism. I just didn't know what was going on.

joaotavora commented 4 years ago

It wasn't really a criticism.

I know. But if it had been, it would have been a good one :-)

pouar commented 4 years ago

OK, apparently it's still broken with this patch, but not as badly: the problem now occurs less often.

pouar commented 4 years ago

OK, maybe I didn't narrow it down to that last line, as it still shows up at about the same rate as with inhibit-quit.

pouar commented 4 years ago

tbh, I have no idea what's going on

joaotavora commented 4 years ago

Thanks @pouar for your testing. As soon as I have some free time, I will re-focus on this. I will start by examining the commit sha in the subject of this issue very carefully, and possibly revert it.

joaotavora commented 4 years ago

@thierryvolpiatto writes in #303 that he can now reproduce this consistently. Thierry, can you cook up the smallest .emacs that demonstrates this bug, for those who don't have Helm installed (but may have a git clone of it somewhere)?

thierryvolpiatto commented 4 years ago

João Távora writes:

@thierryvolpiatto writes in #303 that he can now reproduce this consistently. Thierry, can you cook up the smallest .emacs that demonstrates this bug, for those who don't have Helm installed (but may have a git clone of it somewhere)?

Sure.

1) Clone Sly.

git clone https://github.com/joaotavora/sly.git

2) Clone and install Async.

git clone https://github.com/jwiegley/emacs-async.git
cd emacs-async
make

3) Clone and install Helm.

git clone https://github.com/emacs-helm/helm.git
cd helm
make

4) Start Emacs

emacs -q

5) Configure helm

(add-to-list 'load-path "/path/to/async")
(add-to-list 'load-path "/path/to/helm")
(require 'helm-config)
(helm-mode 1)

6) Configure Sly

(add-to-list 'load-path "/path/to/sly")
(require 'sly-autoloads)
(setq inferior-lisp-program "/usr/bin/sbcl")
(add-hook 'sly-mode-hook (lambda () (sly-symbol-completion-mode -1)))

7) M-x sly

Enter something at the REPL prompt, e.g. (sly, and hit TAB. Emacs hangs for about 2 minutes and then silently fails to complete.

I used Emacs 27.1 on Linux Mint to reproduce this bug.

Thierry (Edited by @joaotavora)

joaotavora commented 4 years ago

Thanks very much @thierryvolpiatto for the thorough recipe.

joaotavora commented 4 years ago

NOTE: If you don't want to "make install", you will have to add the locations of async and helm to load-path in 5).

Yes I think I prefer that. I'll edit your recipe, if you don't mind

joaotavora commented 4 years ago

I've started debugging this. The reproduction recipe that you originally gave had sudo, which I find a bit intrusive for Emacs stuff. I removed the sudo, but it's still not perfect: it needs an edit to Helm's Makefile to add the load path for "emacs-async". After that, the command line:

emacs -Q -l <sly>/sly-autoloads.el -L <helm> -L <emacs-async> -l helm-config -f helm-mode -f sly -f sly-symbol-completion-mode

seems to start up an Emacs where the bug can be reproduced.

joaotavora commented 4 years ago

There's some news here: I can't reproduce this in Emacs 26.3: it seems to work fine there. Something happened starting Emacs 27.1, where the bug is reproducible, but I can apparently recover if I send SIGTERM to the process.

joaotavora commented 4 years ago

So it seems this has to be debugged at the C level, probably with the help of Eli Zaretskii, the Emacs HEAD maintainer and C specialist.

joaotavora commented 4 years ago

~I did find a bug in SLY's :exit-function, but that is unrelated to the hang, just a bog-standard bug.~ Scratch that, there is no bug: I was loading Emacs 27.1 .elc files into Emacs 26.3, which brings some problems.

joaotavora commented 4 years ago

More progress. Even though this could be debugged at the C level and could be seen as an Emacs bug, I think it's also a Helm problem. Helm uses while-no-input, or rather its own specific version of it. When I remove it, things seem to work OK with SLY. It's worth noting that Helm apparently bypasses this while-no-input when talking to Tramp. Perhaps it should also do so when talking to SLY. Reading its source, it's got so many special cases that I guess another one wouldn't hurt.

joaotavora commented 4 years ago

I'm starting to lean towards the possibility that the problem is on Helm's side, since SLY works with bare Emacs, fido-mode, company, etc. SLY used to mess with inhibit-quit and quit-flag, and now it doesn't. Maybe Helm should follow suit? Anyway, see https://github.com/emacs-helm/helm-sly/issues/2 for the possible beginnings of a patch for Helm.

thierryvolpiatto commented 4 years ago

João Távora writes:

I'm starting to lean towards the possibility that the problem is on Helm's side, since SLY works with bare Emacs, fido-mode, company, etc.

No, the problem is not in Helm; the problem is in sly-eval. Just commenting out the offending cond clause fixes the bug:

diff --git a/sly.el b/sly.el
index 020005dc..b947f1c6 100644
--- a/sly.el
+++ b/sly.el
@@ -2399,10 +2399,10 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                    (unless cancelled
                      (throw catch-tag
                             (list #'error "Synchronous Lisp Evaluation aborted")))))
-                (cond (cancel-on-input
-                       (while (sit-for 30))
-                       (setq cancelled t)
-                       (funcall check-conn))
+                (cond ;; (cancel-on-input
+                      ;;  (while (sit-for 30))
+                      ;;  (setq cancelled t)
+                      ;;  (funcall check-conn))
                       (t
                        (while t
                          (funcall check-conn)

So the bug comes from there: using (while (sit-for 30)) seems really hacky and is probably the cause of the problem.

Sly used to mess with inhibit-quit and quit-flag, and now it doesn't. Maybe Helm should follow suit? Anyway see emacs-helm/helm-sly#2 for the possible beginnings of a patch for Helm.

So no, disabling while-no-input in helm is not a solution.

Thanks for working on this.

-- Thierry

thierryvolpiatto commented 4 years ago

This patch fixes the problem with Helm, probably without affecting others (company etc... not tested):

diff --git a/sly.el b/sly.el
index 020005dc..adbcf61a 100644
--- a/sly.el
+++ b/sly.el
@@ -2380,6 +2380,7 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                                          (sly-continuation-counter))))
          (sly--stack-eval-tags (cons catch-tag sly--stack-eval-tags))
          (cancelled nil)
+         (inhibit-quit t)
          (check-conn
           (lambda ()
             (unless (eq (process-status (sly-connection)) 'open)
@@ -2399,7 +2400,8 @@ wants to input, and return CANCEL-ON-INPUT-RETVAL."
                    (unless cancelled
                      (throw catch-tag
                             (list #'error "Synchronous Lisp Evaluation aborted")))))
-                (cond (cancel-on-input
+                (cond ((and cancel-on-input
+                            (not (minibufferp (window-buffer))))
                        (while (sit-for 30))
                        (setq cancelled t)
                        (funcall check-conn))
joaotavora commented 4 years ago

No, the problem is not in helm, the problem is in sly-eval, just commenting the offending cond clause fixes the bug:

And breaks the rest of SLY obviously, so it won't work. Why is sit-for 30 really hacky?

This patch fixes the problem with helm with probably not affecting others (company etc... not tested):

Why should SLY mess with inhibit-quit when it doesn't need to? Why should Helm? My point is: let's both not mess with these Emacs internals unnecessarily. SLY already doesn't; Helm does so sometimes, so just expand the number of cases where Helm doesn't mess with it. Simple.

joaotavora commented 4 years ago

(Sorry, I closed by accident).

Can you at least explain to us what is happening? Why does inhibit-quit fix it?

(company etc... not tested)

Obviously that's not practical

So no, disabling while-no-input in helm is not a solution.

But you already do, I used your macro helm--maybe-while-no-input in that patch I sent you. It seems you've disabled it for TRAMP.

monnier commented 4 years ago

Fiddling with sit-for, inhibit-quit, and while-no-input is like the whack-a-mole game (with a fair bit of back and forth over the years, as one forgets past attempts and goes through them again), so I think it's important when doing that to try and record the reasons behind those changes: what was tried, what the problems were, etc.

Ideally, the better way to "record" is via regression tests, but since it's often difficult to make reproducible batch tests of those problems, the second best option are comments.

monnier commented 4 years ago

Why is sit-for 30 really hacky?

If you intend to wait for user input, it's fine, but if you're waiting for a process's response accept-process-output is the less-hacky way. [ That's not to say that accept-process-output always works better for that, but if it doesn't work well, it's probably a sign that there's a bug in the C code. ]

joaotavora commented 4 years ago

I think it's important when doing that to try and record the reasons behind those, what was tried

As to what was tried: a lot of stuff.

As to why sit-for is needed there: we need something that will block until the user types or does anything. But, while blocking, we want the network process to do its job.

I decided to not use inhibit-quit and while-no-input and such functions: sit-for has existed for a long time.

If you intend to wait for user input, it's fine, but if you're waiting for a process's response accept-process-output is the less-hacky way.

Right, I do need to wait for user input, so I can CANCEL-ON-INPUT as the function promises to. I tried while-no-input+accept-process-output but it turned out more problematic for other reasons (the whack-a-mole metaphor applies). So I settled on the simplest sit-for.

thierryvolpiatto commented 4 years ago

João Távora writes:

Can you at least explain to us what is happening?

With (while (sit-for n)) you block the minibuffer, and when Helm tries to start, it fails at the initial update, with the computation being inside a while-no-input.

Why does the inhibit-quit fix it?

inhibit-quit makes with-local-quit behave differently: it prevents quitting while Helm is updating its candidates. But I don't understand the Emacs internals well enough to explain the interaction with sit-for (and read-event).

Note that if you are afraid of using inhibit-quit, using (while (accept-process-output nil 30)) fixes the bug as well (it seems it doesn't block the minibuffer, but it does block input). I see you are already using it in the next cond clause; perhaps you can use it in this clause as well? (But perhaps I'm missing something.)

(company etc... not tested)

Obviously that's not practical

What is not practical?

-- Thierry

thierryvolpiatto commented 4 years ago

João Távora writes:

Right, I do need to wait for user input, so I can CANCEL-ON-INPUT as the function promises to. I tried while-no-input+accept-process-output but it turned out more problematic for other reasons (the whack-a-mole metaphor applies). So I settled on the simplest sit-for.

(while (accept-process-output nil 30)) is working fine with sly-symbol-completion-mode, helm-mode and company-mode and also regular emacs vanilla completion of course.

-- Thierry

joaotavora commented 4 years ago

Thierry Volpiatto writes:

(while (accept-process-output nil 30)) is working fine with sly-symbol-completion-mode, helm-mode and company-mode and also regular emacs vanilla completion of course.

It's not, Thierry, it's not working "fine", because it will not return immediately when the user presses a key. And that's, pardon the pun, "key" for responsive behaviour.

Didn't you find it curious that, by doing that, the prominently documented CANCEL-ON-INPUT in the function's docstring would be completely useless?

João