leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.69k stars 422 forks source link

Watchdog sometimes gets stuck and stops processing requests for a file #1219

Open ammkrn opened 2 years ago

ammkrn commented 2 years ago

Occasionally, the extension will stop providing feedback for a given file, with the problem persisting after restarting the extension/vscode, and after removing build artifacts. The behavior goes away if I change the file name, and returns if I revert the file to the original name. I had a similar experience today, where it initially provides incorrect error/line numbers and then stops providing feedback.

I'm not sure how to trigger this or debug it through vscode so I can't provide a MWE, but this sounds like a cache issue, I can take a look if you point me in the right direction.

MacOS 11.6.6 vscode 1.67.2 extension v0.0.77 nightly-2022-05-31

gebner commented 2 years ago

I'm also seeing this issue in the neovim plugin, so maybe this is a bug in the Lean 4 LSP implementation itself. Does the Lean 4: Restart Server command help?

ammkrn commented 2 years ago

It does not. The only thing that seems to work is renaming the file.

gebner commented 2 years ago

This looks like a bug in the watchdog. Transferring to the Lean 4 repo because this bug is not editor-specific.

gebner commented 2 years ago

I'm going out on a limb here, but this could be due to this line: https://github.com/leanprover/lean4/blob/5896e6f1d6fc69441b476f3457172713e708f835/src/Lean/Server/Watchdog.lean#L308

If writing the exit message to the file worker blocks, then the file worker will never be erased. In particular it won't be restarted with didOpen, so the restart server command has no effect.

Kha commented 2 years ago

I'm not sure I've encountered this myself so far; I get plenty of hangs, but they usually seem to be Emacs' fault.

We took care to make sure that the main loop of the file worker never blocks except on reading requests. Would be interesting to get stack traces from such a hung worker.

Kha commented 2 years ago

But just to be clear, should "Restart Server" not replace the entire watchdog process?

gebner commented 2 years ago

But just to be clear, should "Restart Server" not replace the entire watchdog process?

Oh, I confused that with "Refresh File Dependencies". Maybe something different is going on as well then.

We took care to make sure that the main loop of the file worker never blocks except on reading requests. Would be interesting to get stack traces from such a hung worker.

:+1: about the stack traces. We do all kinds of (potentially) blocking operations in the main loop though: printing to stdout/stderr, handle LSP requests (most handlers spawn a task for some of the work, but this is not enforced), there's also lots of potentially contended accesses to IO.Refs which could cause busy waiting.

tydeu commented 2 years ago

This issue has been plaguing me for a while in VSCode and is very unfortunate, It kills all the nice interactive features that are major positives of Lean.

However, I do believe I have figured out a surefire way to exit this state once in it for a file in VSCode. Close all open Lean tabs and then execute Developer: Reload Window. Once VSCode has refreshed you should be able to reopen the file and see it working as normal. Note that all Lean tabs must be closed before reloading VSCode. This is why, in my experience, simply restarting VSCode does not fix the issue -- it keeps the tabs open upon restart and thereby preserves the breakage.

lovettchris commented 2 years ago

I'm seeing the same thing reported here https://github.com/leanprover/lean4/issues/1564 and I suspect the repro always involves editing through some invalid states. If I load good code and simply browse around, then everything is fine. So the scenario is I'm writing code, getting errors, and then I fix the code, lake build succeeds from the command line, but goto definition and hover tips are gone. And I can see goto definition working in other files, but not on the files I've been editing. So yes "watchdog stuck" is a great description. Refresh file dependencies did not fix it for me also. I didn't try restart server, I will try that next time, but close and restart vscode does fix it (except perhaps in the case Mac mentions above where VS code remembers which files you were editing and quickly goes right back into the stuck state).

lovettchris commented 2 years ago

We probably need to write some async state machine proofs in the LSP, I would love to do that it would be very fun and would be related to a previous project I worked on called Coyote. The best way to handle async state machines is with an inbox actor model to ensure serialization of messages across the various async actors involved in the LSP implementation, that should include actors for IO, according to Gabriel's comments above.

fpvandoorn commented 2 years ago

I can consistently reproduce this on my Windows machine, but I didn't manage to reproduce it yesterday on my Linux machine (both in VSCode).

Here is some behavior that might suggest possible problems:

Hopefully this helps. I can share a screen recording of me performing these steps if that is useful.