wzyboy closed this issue 1 year ago.
Hmm, this is unexpected: "press ^C to stop the process" ... "Child processes do not exit".

By default, ^C in a shell signals the entire process group rather than just the parent process. TFLint shouldn't necessarily have to clean up child processes; they can clean up after themselves, at least when handling signals from a shell.
This seems to be explained by the plugin system explicitly ignoring SIGINTs:
If you enable trace logs, I suspect you'd see these "plugin received interrupt signal, ignoring" logs.
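To make the mechanism concrete, here is a minimal stdlib-only Go sketch of a process that registers for SIGINT and then ignores it. It is not go-plugin's actual code, and ignoreInterrupts/interruptSelf are hypothetical names. The key point is that once signal.Notify is called for os.Interrupt, Go's default "exit on SIGINT" behaviour is disabled, so a child doing this survives a ^C delivered to its process group.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// ignoreInterrupts registers for SIGINT and then swallows it. Because the
// signal is now caught, Go no longer terminates the process on SIGINT.
// Each swallowed signal is reported on the returned channel (non-blocking,
// so a burst of signals may be coalesced into one notification).
func ignoreInterrupts() (<-chan struct{}, func()) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt)

	ignored := make(chan struct{}, 1)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-sigCh:
				log.Println("received interrupt signal, ignoring")
				select {
				case ignored <- struct{}{}:
				default: // a notification is already pending
				}
			case <-done:
				return
			}
		}
	}()

	stop := func() {
		signal.Stop(sigCh) // restore default handling for SIGINT
		close(done)
	}
	return ignored, stop
}

// interruptSelf sends SIGINT to the current process, like `kill -INT <pid>`.
func interruptSelf() error {
	return syscall.Kill(os.Getpid(), syscall.SIGINT)
}
```

After calling ignoreInterrupts(), a SIGINT from interruptSelf() (or from a shell) shows up on the channel instead of ending the process; this assumes a Unix-like OS.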
This is mildly related:
https://github.com/hashicorp/go-plugin/issues/203
Systems designed for running applications rather than for interactive human use (e.g. init systems, containers) will generally signal the parent and not consider process groups. Container environments will generally send SIGTERM to PID 1 when stopping a container, e.g. when calling docker stop or deleting a pod in Kubernetes. So in that sense, a shell isn't an ideal reproduction. Sending either INT or TERM directly to the tflint process using the kill command is a good test.
https://pkg.go.dev/os/signal#hdr-Default_behavior_of_signals_in_Go_programs
It seems like we'll need to handle this signal in the main tflint process and make sure it propagates to the plugin system, hopefully by canceling a context.
The key parts of how Terraform propagates shutdown signals:
https://github.com/hashicorp/terraform/blob/8b210951d963a81d3e947c77858d76229e2c28fe/commands.go#L431-L447
https://github.com/hashicorp/terraform/blob/8b210951d963a81d3e947c77858d76229e2c28fe/internal/command/meta.go#L400-L418
Thanks for the triaging.
I'm not sure what signal Neovim sends to tflint --langserver, but I bet it's SIGINT, SIGKILL, or SIGTERM.
I did the experiment by sending SIGINT/SIGKILL/SIGTERM to the parent process (tflint --langserver), and the behaviour is almost the same:
- Child processes (tflint --act-as-bundled-plugin and tflint-ruleset-aws) are still alive.
- /tmp/plugin* socket files are not cleaned up.

The only difference is that with SIGKILL and SIGTERM, there is one more line in the debug log: Killed or Terminated, respectively.
SIGKILL cannot be caught or handled, so generally a SIGTERM is sent first. Kubernetes will first send a TERM and then, after a configurable grace period, guarantee termination with a KILL:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
"I'm not sure what signal Neovim sends"

Assuming this would be in response to quitting, e.g. :q, here is what GPT has to say about how Neovim interacts with LSP plugins:
For a language server-based plugin in Neovim, the termination process involves the following steps:
- ... BufUnload event to the buffer associated with the plugin.
- ... BufUnload event.
- ... shutdown request to the language server.
- ... shutdown request, closes any open connections, stops any ongoing processes, and performs any necessary cleanup.

This workflow ensures that the language server is properly informed about the termination of Neovim and allows it to gracefully shut down, releasing resources and terminating any associated processes.
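For reference, the LSP specification defines the end of a session as a shutdown request (the server should release resources but not exit) followed by an exit notification (the server exits with code 0 if shutdown was received first, 1 otherwise; requests after shutdown should be rejected). A small Go sketch of that state machine, with a hypothetical cleanup hook where stopping plugin processes would go; none of these names come from tflint.

```go
package main

import "errors"

// errInvalidRequest stands in for the LSP InvalidRequest error returned for
// requests that arrive after shutdown.
var errInvalidRequest = errors.New("request received after shutdown")

// lspLifecycle tracks the tail end of an LSP session.
type lspLifecycle struct {
	shutdownRequested bool
	cleanedUp         bool
	cleanup           func() // hypothetical hook, e.g. stop plugin processes
}

// runCleanup runs the cleanup hook at most once.
func (l *lspLifecycle) runCleanup() {
	if !l.cleanedUp && l.cleanup != nil {
		l.cleanedUp = true
		l.cleanup()
	}
}

// handle processes one method name and says whether (and how) to exit.
func (l *lspLifecycle) handle(method string) (exitCode int, exit bool, err error) {
	switch method {
	case "shutdown":
		l.shutdownRequested = true
		l.runCleanup() // release resources, but keep running
		return 0, false, nil
	case "exit":
		l.runCleanup() // still clean up if the client skipped shutdown
		if l.shutdownRequested {
			return 0, true, nil
		}
		return 1, true, nil
	default:
		if l.shutdownRequested {
			return 0, false, errInvalidRequest
		}
		return 0, false, nil
	}
}
```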
In other words, it sounds like there's no process signaling involved here at all, and this all happens via LSP commands:
And in theory, this is how plugins are supposed to be cleaned up:
Do you see the "Shutting down..." log from the main process? Kill should be called after that (via defer), which should then attempt a graceful exit followed by a SIGKILL to the process if that errors or times out:
"Do you see the Shutting down... log from the main process?"
No. I did four experiments:
1. ^C
2. kill -SIGINT the main process from another terminal
3. kill -SIGTERM the main process from another terminal
4. kill -SIGKILL the main process from another terminal

The debug output of experiments 1 and 2 is identical, just like I posted in the original post.
The debug output of experiments 3 and 4 has one more line, Terminated and Killed respectively. There are no other lines.
OK, in that case I guess it makes sense that the default Go behavior is kicking in and the parent process terminates immediately. This does seem like it should work correctly under normal circumstances, when the editor sends a shutdown RPC to the language server. I'm not confident that there's a bug here that's specific to LSP mode; language servers are just long-running and far more likely to trigger this condition.
Signal handling would technically apply to normal mode as well. If you had a huge config or a plugin that gets stuck, you seemingly can't interrupt that run without orphaning the plugin process, at least until it finishes its work.
Thank you for the clarification. I think this should trap signals (SIGINT, SIGTERM) in the same way as Terraform and close the connection gracefully.
https://github.com/hashicorp/terraform/blob/8b210951d963a81d3e947c77858d76229e2c28fe/internal/command/meta.go#L442-L446
https://github.com/terraform-linters/tflint/blob/e3e94369cfc379e11d4d2449c05f6e000cae91ae/cmd/langserver.go#L38-L44
Summary
When used as a language server (e.g. with Neovim as a client), tflint does not clean up properly when exiting. This can be reproduced by running tflint --langserver directly in a terminal and observing its behaviour:
1. Run tflint --langserver in a workspace
2. Press ^C to stop the process
3. Run ps x | grep tflint: there are X copies of tflint --act-as-bundled-plugin and tflint-ruleset-aws processes
4. Run ls /tmp/plugin*: there are X copies of socket files

I found this after using Neovim to edit many Terraform files, when the RAM was almost full. I checked the process list and found this out. Further experiments revealed that it's not related to Neovim but to tflint --langserver itself.

Command
tflint --langserver
Terraform Configuration
TFLint Configuration
Output
TFLint Version
0.47.0
Terraform Version
No response
Operating System