lexical-lsp / lexical

Lexical is a next-generation elixir language server
793 stars 78 forks source link

Crash with OTP 27.0 #788

Closed dvic closed 5 days ago

dvic commented 2 weeks ago

Not sure if support for Elixir 1.17 is not finished yet (though #760 is merged), but I'm encountering this on one of our projects:

20:07:07.305 [info] Lexical Initialized
20:07:07.307 [info] opened file:///Users/dvic/Development/pluto/pluto/lib/pluto/store.ex
20:07:19.931 [info] Compiled pluto in 12.6 seconds
20:07:36.014 [warn] The node has died. Restarting after deleting the build directory
20:07:36.016 [error] GenServer :"pluto::node_process" terminating
** (FunctionClauseError) no function clause matching in :gen.reply/2
    (stdlib 6.0) gen.erl:547: :gen.reply(nil, :ok)
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:72: LXical.RemoteControl.ProjectNode.State.on_nodedown/2
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:183: LXical.RemoteControl.ProjectNode.handle_info/2
    (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
    (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: {:nodedown, :"project-pluto-25904@127.0.0.1", [node_type: :visible]}
State: %LXical.RemoteControl.ProjectNode.State{project: %LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904}, port: #Port<0.8>, cookie: :lexical, stopped_by: nil, stop_timeout: nil, started_by: {#PID<0.1544.0>, [:alias | #Reference<0.0.197635.4109144966.1078788099.69902>]}, status: :started}
20:07:36.021 [error] Process :"pluto::node_process" (#PID<0.1551.0>) terminating
** (FunctionClauseError) no function clause matching in :gen.reply/2
    (stdlib 6.0) gen.erl:547: :gen.reply(nil, :ok)
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:72: LXical.RemoteControl.ProjectNode.State.on_nodedown/2
    (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:183: LXical.RemoteControl.ProjectNode.handle_info/2
    (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
    (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.RemoteControl.ProjectNode.init/1
Ancestors: [LXical.RemoteControl.ProjectNodeSupervisor, :"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 1
Messages: [{:EXIT, #Port<0.8>, :normal}]
Links: [#PID<0.1543.0>]
Dictionary: []
Trapping Exits: true
Status: :running
Heap Size: 10958
Stack Size: 29
Reductions: 78453
20:07:36.021 [error] Child :undefined of Supervisor LXical.RemoteControl.ProjectNodeSupervisor terminated
** (exit) an exception was raised:
    ** (FunctionClauseError) no function clause matching in :gen.reply/2
        (stdlib 6.0) gen.erl:547: :gen.reply(nil, :ok)
        (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:72: LXical.RemoteControl.ProjectNode.State.on_nodedown/2
        (lx_remote_control 0.5.0) lib/lexical/remote_control/project_node.ex:183: LXical.RemoteControl.ProjectNode.handle_info/2
        (stdlib 6.0) gen_server.erl:2173: :gen_server.try_handle_info/3
        (stdlib 6.0) gen_server.erl:2261: :gen_server.handle_msg/6
        (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Pid: #PID<0.1551.0>
Start Call: LXical.RemoteControl.ProjectNode.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :transient
Shutdown: 5000
Type: :worker
20:07:36.380 [error] GenServer :"pluto::node" terminating
** (stop) {:error, {:already_started, #PID<0.1559.0>}}
Last message: {:nodedown, :"project-pluto-25904@127.0.0.1"}
State: %LXical.Server.Project.Node.State{project: %LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904}, node: :"project-pluto-25904@127.0.0.1", supervisor_pid: #PID<0.1551.0>}
20:07:36.381 [error] Process :"pluto::node" (#PID<0.1544.0>) terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2304: :gen_server.handle_common_reply/8
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1541.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 29
Reductions: 849071
20:07:36.381 [error] Child {LXical.Server.Project.Node, "pluto"} of Supervisor :"pluto::supervisor" terminated
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
Pid: #PID<0.1544.0>
Start Call: LXical.Server.Project.Node.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :permanent
Shutdown: 5000
Type: :worker
20:07:36.388 [error] Process #PID<0.1560.0> terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2038: :gen_server.init_it/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1541.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 29
Reductions: 225484
20:07:36.389 [error] Child {LXical.Server.Project.Node, "pluto"} of Supervisor :"pluto::supervisor" failed to start
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
Pid: #PID<0.1544.0>
Start Call: LXical.Server.Project.Node.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :permanent
Shutdown: 5000
Type: :worker
20:07:36.395 [error] Process #PID<0.1561.0> terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2038: :gen_server.init_it/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1541.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 29
Reductions: 225484
20:07:36.395 [error] Child {LXical.Server.Project.Node, "pluto"} of Supervisor :"pluto::supervisor" failed to start
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
Start Call: LXical.Server.Project.Node.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 25904})
Restart: :permanent
Shutdown: 5000
Type: :worker
20:07:36.401 [error] Process #PID<0.1562.0> terminating
** (exit) {:error, {:already_started, #PID<0.1559.0>}}
    (stdlib 6.0) gen_server.erl:2038: :gen_server.init_it/6
    (stdlib 6.0) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Initial Call: LXical.Server.Project.Node.init/1
Ancestors: [:"pluto::supervisor", LXical.Server.ProjectSupervisor, LXical.Server.Supervisor, #PID<0.1533.0>]
...
dvic commented 2 weeks ago

If I fix the nil stopped_by, I just get: 20:17:10.323 [warn] The node has died. Restarting after deleting the build directory, with no further logs.

scottming commented 2 weeks ago

I noticed that this issue only occurs in OTP 27. I conducted some simple investigations but have not yet identified the specific cause.

I tried starting Node using the command line, but it does not halt. I'm not sure if OTP 27 has made changes to the implementation of port.

dvic commented 2 weeks ago

Yes, I can confirm that switching to OTP 26 fixes the issue.

Moosieus commented 2 weeks ago

Do you see any errors in .lexical/project.log when the crash occurs?

Happy path here seems to be:

However if the project node dies for any other reason, we end up at the error you shared above.

Moosieus commented 2 weeks ago

What I've done to repro:

dvic commented 2 weeks ago

Do you see any errors in .lexical/project.log when the crash occurs?

Happy path here seems to be:

  • Receive :call of {:stop, stop_timeout}
  • Stop the project node via project_rpc/3
  • Save from to state.stopped_by
  • Reply :ok once the node's down

However if the project node dies for any other reason, we end up at the error you shared above.

I only have a lexical.log, I don't see any other errors. This is the full output:

❯ tail -f .lexical/lexical.log
Shutdown: 5000
Type: :worker
10:21:52.348 [info] Child LXical.Server.Project.SearchListener of Supervisor :"pluto::supervisor" started
Pid: #PID<0.1558.0>
Start Call: LXical.Server.Project.SearchListener.start_link(%LXical.Project{root_uri: "file:///Users/dvic/Development/pluto/pluto", mix_exs_uri: "file:///Users/dvic/Development/pluto/pluto/mix.exs", mix_project?: true, mix_env: nil, mix_target: nil, env_variables: %{}, project_module: nil, entropy: 36412})
Restart: :permanent
Shutdown: 5000
Type: :worker
10:21:52.349 [info] Lexical Initialized
10:21:52.350 [info] opened file:///Users/dvic/Development/pluto/pluto/lib/pluto/store.ex
10:22:28.867 [info] Child IEx.Config of Supervisor IEx.Supervisor started
Pid: #PID<0.1571.0>
Start Call: IEx.Config.start_link([])
Restart: :permanent
Shutdown: 5000
Type: :worker
10:22:28.867 [info] Child IEx.Broker of Supervisor IEx.Supervisor started
Pid: #PID<0.1572.0>
Start Call: IEx.Broker.start_link([])
Restart: :permanent
Shutdown: 5000
Type: :worker
10:22:28.867 [info] Child IEx.Pry of Supervisor IEx.Supervisor started
Pid: #PID<0.1573.0>
Start Call: IEx.Pry.start_link([])
Restart: :permanent
Shutdown: 5000
Type: :worker
10:22:28.867 [info] Application iex started at :"manager-pluto-36412@127.0.0.1"
10:22:36.637 [info] Child of Supervisor :inet_gethost_native_sup started
Pid: #PID<0.1578.0>
Start Call: :inet_gethost_native.init([])
10:22:36.637 [info] Child :inet_gethost_native_sup of Supervisor :kernel_safe_sup started
Pid: #PID<0.1577.0>
Start Call: :inet_gethost_native.start_link()
Restart: :temporary
Shutdown: 1000
10:22:58.609 [info] Compiled pluto in 66.3 seconds
10:23:09.567 [warn] 'global' at node :"manager-pluto-36412@127.0.0.1" requested disconnect from node :"project-pluto-36412@127.0.0.1" in order to prevent overlapping partitions
10:23:09.567 [warn] The node has died. Restarting after deleting the build directory

I was also logged in when the LSP started, and it seems it started two managers?

❯ ./bin/debug_shell.sh pluto
Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Interactive Elixir (1.17.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(manager-pluto-36412@127.0.0.1)1> :net_adm.names()
{:ok,
 [
   {~c"manager-pluto-36412", 52097},
   {~c"project-pluto-36412", 52099},
   {~c"shell", 52119}
 ]}
iex(manager-pluto-36412@127.0.0.1)2> :net_adm.names()
{:ok,
 [
   {~c"manager-pluto-36412", 52097},
   {~c"project-pluto-36412", 52099},
   {~c"shell", 52119}
 ]}
iex(manager-pluto-36412@127.0.0.1)3> :net_adm.names()
{:ok,
 [
   {~c"manager-pluto-36412", 52097},
   {~c"shell", 52119},
   {~c"project-pluto-36412", 52136},
   {~c"manager-lexical-41512", 52148},
   {~c"project-lexical-41512", 52163}
 ]}
iex(manager-pluto-36412@127.0.0.1)4>
scohen commented 1 week ago

@dvic do you have two editors running? If not having two managers seems like it's a problem.

dvic commented 1 week ago

@dvic do you have two editors running? If not having two managers seems like it's a problem.

No I believe I was not running two editors but not 100% sure, maybe I started hx in lexical. Also, now that I look at the output I previously posted the two managers are for different projects: manager-pluto-36412 and manager-lexical-41512.

scohen commented 1 week ago

@dvic I take it this project isn't open source? I haven't been able to reproduce it locally.

scottming commented 1 week ago

@dvic I take it this project isn't open source? I haven't been able to reproduce it locally.

@scohen

Are you unable to reproduce the OTP 27 node down issue locally? It seems we just need to use this version:

asdf local elixir 1.17.1-otp-27 && asdf local erlang 27.0

Then after running mix package and opening the lexical project itself, you can see these warnings and constant restarts.

10:13:18.740 [warn] The node has died. Restarting after deleting the build directory
10:14:16.274 [warn] The node has died. Restarting after deleting the build directory
dvic commented 1 week ago

@dvic I take it this project isn't open source? I haven't been able to reproduce it locally.

Unfortunately, yes. Can you reproduce it in lexical itself with the steps outlined above? Otherwise I can try to cook up a reproducible repo.

scohen commented 1 week ago

We're investigating, @zachallaun can reproduce some failures, hopefully they're the same as what you're seeing.

altdsoy commented 6 days ago

Hi there... Sorry to hijack this thread.. I see that OP has a crash when running on OTP 27.0 I'm trying Lexical but I simpy got the message that it's not even compatible with Elixir 1.17.2 and OTP 27.0.1:

[info] FATAL: Lexical is not compatible with Elixir 1.17.2

Lexical is compatible with the following versions of Elixir:

  >= 1.13.0
  >= 1.14.0
  >= 1.15.3
  >= 1.16.0

FATAL: Lexical is not compatible with Erlang/OTP 27.0.1

Lexical is compatible with the following versions of Erlang/OTP:

  >= 24.3.4
  >= 25.0.0
  >= 26.0.2

Which is also confirmed from the docs here: https://github.com/lexical-lsp/lexical/blob/main/pages/installation.md

So what's the current situation if I'm running Elixir 1.17.2 with OTP 27?

Thanks!

scohen commented 6 days ago

@altdsoy please don't hijack threads, open an issue instead. Your problem is because you're using a released version of lexical and haven't built the latest main branch. Once this issue is fixed, we'll release.

zachallaun commented 5 days ago

I believe this is fixed by #796. Would someone who's experienced this crash be able to pull the branch za-ets-compressed, re-package Lexical, and confirm that it's fixed?

dvic commented 5 days ago

I believe this is fixed by #796. Would someone who's experienced this crash be able to pull the branch za-ets-compressed, re-package Lexical, and confirm that it's fixed?

I can confirm that that branch fixes the issue, thanks @zachallaun! Too bad about the increased memory usage, but that's all we can do for now I guess. Is this a known issue with OTP 27 or do we have to report it?

zachallaun commented 5 days ago

Is this a known issue with OTP 27 or do we have to report it?

I don't believe it's a known issue. I'm working on a report now.