nvim-neo-tree / neo-tree.nvim

Neovim plugin to manage the file system and other tree like structures.
MIT License

BUG: Neo-Tree reproducibly segfaults on macOS with follow_current_file enabled #1126

Status: closed by klmr 1 year ago

klmr commented 1 year ago

Neovim Version (nvim -v)

NVIM v0.8.3–v0.9.1

Operating System / Version

macOS (multiple versions, incl. 12 & 13)

Describe the Bug

On macOS (but not on Linux!) I can reproducibly segfault NeoVim when Neo-Tree is open and follow_current_file is enabled, by switching between different file buffers. It takes a bit of time, but after several buffer switches, NeoVim closes without a message. Via Console.app I can see that the crash is always caused by an invalid pointer access (KERN_INVALID_ADDRESS). The invalid pointer address varies, but occasionally the addresses are wildly invalid, e.g. 0x0000000000000040. My guess therefore is that this is due to a buffer overflow which overwrites the pointer memory, rather than an off-by-one error.

The error seems to happen inside readdir, called from inside libuv. Here’s a typical stack trace of the crashed thread:

0   libsystem_pthread.dylib                0x1a13b2558 pthread_mutex_lock + 12
1   libsystem_c.dylib                      0x1a129ebc8 readdir + 32
2   libuv.1.dylib                          0x1034ccc6c uv__fs_work + 2344
3   libuv.1.dylib                          0x1034c73e4 worker + 388
4   libsystem_pthread.dylib                0x1a13b7fa8 _pthread_start + 148
5   libsystem_pthread.dylib                0x1a13b2da0 thread_start + 8

The function on the top of the stack isn’t always the same — sometimes it’s _readdir_unlocked instead of pthread_mutex_lock. Occasionally, the actual crash instead happens inside the calling thread in pthread_kill, or inside luv_push_dirent, with the following stack trace:

0   libluv.1.43.0.dylib                    0x102d97634 luv_push_dirent + 48
1   libluv.1.43.0.dylib                    0x102d97488 push_fs_result + 780
2   libluv.1.43.0.dylib                    0x102d970d8 luv_fs_cb + 44
3   libuv.1.dylib                          0x102feaff0 uv__work_done + 192
4   libuv.1.dylib                          0x102fee3c4 uv__async_io + 320
5   libuv.1.dylib                          0x102ffe1e0 uv__io_poll + 1748
6   libuv.1.dylib                          0x102fee7bc uv_run + 244
7   nvim                                   0x102a4b298 loop_uv_run + 136
8   nvim                                   0x102b202fc os_breakcheck + 64
9   nvim                                   0x102b8a924 state_handle_k_event + 152
10  nvim                                   0x102afc2e0 nv_event + 60
11  nvim                                   0x102af506c normal_execute + 4616
12  nvim                                   0x102b8a864 state_enter + 356
13  nvim                                   0x1029865e0 main + 10228
14  dyld                                   0x1a105ff28 start + 2236

I’ve attached an example macOS crash report, and I am happy to supply others on request.

Screenshots, Traceback

⬇️ nvim-2023-08-30-164412.ips.log

Steps to Reproduce

  1. Create a new directory with at least two files in it:
    mkdir x && cd x
    echo foo>foo; echo bar>bar
  2. Launch nvim with the minimal configuration below, passing the files as arguments, and open Neo-Tree:
    nvim -u repro.lua * +:Neotree
  3. Start switching between file buffers (to make this easier I rebound Return to switch to the next buffer, but manually using e.g. :bn/:bp etc. works as well). The behaviour is nondeterministic, so it might require several dozen buffer switches before nvim crashes. However, I have never needed more than ~50, and usually only around 10.

(The steps above aim to make the example self-contained; obviously you don’t need to create a new directory and files, it works equally well in any existing, non-empty directory.)
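
The setup in steps 1 and 2 can also be scripted. The following is a minimal sketch, not part of the original report: the helper names setup_repro_dir and describe_exit are made up for illustration, and it assumes the common shell convention that a command killed by a signal is reported with exit status 128 + signal number. Since nvim closes without a message when it crashes, checking the exit status is a quick way to tell a crash from a normal exit.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of neo-tree or nvim.

# Create a directory containing the two sample files from step 1.
setup_repro_dir() {
    dir=${1:-x}
    mkdir -p "$dir"
    echo foo > "$dir/foo"
    echo bar > "$dir/bar"
}

# Report whether a command exited normally or was killed by a signal.
# Assumption: statuses above 128 mean "killed by signal (status - 128)".
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}

setup_repro_dir x

# Then, interactively (step 2 onwards):
#   cd x
#   nvim -u repro.lua * +:Neotree; describe_exit $?
```

Where repro.lua lives relative to the new directory is not stated in the report, so the commented-out nvim command is copied as-is from step 2.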

Instead of the self-contained repro.lua, the following minimal.lua also reproduces the issue:

vim.opt.runtimepath:append('.repro/plugins/neo-tree.nvim')
vim.opt.runtimepath:append('.repro/plugins/nui.nvim')
vim.opt.runtimepath:append('.repro/plugins/plenary.nvim')

require("neo-tree").setup({
  filesystem = {
    follow_current_file = { enabled = true },
  },
})

vim.keymap.set('n', '<cr>', '<cmd>bn<cr>')

Expected Behavior

No segfault occurs.

Your Configuration

-- DO NOT change the paths and don't remove the colorscheme
local root = vim.fn.fnamemodify("./.repro", ":p")

-- set stdpaths to use .repro
for _, name in ipairs({ "config", "data", "state", "cache" }) do
  vim.env[("XDG_%s_HOME"):format(name:upper())] = root .. "/" .. name
end

-- bootstrap lazy
local lazypath = root .. "/plugins/lazy.nvim"
if not vim.loop.fs_stat(lazypath) then
  vim.fn.system({ "git", "clone", "--filter=blob:none", "https://github.com/folke/lazy.nvim.git", lazypath, })
end
vim.opt.runtimepath:prepend(lazypath)

-- install plugins
local plugins = {
  "folke/tokyonight.nvim",
  -- add any other plugins here
}

local neotree_config = {
  "nvim-neo-tree/neo-tree.nvim",
  dependencies = { "MunifTanjim/nui.nvim", "nvim-tree/nvim-web-devicons", "nvim-lua/plenary.nvim" },
  cmd = { "Neotree" },
  keys = {
    { "<Leader>e", "<Cmd>Neotree<CR>" }, -- change or remove this line if relevant.
  },
  opts = {
    filesystem = {
      follow_current_file = { enabled = true },
    },
  },
}

table.insert(plugins, neotree_config)
require("lazy").setup(plugins, {
  root = root .. "/plugins",
})

vim.cmd.colorscheme("tokyonight")
-- add anything else here

vim.keymap.set('n', '<cr>', '<cmd>bn<cr>')

cseickel commented 1 year ago

Thanks @klmr for an excellent and very complete bug report. Unfortunately I can't do much with this because I don't have access to a Mac.

Can you tell me if the directory you are in has a particularly large number of files/folders or if it is within a very large git repo? Is there anything unusual about the hardware (very old or very new)?

I think that in the case of a segfault, the fault is ultimately in Neovim itself. Neo-tree may be doing something to surface that problem, but I don't think the Lua code should be able to cause a segfault. Have you checked the issues in the Neovim repo?

klmr commented 1 year ago

I’ve been able to test and reproduce this on two different macOS models (both running an ARM chip, M2 — apologies, I should have mentioned this!). There’s nothing special about the folder structure. Any folder will do, including something directly in the home directory; no deep nesting, and no large subdirectory structure.

> I don't think the lua code should be able to cause a segfault

Yeah, I actually agree with this. Unfortunately I haven’t been able to find any issue that looks related.

… I’m actually puzzled by this lack of bug reports, since the behaviour is fairly disruptive and has been happening for months (that’s how long it took me to be able to narrow the issue down and make it reproducible). I’m sure other people must have stumbled across it; the only reason it took me so long was that I am mostly using Linux.

Should I cross-post the issue to the NeoVim repo?

cseickel commented 1 year ago

> Should I cross-post the issue to the NeoVim repo?

I think so, after checking existing issues of course.

I can see how neo-tree might be the only thing to surface a problem with readdir, because we spawn multiple asynchronous reads. Probably only another tree plugin would behave in this way. It would be interesting to see if nvim-tree causes segfaults as well.

miversen33 commented 1 year ago

@klmr I would be curious, are you running one of the M1 ARM chips? Edit: I should really learn how to read. I am firing up a Pi to see if I can recreate this on Linux on ARM.

This smells like a libuv issue as opposed to Neovim directly (though of course, Neovim provides libuv and it is used heavily in the filesystem source within Neo-tree). I ask about the architecture because I have seen a handful of other weird issues in Neovim land related to running on a non-x86 architecture. I haven't tried yet, but I wonder if this can be recreated on something like a Raspberry Pi (also running ARM).

miversen33 commented 1 year ago

Tested this on a Raspberry Pi 4 running Manjaro and I was unable to replicate. So there must be something specific to the combination of the ARM architecture and how libuv relays instructions to the processor through Apple's kernel (all well beyond me). In any case, I believe this is below Neo-tree specifically :(

klmr commented 1 year ago

In the meantime I have tried and failed to reproduce the issue with the official NeoVim Universal build. Turns out, the issue only seems to exist with the build from MacPorts, so I will re-report this bug to MacPorts. They have their own build infrastructure, and they must have done something slightly differently.
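
One way to compare the two builds is to look at which libuv and luv libraries each binary loads. The stack traces above show libuv.1.dylib and libluv.1.43.0.dylib as separate dynamic libraries in the crashing build. The sketch below is an illustration rather than something from this thread: list_uv_libs is a hypothetical helper, it assumes otool (macOS developer tools) or ldd (Linux) is available, and a build that links these libraries statically will show nothing.

```shell
#!/bin/sh
# Sketch only: list_uv_libs is a hypothetical helper name.

# Print the libuv/libluv dynamic libraries that a binary links against.
# Usage: list_uv_libs [binary]   (defaults to the nvim found on PATH)
list_uv_libs() {
    bin=$(command -v "${1:-nvim}") || {
        echo "binary not found: ${1:-nvim}" >&2
        return 1
    }
    if command -v otool >/dev/null 2>&1; then
        otool -L "$bin"          # macOS
    else
        ldd "$bin" 2>/dev/null   # Linux
    fi | grep -E 'libuv|libluv' ||
        echo "no dynamically linked libuv/libluv found in $bin"
}
```

Running it once against the MacPorts binary and once against the official build (passing the path to that binary as the argument) should show whether the two pick up different library versions. The libuv version in use at runtime can also be queried from inside nvim with `lua print(vim.loop.version_string())`.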

I agree with the assessment that this is probably ultimately a libuv issue. In fact, there is an issue which sounds suspiciously similar, neovim/neovim#22694, and which has since been fixed (luvit/luv#640).