Error or segfault when using location list with diagnostics

dlukes commented 4 years ago

Steps to reproduce:

1. Open the location list with diagnostics using :OpenDiagnostic.
2. Place the cursor on one of the lines in the location list.
3. Press enter to navigate to that location in your source code buffer.

What happens instead is one of the following:

- the cursor jumps to a different location in the source buffer
- this error occurs: E92: Buffer 168430090 not found
- Neovim segfaults

Details: The buffer number in E92 is not always the same; it tends to grow as you make more attempts to navigate from the location list, or maybe simply as you go down the list (not entirely sure which at this point). At any rate, once it gets big enough, a segfault is triggered. So the number is probably used as an invalid pointer at some point?
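One observation about the specific value in the E92 message above, offered as a data point rather than a diagnosis: written in hexadecimal, 168430090 is four identical bytes, each 0x0A, the ASCII code for a newline. That would be consistent with the buffer number being read from memory that now holds text, though it does not prove it. The short Python check below only decodes that one number; the other values seen are not recorded here.

```python
import struct

bufnr = 168430090  # buffer number reported in the E92 message above

# Show the value in hexadecimal (Python omits the leading zero).
as_hex = hex(bufnr)
print(as_hex)

# Pack it as an unsigned 32-bit little-endian integer to see the raw bytes.
raw = struct.pack("<I", bufnr)
print(raw)
```

Because all four bytes are the same, the result does not depend on byte order.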

I'm seeing this in Rust files with rust-analyzer as the language server. AFAICS, the location list as shown in the appropriate Neovim buffer looks alright -- the filenames, line and column numbers match the highlights in the source code.

Let me know if I can provide additional details!

dlukes commented 4 years ago

When I inspect the location list with :echo getloclist(0), all the bufnrs look valid (as in, none of them are one of those suspect large numbers).

dlukes commented 4 years ago

This is starting to look like a concurrency bug (details below), so I'm afraid insight into Neovim's C core will be necessary. Pinging @tjdevries, sorry, you happen to be the one core dev I've interacted with who has also contributed to this repo :) If this type of problem is not up your alley, would you please consider pinging someone else from the core team?

I thought I'd narrowed down the problem to somewhere in the qf_jump function in quickfix.c by adding some crude print debugging (sorry, I've no idea how to do it properly), but the behavior is not always consistent and I even once got the correct result (i.e., a jump to the correct location in the source file). So this seems to indicate that the problem is not something that happens inside the qf_jump function, but something that happens roughly at the same time. Also, I don't see anything suspect in the region of code I linked to (my C skills are admittedly close to non-existent, though).

Maybe the rust-analyzer language server happens to send a message to update the location list roughly while qf_jump is running, which makes the pointers qf_jump is working with invalid?
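To make that hypothesis concrete, here is a toy model in Python. It is a sketch of the suspected failure mode only, not of Neovim's actual code: a bytearray stands in for heap memory, an integer offset stands in for a C pointer, and every name in it (heap, ENTRY_OFFSET, store_entry, read_fnum, replace_list_with_text, jump) is made up for illustration. It assumes that an update frees the entry and that its memory is then reused for text.

```python
import struct

# Toy model only: a bytearray stands in for heap memory, and an integer
# offset stands in for a C pointer. None of these names come from Neovim.
heap = bytearray(16)
ENTRY_OFFSET = 0  # hypothetical location of one list entry's buffer number


def store_entry(fnum):
    """Write a 32-bit buffer number at the entry's offset."""
    struct.pack_into("<i", heap, ENTRY_OFFSET, fnum)


def read_fnum(offset):
    """Read the buffer number back through a (possibly stale) offset."""
    return struct.unpack_from("<i", heap, offset)[0]


def replace_list_with_text():
    """Assumed update: the entry is freed and its bytes are reused for text."""
    heap[0:4] = b"\n\n\n\n"


def jump(on_checkpoint=None):
    """Read the same field twice, letting other code run in between."""
    stale = ENTRY_OFFSET  # "pointer" captured at the start of the jump
    before = read_fnum(stale)
    if on_checkpoint is not None:
        on_checkpoint()  # e.g. a diagnostics update arriving mid-jump
    after = read_fnum(stale)
    return before, after


store_entry(1)
quiet = jump()  # nothing interferes between the two reads

store_entry(1)
raced = jump(replace_list_with_text)  # the list is replaced mid-jump

print(quiet, raced)
```

In the quiet run both reads return 1. In the raced run the second read returns whatever the reused bytes happen to encode, which in this model is 168430090, the value from the E92 message in the original report.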

For illustration, here is the crude print debugging I added:

[image: neovim_debug_2]

And here is the "debug log" I got from one run:

[image: neovim_debug_1]

qf_fnum changes once between points c.2 and c.3, where nothing that should be able to change it happens in the code AFAICS, and a second time between c.3 and d, where there's literally nothing else in the code. So it looks like that memory is being modified by another thread at the same time.

As I said above, this behavior is not consistent: very rarely, you even get the correct result, i.e. qf_fnum stays 1 throughout and the jump leads to the correct location. Somewhat more commonly, qf_fnum stays 1, but the jump ends up in the wrong location.

tjdevries commented 4 years ago

Hey, thanks for the ping! This is very interesting.

I think this issue should be sent to Neovim's core repo, since plugins should not be able to make nvim segfault ;)

If you can paste these exact items back into a neovim core issue, that'd be great. I'm not 100% sure I'll be able to solve it, but that's the right place for it to go.

Thanks for all the investigation!

dlukes commented 3 years ago

Closing, since a) nvim-lua/diagnostic-nvim#57 and neovim/neovim#12904 seem to have addressed or at least mitigated it, and b) diagnostic-nvim is now deprecated anyway :)

nvim-lua/diagnostic-nvim, issue #55