ibhagwan / fzf-lua

Improved fzf.vim written in lua
MIT License
2.34k stars 150 forks source link

Some commands sometimes show empty panes in fish shell #633

Closed iovis closed 1 year ago

iovis commented 1 year ago

Info

Description

I've been trying to diagnose this one by myself, but I can't figure it out anymore. Maybe you can help me debug this better.

So, I recently switched to Fish as my shell (from ZSH). Everything was working fine, till I started to sometimes see :FzfLua buffers behave strangely:

Screenshot 2023-02-08 at 20 19 32

No error, just that lonely ^[i right there (it's always there when this happens). The thing is, it's not consistent (although once it starts, it'll stay there). I could reproduce with the minimal setup.

It also seems like it may somehow be related to commands related to vim. These are some commands that stop working when this happens:

But other commands do work normally, even when this starts happening, like:

If I run neovim from ZSH, this doesn't happen, it works as expected.

My guess is that something is breaking somewhere, but not showing the error for some reason. I checked messages and stuff, but couldn't find anything.

Do you know what could be going on or how I could debug this further?

ibhagwan commented 1 year ago

This is a very strange issue indeed due to this not being consistent, the same command and code is run by fzf-lua so the inconsistency is making it hard to identify what’s the source of the issue.

The common denominator between the commands that fail seems to be a lua table input into fzf_exec, I’ll look at the code tomorrow and try to figure out what can fail overtime and why specifically fish.

Do you know what could be going on or how I could debug this further?

In the meantime if you can fins a way to consistently reproduce this with steps then I’d be able to solve it easily from my end.

iovis commented 1 year ago

I unfortunately haven't been able to find a consistent way of reproducing it. Although something weird that I saw last night when running the buffers command (which is usually the one I use the most from the failing ones) was seeing the dreaded ^[i for a few milliseconds and then the popups being render properly in place. I ran the command multiple times after that and I didn't see it again for that session. Not sure if it helps, but I figured I'd mention it (could there be some sort of race condition?).

If you want me to add some kind of logging or print statements somewhere you're suspicious of, just let me know. I'm guessing this may be a weird interaction of my system (and likely a mistake on my part), so it's probably going to be incredibly hard to reproduce outside of it.

I'm new to fish, so I probably messed something up. One thing I'll mention given the weird interaction I had with that sort of race condition: I managed to make my fish config significantly faster than my ZSH one (startup time being ~10ms). My guess is that this probably doesn't really matter, but I guess more info doesn't hurt

ibhagwan commented 1 year ago

I unfortunately haven't been able to find a consistent way of reproducing it. Although something weird that I saw last night when running the buffers command (which is usually the one I use the most from the failing ones) was seeing the dreaded ^[i for a few milliseconds and then the popups being render properly in place. I ran the command multiple times after that and I didn't see it again for that session. Not sure if it helps, but I figured I'd mention it (could there be some sort of race condition?).

If you want me to add some kind of logging or print statements somewhere you're suspicious of, just let me know. I'm guessing this may be a weird interaction of my system (and likely a mistake on my part), so it's probably going to be incredibly hard to reproduce outside of it.

I'm new to fish, so I probably messed something up. One thing I'll mention given the weird interaction I had with that sort of race condition: I managed to make my fish config significantly faster than my ZSH one (startup time being ~10ms). My guess is that this probably doesn't really matter, but I guess more info doesn't hurt

Tysm for the details! anything helps.

I'm pretty sure this is not "your fault", any shell and any setup should work regardless of circumstanes, as long as you're patient and willing to work with me on this I can assure you we will not leave a rock unturned and get to the bottom of this issue.

I was able to identify what I believe is the common ground between the failing commands, they are all using the stdin input redirection directive < /path/to/fifo to feed contents into fzf from a FIFO generated by the mkfifo command.

Here's where I get confused: in order to avoid shell woes the fzf command is explicitly wrapped with POSIX shell using sh -c, as you can see below: https://github.com/ibhagwan/fzf-lua/blob/cddb1c3b23bf6f3e56435410a28e43c2c954b350/lua/fzf-lua/fzf.lua#L178-L184

The commands that never fails (files, etc) are not using stdin input redirection and instead using the environment variable FZF_DEFAULT_COMMAND which is spawned directly by fzf.

According to the termopen/jobstart help, supplying a table as a first argument avoids the shell alltogether, which means that fish isn't used in our scenario, yet somehow this happens only on fish.

:help jobstart

jobstart({cmd} [, {opts}])              *jobstart()*
Spawns {cmd} as a job.
If {cmd} is a List it runs directly (no 'shell').
If {cmd} is a String it runs in the 'shell', like this:
:call jobstart(split(&shell) + split(&shellcmdflag) + ['{cmd}'])
(See |shell-unquoting| for details.)

Althought this might happen because of mkfifo I'm not sure if this is the case as, again, the command is called with first argument as a table which should spwan without a shell and also the strange ^]i escape sequence seems like something more likely to happen after termopen (where mkfifo is called prior): https://github.com/ibhagwan/fzf-lua/blob/cddb1c3b23bf6f3e56435410a28e43c2c954b350/lua/fzf-lua/fzf.lua#L71-L75

This leaves me to suspect a delay (as you mentioned, sometimes it does open properly after a while) while executing the fzf command which could come from an external command which somehow uses fish?

Although I'm still unsure why this would cause such a behavior, can you post the output of the below:

This is the equivalent of echo $FZF_DEFAULT_OPTS in the shell

:lua print(vim.env.FZF_DEFAULT_COMMAND)
:lua print(vim.env.FZF_DEFAULT_OPTS)
iovis commented 1 year ago

I'm pretty sure this is not "your fault", any shell and any setup should work regardless of circumstanes, as long as you're patient and willing to work with me on this I can assure you we will not leave a rock unturned and get to the bottom of this issue.

I really appreciate the help. I love this library and have used for more than a year now, I'll do anything I can to help! I also appreciate explaining what's going on.

Here's where I get confused: in order to avoid shell woes the fzf command is explicitly wrapped with POSIX shell using sh -c

Smart. I agree this makes it more confusing that this weird behavior would be terminal dependent, but I'm still not knowledgeable enough of fish to be aware of possible differences. I'll mention that, because of how fish works, not only I managed to get a significant speedup on non-interactive startup of fish, I was also able to add more stuff that I couldn't with zsh —because otherwise I would be able to notice a performance impact—, which means I'm loading functions and aliases that I use a lot in interactive sessions that didn't use to be there before (I don't think this is the issue, but probably something to keep in mind). I can't easily "not load them in non-interactive" because they're autoloaded functions (instead of aliases).

This leaves me to suspect a delay (as you mentioned, sometimes it does open properly after a while) while executing the fzf command which could come from an external command which somehow uses fish?

Although I'm still unsure why this would cause such a behavior, can you post the output of the below:

My default command is using fd, a find alternative written in rust.

print(string.format("FZF_DEFAULT_COMMAND: %s", vim.env.FZF_DEFAULT_COMMAND))
print(string.format("FZF_DEFAULT_OPTS: %s", vim.env.FZF_DEFAULT_OPTS))
FZF_DEFAULT_COMMAND: fd -H -E '.git' -E '.keep' --type file --follow --color=always
FZF_DEFAULT_OPTS: --ansi --bind=ctrl-n:page-down,ctrl-p:page-up,alt-a:select-all,alt-d:deselect-all,alt-t:toggle-all,home:first,end:last

One (even weirder) thing, I'm currently on PTO so I'm using my work computer less and this problem is much harder to manifest in my personal computers. One of them is identical to my work computer, a macbook pro from last year, and an old intel mac mini. My dotfiles are in github and the same across the board, so the biggest difference would be me running much heavier applications on my work computer (I usually just SSH into my personal ones from my iPad, and not running things like Docker or the MDM software).

I'll also mention that I use neovim within tmux almost exclusively, although I was able to reproduce outside of tmux (in case you wanted to rule it out).

iovis commented 1 year ago

I just took my work computer to see if I could reproduce again, and the problem was there immediately.

I even saw a flash of ^[i in the files command (but fixed itself almost instantly). The buffers command is not working.

I unset all my FZF environment variables (and double checked with :lua=vim.env.FZF_DEFAULT_COMMAND) and the issue is still there. Are there any print statements or sh -c commands you want me to run?

iovis commented 1 year ago

I recorded a video, if you pay attention to the beginning of the files command (it's just a few frames) you'll see the ^[i appearing:

https://user-images.githubusercontent.com/6660202/218008518-094ca662-e1cd-4c4c-8657-18412cf9538f.mov

iovis commented 1 year ago

While I had the system in this state, I tried changing my $SHELL to zsh.

It doesn't fail... but while recording I'm just realizing I'm seeing the ^[i too, it just gets the right output immediately after:

https://user-images.githubusercontent.com/6660202/218009328-fce05587-2096-4dd9-ad17-b7482a09b7bd.mov

ibhagwan commented 1 year ago

Ty @iovis, this is very helpful.

I'm just realizing I'm seeing the ^[i too, it just gets the right output immediately after

I can clearly see it in both your videos, still no idea where it’s coming from but it’s good to know it’s happening in all commands so I can invalidate some of my other assumptions.

Are there any print statements or sh -c commands you want me to run?

Running any command with debug=true provides debug output of the command, preview parameters and other info, none that would help us in this case though, I can create a branch with extra logging but I’m still unsure where to put these and what information I’m even looking for as I still have no hints as to the source of this thing.

I’m suspecting something in your shell configuration as this issue would have probably come up before by many others but I can’t pin point what… armed with this new information I’ll keep digging in the code and try to come up with a better idea.

ibhagwan commented 1 year ago

I just took my work computer to see if I could reproduce again, and the problem was there immediately.

One clarification I forgot to ask, is the work computer using the same dotfiles?

If so, are your dotfiles (or at least the shell profile) online? If I can reproduce this on my system with your dotfiles I would be able to solve it much faster.

iovis commented 1 year ago

One clarification I forgot to ask, is the work computer using the same dotfiles?

If so, are your dotfiles (or at least the shell profile) online? If I can reproduce this on my system with your dotfiles I would be able to solve it much faster.

Yes, same dotfiles. You can find them here:

I managed to reproduce with a very minimal configuration (video attached):

# ~/.config/fish/config.fish
/opt/homebrew/bin/brew shellenv | source  # This is the path for `nvim`, `fzf` and friends
sh -c "$(curl -s https://raw.githubusercontent.com/ibhagwan/fzf-lua/main/scripts/mini.sh)"

In the video, I ran the commands with debug=true. It also doesn't look like ^[i appears anymore, it may have been a red herring.

https://user-images.githubusercontent.com/6660202/218150923-90911ea6-2942-422e-a2b2-5eab8c6d3577.mp4

ibhagwan commented 1 year ago

Ty @iovis!

I just tried the same exact setup on a OSX, neovim 0.8.3 with fzf 0.37 with your config.fish (with starship installed) and I still can’t reproduce this annoyance lol

I’d like to try something else, I have a hunch that may or may not be true that this might be related to the new load event added with fzf 0.36, would you be able to download fzf 0.35.1 Darwin binary from fzf releases, set fzf_bin within fzf-lua’s setup function and see if the issue persists?

iovis commented 1 year ago

I just tried the same exact setup on a OSX, neovim 0.8.3 with fzf 0.37 with your config.fish (with starship installed) and I still can’t reproduce this annoyance lol

Can't say I'm surprised, I seem to only be able to reproduce it consistently under very heavy load.

I’d like to try something else, I have a hunch that may or may not be true that this might be related to the new load event added with fzf 0.36, would you be able to download fzf 0.35.1 Darwin binary from fzf releases, set fzf_bin within fzf-lua’s setup function and see if the issue persists?

For a second I got excited, because I managed to make buffers work the first time, but then it failed again. I tried setting fzf_bin directly and also just having it in my $PATH, but no differences.

https://user-images.githubusercontent.com/6660202/218187994-bc8662c9-581d-4011-b2b8-3d5ed287a308.mp4

iovis commented 1 year ago

For context, under the same load, it works fine in ZSH:

https://user-images.githubusercontent.com/6660202/218189036-df9cab7f-25f2-4327-9846-5220a72989f7.mp4

Sorry that it's a bit hard to read with the bg being set to magenta for some reason

iovis commented 1 year ago

Also, when reproducing for fish, I'm using this as config.fish:

# ~/.config/fish/config.fish
set -gx XDG_CONFIG_HOME $HOME/.config
set -gx XDG_DATA_HOME $XDG_CONFIG_HOME/.local/share
set -gx XDG_CACHE_HOME $XDG_CONFIG_HOME/.cache

fish_add_path /opt/homebrew/bin
ibhagwan commented 1 year ago

Can't say I'm surprised, I seem to only be able to reproduce it consistently under very heavy load.

What is considered "heavy load"? Are you running any background process I that I can attempt to do the same? When said command ends and load subsides does it still fail or will it work again?

Sorry that it's a bit hard to read with the bg being set to magenta for some reason

No worries, I should probably adjust the defaults in to look somewhat normal on the default colorscheme.

Also, when reproducing for fish, I'm using this as config.fish

I used the same as this was inside the config.fish you linked above, can't rule anything out anymore but I'm not sure why this sohuld make a difference.

ibhagwan commented 1 year ago

For context, under the same load, it works fine in ZSH

This is what’s so confusing here, there should be no point in the code where fish shell is run or matters, at all, anywhere a shell command is used is wrapped with sh -c.

During the plug-in development I did notice some commands behave differently if $SHELL is set to fish, for example shellescape, perhaps I should go in this route and examine every neovim API call and it’s help to see where there’s a special case for fish. From initial search doesn’t seem like any other command is affected by the $SHELL variable aside from termopen | jobstart which is already wrapped in sh -c

iovis commented 1 year ago

What is considered "heavy load"? Are you running any background process I that I can attempt to do the same? When said command ends and load subsides does it still fail or will it work again?

I just tried stopping everything thinking it would come back to normal and it's still reproducible, so I'm now even more confused of why it's easier to reproduce on my work computer. The biggest difference with my personal computers would be the MDM software, but I still have managed to reproduce in my personal computers at some point or another (just less frequently).

During the plug-in development I did notice some commands behave differently if $SHELL is set to fish, for example shellescape, perhaps I should go in this route and examine every neovim API call and it’s help to see where there’s a special case for fish. From initial search doesn’t seem like any other command is affected by the $SHELL variable aside from termopen | jobstart which is already wrapped in sh -c…

Do you want to give me some code to run manually in my editor to see if I can pinpoint which line is the one failing? I can compare between ZSH and fish to maybe see if there's any difference

ibhagwan commented 1 year ago

Do you want to give me some code to run manually in my editor to see if I can pinpoint which line is the one failing? I can compare between ZSH and fish to maybe see if there's any difference

I think this is the best route, next chance I’m on the computer I’ll create a debug branch and pepper the code with some “before” and “after” prints, you can then set Lazy to said branch and we can examine the output from :messages which will hopefully be more insightful.

iovis commented 1 year ago

I think this is the best route, next chance I’m on the computer I’ll create a debug branch and pepper the code with some “before” and “after” prints, you can then set Lazy to said branch and we can examine the output from :messages which will hopefully be more insightful.

Sounds good, I appreciate the patience helping me debug this!

ibhagwan commented 1 year ago

I think this is the best route, next chance I’m on the computer I’ll create a debug branch and pepper the code with some “before” and “after” prints, you can then set Lazy to said branch and we can examine the output from :messages which will hopefully be more insightful.

Sounds good, I appreciate the patience helping me debug this!

Can you try branch 633_debug?

It has prints before and after suspected areas of the code, paste the output of :messages while it’s stuck and especially the message the is displayed while the window is open and hopefully we get a good hint from it.

iovis commented 1 year ago

Can you try branch 633_debug?

It has prints before and after suspected areas of the code, paste the output of :messages while it’s stuck and especially the message the is displayed while the window is open and hopefully we get a good hint from it.

fish

before win create
after win create
before raw_fzf
before tempname1
after tempname1, before tempname2
after tempname2
before mkfifo /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/SGmKtp/1
after mkfifo /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/SGmKtp/1
before termopen
after termopen
before fs_open /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/SGmKtp/1
after fs_open /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/SGmKtp/1
before contents
after contents # => THE ONE DISPLAYED WHILE THE WINDOW IS OPEN
termopen exit with 129
after raw_fzf

zsh

before win create
after win create
before raw_fzf
before tempname1
after tempname1, before tempname2
after tempname2
before mkfifo /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/oPOVuG/1
after mkfifo /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/oPOVuG/1
before termopen
after termopen
before fs_open /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/oPOVuG/1
after fs_open /var/folders/99/_2_c98ys0f9428nsf1_vvp440000gp/T/nvim.david/oPOVuG/1
before contents
after contents
termopen exit with 0
after raw_fzf
ibhagwan commented 1 year ago

Ty @iovis, this just got even stranger lol

after contents # => THE ONE DISPLAYED WHILE THE WINDOW IS OPEN

This is the code part: https://github.com/ibhagwan/fzf-lua/blob/2d675a6a387eb3d0d67ad83fa038c6ca0cf0f3a0/lua/fzf-lua/fzf.lua#L260-L264

Essentially all the fzf-lua code was executed and now neovim is waiting for the terminal job (fzf) to exit.

Another oddity is an exit code not mentioned in fzf manual:

0      Normal exit
1      No match
2      Error
130    Interrupted with CTRL-C or ESC
iovis commented 1 year ago

Another oddity is an exit code not mentioned in fzf manual:

I can reproduce the error 129 in ZSH (normal operation) when clicking outside the fzf popup in neovim or running :close while the popup is open. So I'm not sure if it's FZF's necessarily. I think that's just related to me closing the term popup weirdly when it fails

ibhagwan commented 1 year ago

Exit code 129 is apparently SIGHUP which gets sent when the terminal is closing which I assume you force closed with <C-w-q>, now the unanswered questions:

I’ll admit, this is quite the bug we got here lol, but at least I know now I need to focus on alternatives to redirecting the fifo into stdin, I’ll look for alternative (perhaps tee can help) and we can do some more testing.

iovis commented 1 year ago

Want me to run the sh -c command directly in fish and zsh?

iovis commented 1 year ago

I wonder if the underlying sh -c is inheriting some weird env. I noticed that when I ran zsh from a fish env to try to reproduce from zsh, it would fail in zsh too. I had to chsh -s /bin/zsh to "clean it", and it would not be reproducible anymore. So I wonder if the sh -c is somehow inheriting something from my env too

ibhagwan commented 1 year ago

Just switched the input redirection to tail in the debug branch, can you try to pull rebase and see if that solves the issue (I have a good feeling it might)?

iovis commented 1 year ago

@ibhagwan I think you nailed it!

I've been trying to break it with my full config and I can't, it looks rock solid. I don't even see the ^[i flashing in those commands anymore! (I do see the ^[i flashing for things like files, but I know you didn't touch there).

Let me try again with clean configs and zsh and maybe heavy load again. But this looks promising!

ibhagwan commented 1 year ago

@ibhagwan I think you nailed it!

Close but not there yet! It seems it cuts the input arbitrarily at 10 items, for example, try FzfLua bulltin or FzfLua man_pages, I have no idea why, yet :)

I’m still trying to find a solution for this, but otherwise it’s a solid solution, it’s also better to let fzf handle the command process using FZF_DEFAULT_COMMAND.

iovis commented 1 year ago

Close but not there yet! It seems it cuts the input arbitrarily at 10 items, for example, try FzfLua bulltin or FzfLua man_pages, I have no idea why, yet :)

tail defaults to 10 lines! I was just going to mention that FzfLua commands only reported 10 lines and I was looking at the plugin code to see if there was a new paginator haha

It's the -n option. You can test with tail -n20

iovis commented 1 year ago

@ibhagwan I just changed:

FZF_DEFAULT_COMMAND = string.format("tail %s", vim.fn.shellescape(fifotmpname))
-- to
FZF_DEFAULT_COMMAND = string.format("cat %s", vim.fn.shellescape(fifotmpname))

And it works without limiting the output, not sure if that helps

ibhagwan commented 1 year ago

And it works without limiting the output, not sure if that helps

Ofc it helps, this is great, I’m just sure how cat handles ongoing input from a fifo file, I gonna do some more research and you might have just found our solution :)

iovis commented 1 year ago

I'm investigating in the meantime, and it seems like mkfifo works differently than other shells: https://github.com/fish-shell/fish-shell/issues/2867#issuecomment-204827423 https://stackoverflow.com/questions/61946995/ls-fifo-blocks-fish-shell https://stackoverflow.com/questions/69216897/what-are-the-differences-between-fish-shell-and-bash-when-handling-named-pipes

ibhagwan commented 1 year ago

I'm investigating in the meantime, and it seems like mkfifo works differently than other shells: fish-shell/fish-shell#2867 (comment) https://stackoverflow.com/questions/61946995/ls-fifo-blocks-fish-shell https://stackoverflow.com/questions/69216897/what-are-the-differences-between-fish-shell-and-bash-when-handling-named-pipes

The question is, why does this even matter, mkfifo command is executed directly (no shell) and termopen uses sh, so technically fish shell shouldn’t even be a factor here, very confusing.

In any event the cat solution seems to work great, if it works perfectly for you as well I will commit the change, lmk?

ibhagwan commented 1 year ago

The second link you posted seems to be a clue as to what may be happening here, but again, we’re using sh… and it still doesn’t explain the inconsistency of the bug.

iovis commented 1 year ago

In any event the cat solution seems to work great, if it works perfectly for you as well I will commit the change, lmk?

It's working amazing.

The second link you posted seems to be a clue as to what may be happening here, but again, we’re using sh… and it still doesn’t explain the inconsistency of the bug.

Do you think you can give me an sh -c command with mkfifo that I could run directly in fish? I have a feeling something's weird with these subshells. When i ran zsh to get an interactive zsh session from within a fish session, the error was still reproducible as if I was still in fish. It would only "clean up" if I ran a chsh -s /bin/zsh and started over.

My hunch right now is that sh -c may not be as isolated as we think it is...

iovis commented 1 year ago

Actually, even if termopen is using sh, how would it know where to find fzf if it was not inheriting at least the $PATH? Wouldn't that mean it's not fully isolated?

iovis commented 1 year ago
Screenshot 2023-02-10 at 20 24 20
iovis commented 1 year ago

I've been trying with:

sh -c 'mkfifo myfifo; (ls > myfifo &); cat < myfifo; rm myfifo'

But I couldn't see any error

iovis commented 1 year ago

In any case, thank you so much for fixing this issue and for your patience ❤️

ibhagwan commented 1 year ago

https://github.com/ibhagwan/fzf-lua/commit/a294a8330e9b0622acc3de4e0b9ab4e853b53fa2 - I'm on a flight and the plane WiFi is blocking my git ssh connection, I had to do this from the web interface (first time), hopefully I didn't mess up anything :-)

Tysm @iovis for your patience! after much joint efforts I think we've come to the conclusion of this quite mysterious bug.

My hunch right now is that sh -c may not be as isolated as we think it is...

That is the only way to explain it but that would be quite the conundrum, perhaps this is isolated to sh -c within a neovim job? That would make more sense as it's a pretty limited usecase and the reason this hadn't come up until now.

Actually, even if termopen is using sh, how would it know where to find fzf if it was not inheriting at least the $PATH? Wouldn't that mean it's not fully isolated?

Using termopen|jobstart inherits the current env unless clear_env = true is specified in the options, the question is what in the env vars affects input redirection?

I've been trying with: But I couldn't see any error

Idk if this rabbit hole is worth pursuing any more than what we already did :-)

In any case, thank you so much for fixing this issue and for your patience

Ty again @iovis, it's been a pleasure working with you!

ibhagwan commented 1 year ago

https://github.com/ibhagwan/fzf-lua/issues/633#issuecomment-1426290620:

Sorry that it's a bit hard to read with the bg being set to magenta for some reason

Now fixed with https://github.com/ibhagwan/fzf-lua/commit/1df92492201ff9bf081a276c3c773a36cfee66a0, no more magenta on default colorsceme lol, beats me why linking to a cleared highlight group would choose Pmenu as the background color but the highlight setup now checks for cleared links and resets the hl accordingly: image

iovis commented 1 year ago

Much easier to read haha. Btw, the fix has been working wonderfully on my end these days, tysm for looking into this!