Open romkatv opened 11 months ago
I have been trying to wrap my head around this today. I'm having trouble with the specifics of the situation you've outlined, but I do think I've found a (different?) way that this stuff can get borked stemming from _zsh_autosuggest_async_response
closing the fd but not clearing _ZSH_AUTOSUGGEST_ASYNC_FD
.
As to the scenario you've outlined:
_zsh_autosuggest_async_request
opens a file._zsh_autosuggest_async_request
closes the file descriptor.- powerlevel10k opens a file and gets the same file descriptor as above.
_zsh_autosuggest_async_response
fires and closes the same file descriptor.- powerlevel10k encounters errors when trying to read from the file descriptor.
I'm not understanding how step 3 can occur here. The scenario I'm imagining from these steps is a user typing two characters in relatively quick succession. The first character fires off a child process to fetch a suggestion (1.
above), which does not complete before the second character is typed (2.
above). This would have _zsh_autosuggest_async_request
open the fd (1.
) and then close it (2.
) in preparation for making the next suggestion request.
But then I don't understand how p10k could get the same fd that _zsh_autosuggest_async_request
is dealing with in steps 1 and 2. After closing the fd, _zsh_autosuggest_async_request
immediately opens another, which as I understand it will take the same number that it had just closed. That number should then be unavailable for p10k to take without some other action occurring first.
Would you be able to provide a more in-depth explanation?
While I was looking into this, as mentioned above, I did see a problem that could arise in a different but related scenario. I was able to trigger it by enabling ZSH_AUTOSUGGEST_MANUAL_REBIND
and then having a widget that opens a file descriptor defined after the first precmd after zsh-autosuggestions is sourced where it does its widget binding. This scenario results in _zsh_autosuggest_async_request
closing the fd opened by the widget. I believe a fix for this is simply clearing _ZSH_AUTOSUGGEST_ASYNC_FD
in _zsh_autosuggest_async_response
so that the next time into _zsh_autosuggest_async_request
it doesn't mistakenly try to close the fd again.
I've broken out some of your changes into a separate branch here: https://github.com/zsh-users/zsh-autosuggestions/compare/develop...fixes/romkatv-async-fixes
I tacked the _ZSH_AUTOSUGGEST_ASYNC_FD
fix mentioned above on the end. I'm curious if that fixes the issues you were having?
I'm not understanding how step 3 can occur here.
It might help if you take a look at the self-contained code snippet I posted together with its output. It basically shows that the main zle loop works as follows (pseudo code):
while (true) {
// Get a list of all file descriptors that are ready for reading.
// This includes STDIN and all file descriptors registered via `zle -F`.
ready_fds = select();
if (ready_fds contains STDIN) {
// STDIN has input. Read it and invoke bound widgets.
read_keys_and_invoke_widgets();
}
// Invoke `zle -F` handlers for all ready file descriptors.
watches = get_watches_for_fds(ready_fds);
foreach (watch in watches) {
invoke_fd_watch(watch);
}
}
If several file descriptors become ready for reading simultaneously, and the handler for the first of them closes the second file descriptor, the second handler will still get called. That's what the code snippet in the PR description demonstrates.
Thus, when _zsh_autosuggest_async_request
closes a file descriptor, the handler for this file descriptor will still fire if the fd is ready at that point. If in the interim some other code opens a file descriptor, it may be the same numerical value. It is in fact quite likely beucase the C runtime always allocates the smallest unused number when opening a file descriptor. In this case _zsh_autosuggest_async_response
will close a file descriptor that zsh-autosuggestions did not open. Or, more precisely, zsh-autosuggestions has already closed it, and now a different piece of code "owns" a file descriptor with the same numerical value.
I took a cursory look at your changes and I don't think they fix this bug, or at least it's difficult for me to see that they do. In my code the correctness is easier to assess because 1) whenever a file descriptor is opened, a handler for it is registered; 2) the file descriptor is closed only from the handler. There is no room for a double close or fd leak: one function opens an fd and ensures that another function is eventually invoked; the second function closes the fd.
I just realized that #630 contained this _ZSH_AUTOSUGGEST_ASYNC_FD
fix and I've merged that to develop
. I'm fairly sure that this will fix the issues you're seeing, and is a less expansive change.
Please let me know if you see more issues and if so, it would help if you could provide some specific reproduction steps. Another thing that would help me in the future is to break the PR changes into more commits to make it clearer which changes are addressing which problems.
I am positive that #630 does not fix the bug. I understand that it would be easier for you if I gave you a reproducible test case but it would take me a long time to build one given that we are talking about a race condition.
Can you see that the existing code does not provide a guarantee that each fd is closed only once? Do you see that the same fd can be first closed by this line and then again by this line? For that to happen, _zsh_autosuggest_async_request
needs to be invoked when the fd is ready for reading.
Ok, thanks. I'm still trying to get my head around this.
Is this an example of what you're thinking? Please correct/extend this example to illustrate the scenario you're thinking of and help me check my own assumptions:
Some keystroke invokes async_request forking pid 1000 and opening fd 12
And then some time later, simultaneously:
Then in one iteration of the main zle loop:
read
would block until the process 1001 finishes. Probably works but I think would end up blocking rather than being async. Definitely not intentional, and into sketchy territory.I think I see your point here, and something should be done to fix this :+1:
Though I still don't quite see how something else could open fd 12 in between points 1 and 2 above. I wonder if we have pivoted from the originally reported bug to something slightly different?
Spot on! Arbitrary code can be executed between 1 and 2 by a widget wrapper.
If you look at the code of my version rather than the diff, it might be easier to see its correctness. It should be possible to convince yourself that each fd is always closed exactly once.
This PR fixes two bugs in async code.
-
inecho -nE "$suggestion"
. This is necessary to prevent"$suggestion"
from being treated as an option forecho
._zsh_autosuggest_async_response
to ensure that each file descriptor is closed only once.It's the second bug that prompted the fix. The original code in some cases could close the same file descriptor twice. The code relied on an invalid assumption that
_zsh_autosuggest_async_response
cannot fire after the file descriptor is closed. Here's a demo that shows this assumption being violated:And here's the output I get if the code is pasted into an interactive zsh:
Note that
callback2
fires after its file descriptor has been closed bycallback1
.This bug was the culprit of several issues filed against powerlevel10k. In a nutshell:
_zsh_autosuggest_async_request
opens a file._zsh_autosuggest_async_request
closes the file descriptor._zsh_autosuggest_async_response
fires and closes the same file descriptor.