Closed lewis6991 closed 1 year ago
I've been able to reduce this down a lot:
async_start_worker gitprompt
async_register_callback gitprompt refresh_prompt_callback
refresh_prompt_callback() {
  local job=$1 err=$2
  case $job in
    \[async])
      # Async worker has crashed
      if (( err == 2 )) || (( err == 3 )) || (( err == 130 )); then
        echo "ERROR($err)"
      fi
      ;;
  esac
}
Running this in an interactive shell and then quickly refreshing the prompt causes the worker to be killed. Any tips on how this could be debugged further?
Thanks for reporting.
I find it strange that https://github.com/mafredri/zsh-async/commit/361dc171e65c82f57ad814ebecea91c98a6d4b68 is the root cause. It's a commit that doesn't really touch on any worker logic, only modifies the startup procedure a bit. I'm guessing you tried commits before and after to determine that this is where it started?
On what system and version of Zsh are you running into these issues?
If I was to venture a guess, I'd think async_flush_jobs gitprompt was the culprit (that crashes the worker). Can you reproduce the issue with flush jobs commented out?
Also, are you receiving any async error messages (i.e. the ones named [async])? If yes, what do they say?
refresh_prompt:zle:12: widgets can only be called when ZLE is active
This error should be avoidable by first checking that ZLE is active, zle && zle reset-prompt.
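A minimal sketch of that guard, assuming the callback only needs to redraw the prompt. The function name safe_reset_prompt is hypothetical and not part of zsh-async. Called with no arguments, zle only reports whether the line editor is active, so the widget is invoked only when that check succeeds.

```shell
# Hypothetical helper (not part of zsh-async): redraw the prompt only
# when ZLE is active, so the "widgets can only be called when ZLE is
# active" error is not triggered.
safe_reset_prompt() {
  # `zle` with no arguments returns zero only while the line editor is
  # active. stderr is silenced in case the builtin is unavailable.
  if zle 2>/dev/null; then
    zle reset-prompt
  fi
}
```

This only avoids the error message. It does not explain why the callback fires while ZLE is inactive.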
I find it strange that 361dc17 is the root cause. It's a commit that doesn't really touch on any worker logic, only modifies the startup procedure a bit. I'm guessing you tried commits before and after to determine that this is where it started?
I tried quite a few commits but I could have made some mistakes in my testing. All I can say for sure is that 1.7.2 didn't exhibit any issues and 1.8.0 does.
On what system and version of Zsh are you running into these issues?
I'm running on RHEL 7 using a Linuxbrew build of zsh 5.8.
If I was to venture a guess, I'd think async_flush_jobs gitprompt was the culprit (that crashes the worker). Can you reproduce the issue with flush jobs commented out?
This was my first guess too, but it didn't seem to make any difference.
Also, are you receiving any async error messages (i.e. the ones named [async])? If yes, what do they say?
Using my second code snippet the async error code is 2, so something is going on with ZLE.
This error should be avoidable by first checking that ZLE is active, zle && zle reset-prompt.
I saw this in the pure.zsh code. Whilst it will work around the error, I'm really curious as to why I'm seeing this now and not before.
For now I have been able to work around the worker being killed by restarting it every time in precmd, which seems to give reliable behavior no matter what I do in the prompt. Ideally I would like to understand why I'm getting error code 2 when I refresh the prompt too quickly. Is it likely to be something inherent to ZLE?
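A rough sketch of that precmd workaround, assuming the worker and callback names from the snippet earlier in this thread (gitprompt, refresh_prompt_callback). The helper name restart_gitprompt_worker is hypothetical, and this is not necessarily how the original setup registers it.

```shell
# Hypothetical helper: recreate the worker before every prompt.
restart_gitprompt_worker() {
  # Stop any existing worker; ignore the status if none is running.
  async_stop_worker gitprompt || true
  async_start_worker gitprompt
  async_register_callback gitprompt refresh_prompt_callback
}

# Register the helper as a precmd hook when running under zsh.
if [ -n "${ZSH_VERSION:-}" ]; then
  autoload -Uz add-zsh-hook
  add-zsh-hook precmd restart_gitprompt_worker
fi
```

Restarting on every prompt trades some startup cost for never reusing a worker that may already have died.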
Thanks
Using my second code snippet the async error code is 2, so something is going on with ZLE.
This error originates from: https://github.com/mafredri/zsh-async/blob/490167c4aa5a870b3da1458859bcf3a9d1e24f97/async.zsh#L360 I have never actually run into this error myself; I wonder what the actual ZLE error code is. Could you also try to log and share the error message that is sent (stderr for the callback)?
I'm also curious to know if you are reproducing the errors with the minimal configs you posted, exactly as-is? I.e. no other zsh plugins, settings, etc. And if so, could it be something ~/.prompt does?
_async_zle_watcher:17: error: fd for gitprompt failed: zle -F 15 returned error hup
I also reduced down my .zshrc and found that the plugin marlonrichert/zsh-autocomplete appears to cause the errors (in the sense that the errors don't appear when that plugin is unloaded).
This consistently produces the error for me:
async_start_worker gitprompt
async_register_callback gitprompt refresh_prompt_callback
foo() {
  A="a b c d"
  vared A
}
zpty testpty foo
zpty -d testpty
refresh_prompt_callback() {
  local job=$1 err=$2
  case $job in
    \[async])
      # Async worker has crashed
      if (( err == 2 )) || (( err == 3 )) || (( err == 130 )); then
        echo "ERROR($err): $5"
      fi
      ;;
  esac
}
Ok, this definitely looks like an issue I've been combating since day one. Zpty destroy signals (i.e. HUP) are propagated to all zptys that were created before the one being destroyed.
Most likely this change is the root cause of your current issues: https://github.com/mafredri/zsh-async/commit/32548d3c3f1361de57f09ab9293c902b78f49b55#diff-c7f89cff42efffc19f69071441a12a1cR86-R90
It should've been fixed by this commit: https://github.com/zsh-users/zsh/commit/caddeca1ac638137b26735fc8c63d08c83be6a90. But alas, we may have to revert the TRAPHUP change from above.
Are there any workarounds for this? I run into this many times a day, which is a bit of a pain (but still better than not using zsh-async!)
@howardjohn If it's any help, the only thing I was able to do was reinitializing the workers when they die.
The second argument that is given to the callback function is the return code.
Docs on all the return codes:
1 Corrupt worker output.
2 ZLE watcher detected an error on the worker fd.
3 Response from async_job when worker is missing.
130 Async worker crashed. This should not happen, but it can mean the file descriptor has become corrupt. This must be followed by an async_stop_worker [name], and then the worker and tasks should be restarted. It is unknown why this happens.
By just checking for this return code in the callback, you can reinitialize your workers when needed. I haven't had a problem in months with typewritten since I implemented that check.
Example of callback function checking for the return code:
tw_prompt_callback() {
  local tw_name=$1 tw_code=$2 tw_output=$3
  # Check for return codes indicating an error
  if (( tw_code == 2 )) || (( tw_code == 3 )) || (( tw_code == 130 )); then
    # reinit async workers
    async_stop_worker tw_worker # stop the current worker
    tw_async_init_worker # Init the worker again, and register the callback, see below
    tw_async_init_tasks # Init all the tasks
  elif (( tw_code )); then
    # return code is not empty, reinit all tasks
    tw_async_init_tasks
  fi;
  ...
}
# For reference purpose
tw_async_init_worker() {
  async_start_worker tw_worker -n
  async_register_callback tw_worker tw_prompt_callback
}
@howardjohn I'm working on some improvements, but I can't say for sure if they will help.
First off, how are you using zsh-async? I've never been able to reproduce constant worker death but I know some scenarios that can cause it. For instance, sending hundreds of jobs to the worker in quick succession.
Edit: And what version of zsh are you using?
It'd be great if you could try out #45, then maybe #49. And finally there's a pretty huge rewrite going on in the (very WIP) test-rebased branch (based on the mentioned PRs). It's possibly the best bet at fixing worker death but will require a lot more testing and fine tuning.
@reobin it's not ideal, but indeed the best solution for the current master branch, thanks for suggesting it!
@reobin thanks! That is essentially what I have been doing manually. I ran it for a few hours and it seems great.
I use it for my prompt, so it gets a decent number of jobs (every time I hit enter), but it shouldn't be more than a couple per second.
I also haven't reproduced it consistently, so it's hard to quickly test out changes, but I can throw them in my shell for a while and see what happens.
$ zsh --version
zsh 5.8 (x86_64-debian-linux-gnu)
After a couple of days of testing, the restart jobs workaround did not work (extremely likely I just set it up wrong), and https://github.com/mafredri/zsh-async/pull/45 also did not.
Will try #49 now
@howardjohn Thanks for testing, and too bad about the workaround. If it's any help, here's how we set the worker restart up in Pure: https://github.com/sindresorhus/pure/blob/dfc8062c64df8821eaec7d741c75f3cee20d37e3/pure.zsh#L478-L495
As expected, I had the workaround messed up, a simple typo :woman_facepalming: . I verified the workaround does work, and added some logging when it occurs, so I can see it has transparently happened a couple of times.
Unfortunately, #49 did not seem to help much here during my testing.
Update: 3 months later, the workaround to reset works great.
Not sure if this was known or not, but this can easily reproduce at least one occurrence of this (I have it print that it is restarting when it is triggered):
$ `exit`
restarting async. code=3
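For illustration, a callback that logs a restart like the one above might look roughly like this. It is a sketch under assumptions, not the setup actually used here: the names worker_needs_restart, prompt_callback and prompt_worker are hypothetical, and the codes 2, 3 and 130 come from the list of return codes quoted earlier in this thread.

```shell
# Return success when the code means the worker must be recreated
# (2, 3 and 130 in the return-code list quoted earlier in the thread).
worker_needs_restart() {
  case "$1" in
    2|3|130) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical callback: $1 is the job name, $2 the return code.
prompt_callback() {
  if worker_needs_restart "$2"; then
    # Log the restart so it is visible when the workaround kicks in.
    echo "restarting async. code=$2"
    async_stop_worker prompt_worker || true
    async_start_worker prompt_worker
    async_register_callback prompt_worker prompt_callback
  fi
}
```

Keeping the code check in its own function makes it easy to extend if other return codes turn out to need a restart as well.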
I've recently updated from 1.7.2 and I have found that https://github.com/mafredri/zsh-async/commit/361dc171e65c82f57ad814ebecea91c98a6d4b68 has caused a regression in my setup.
I use zsh-async to update my prompt with git info. Here's my implementation with only the relevant parts:
When I quickly refresh my prompt (pressing enter quickly in succession), it causes my gitprompt worker to be killed. I also get a zle error. I got neither of these problems on 1.7.2. I guess the problem is something to do with using the zle watcher?
Any help on this would be greatly appreciated.