Open h1z1 opened 2 years ago
I believe this is trickier than one might imagine - see #11082 for how invasive the changes needed to discard things in flight may be.
Indeed it won't be a simple switch from everything being sync to async, but there is hope. Lua, for example: https://github.com/openzfs/zfs/pull/8904. Events and history seem like fairly simple cases; other things like zpool add/replace/etc. are of course going to need more thought. Maybe the low-hanging fruit is to make the zfs/zpool commands themselves timeout-aware? (A rough sketch of that idea is below.)
The major issue is that you can't kill them from the shell, so a stuck command hangs a lot of other things. In the case of multiple pools in one system it can completely take out unrelated ones.
Related is the use of spinlocks at all. A side effect of the failure described above was that the CPU spun to the point the kernel thought it was stuck (the NMI watchdog isn't enabled).
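To make the timeout-aware idea concrete, here is a minimal userland sketch, assuming nothing about the existing zfs/zpool code: run the subcommand in a child process and enforce a deadline from the parent. run_with_timeout() is a hypothetical helper, not an existing interface, and note the caveat in the comments: if the child blocks uninterruptibly in the kernel, even SIGKILL can't reap it, so the wrapper can only report the timeout.

```c
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch: run a command with a deadline.  Returns the
 * child's exit status, or -1 with errno = ETIMEDOUT on timeout.
 */
static int
run_with_timeout(char *const argv[], int timeout_secs)
{
	pid_t pid = fork();
	if (pid < 0)
		return (-1);
	if (pid == 0) {
		(void) execvp(argv[0], argv);
		_exit(127);
	}

	/* Poll for exit every 100 ms until the deadline. */
	for (int i = 0; i < timeout_secs * 10; i++) {
		int status;
		if (waitpid(pid, &status, WNOHANG) == pid)
			return (WIFEXITED(status) ?
			    WEXITSTATUS(status) : -1);
		struct timespec ts = { 0, 100 * 1000 * 1000 };
		(void) nanosleep(&ts, NULL);
	}

	/*
	 * Best effort only: a child stuck in uninterruptible (D state)
	 * sleep inside a ZFS ioctl will not die until the kernel-side
	 * wait completes, so the caller gets an error back but the
	 * child may linger.
	 */
	(void) kill(pid, SIGKILL);
	errno = ETIMEDOUT;
	return (-1);
}
```

Usage would be something like `char *const argv[] = { "zpool", "events", NULL }; run_with_timeout(argv, 10);`. The real fix would have to live kernel-side, since this only bounds how long the caller waits, not the stuck task itself.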
The source for mutex.h has a comment about how Linux mutexes don't promise serialization in some edge cases; the semantics elsewhere in the OpenZFS codebase assume the mutex type does, so certain code paths have to be implemented with a spinlock specifically to provide the additional guarantee.
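As a minimal illustration of the hazard (hypothetical code, not the actual SPL implementation): an object is freed by whichever thread drops the last reference, which is only safe if mutex_exit() never touches the lock's memory once another thread can own it.

```c
#include <sys/kmem.h>
#include <sys/mutex.h>

typedef struct obj {
	kmutex_t	o_lock;
	int		o_refcount;
} obj_t;

/*
 * Thread A drops a reference; thread B concurrently drops the last
 * one and frees the object.  If A's mutex_exit() is still touching
 * o_lock after B has acquired it, B's kmem_free() turns A's in-flight
 * unlock into a use-after-free.  The Solaris mutex semantic forbids
 * that; a raw Linux mutex does not, hence the SPL workaround.
 */
static void
obj_rele(obj_t *op)
{
	mutex_enter(&op->o_lock);
	boolean_t last = (--op->o_refcount == 0);
	mutex_exit(&op->o_lock);
	if (last) {
		mutex_destroy(&op->o_lock);
		kmem_free(op, sizeof (*op));
	}
}
```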
If you can find a better solution, I'm sure it'd be welcome.
Bit of a misuse of words on my part, sorry; I meant they have a place of course but could be tweaked. Would it not make more sense in the case above, for example, to either return an error or avoid the txg sync entirely? There's still an underlying issue with how the pool got into that state, but it would allow some further investigation. S'pose another option is to expose them in procfs?
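A hedged sketch of what "return an error" might look like on the history read path. spa_suspended(), txg_wait_synced(), spa_get_dsl(), and SET_ERROR() are real OpenZFS interfaces; the fail-fast behavior is the proposed change, not current code, and a pool could still suspend after the check, so a complete fix would need an interruptible wait.

```c
#include <sys/zfs_context.h>
#include <sys/spa.h>
#include <sys/txg.h>

/*
 * Hypothetical: fail fast instead of blocking forever when the pool
 * can never complete a txg sync.
 */
static int
history_wait_synced(spa_t *spa)
{
	/*
	 * A suspended pool cannot sync; return an error so the ioctl
	 * (and the zpool command behind it) comes back instead of
	 * hanging in D state.
	 */
	if (spa_suspended(spa))
		return (SET_ERROR(EAGAIN));

	txg_wait_synced(spa_get_dsl(spa), 0);
	return (0);
}
```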
Describe the feature you would like to see added to OpenZFS
It can be extremely helpful to obtain troubleshooting information from a system that is crashing or on the verge of it. The problem is that some tasks will hang when they could time out instead.
How will this feature improve OpenZFS?
By allowing a command like zpool events to time out (with an appropriate error), it can be the difference between a complete system failure and recovery. I'd imagine other commands like zpool status, zpool iostat, etc. are in the same situation.
In zpool history's case there's even a comment in the code noting that history is logged asynchronously, hence the need for a txg sync on read; a paraphrased sketch of that path follows:
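This is a condensed, hypothetical rendering of the read path, not the exact source; the real code is spa_history_get() in module/zfs/spa_history.c, and the wording of the in-tree comment differs.

```c
#include <sys/zfs_context.h>
#include <sys/spa.h>
#include <sys/spa_impl.h>
#include <sys/txg.h>

/* Condensed sketch of the zpool history read path. */
static int
history_read_sketch(spa_t *spa, uint64_t *offp, uint64_t *lenp, char *buf)
{
	/*
	 * History is logged asynchronously, so force a full txg sync
	 * before handing back the first chunk.  This is the wait that
	 * never returns when the pool cannot sync.
	 */
	if (*offp == 0)
		txg_wait_synced(spa_get_dsl(spa), 0);

	mutex_enter(&spa->spa_history_lock);
	/* ... copy *lenp bytes of the history object into buf ... */
	mutex_exit(&spa->spa_history_lock);

	return (0);
}
```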
That logic seems backward. The log should be atomic on creation/submission and async on read, no? Otherwise, how is consistency kept if the system is not able to sync? And spa_history_lock is held on read.
Additional context
I expect there is no one fix for all of these, but in the case of zpool history, when the pool is unable to sync due to other issues, the stack becomes:
It will never return due to the failure.