openzfsonwindows / ZFSin

OpenZFS on Windows port
https://openzfsonwindows.org
1.2k stars, 69 forks

Fixed a race condition in 'spl_cv_wait' that could cause a thread to miss a broadcast event #324

Closed arun-kv closed 3 years ago

arun-kv commented 3 years ago

Moved mutex_enter above the waiters_count == 1 check in spl_cv_wait. This prevents the current thread from wrongly concluding that it is the only thread waiting for the event when another thread has already acquired the mutex and is about to increment 'waiters_count' a few lines above.
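To make the interleaving concrete, here is a minimal single-threaded model of the two orderings. It is only a sketch and not the ZFSin code: `struct cv_model`, `mutex_held_by_b`, and all three helper functions are hypothetical names invented for illustration, and "blocking on the mutex" is modeled simply by letting thread B finish its step first.

```c
#include <stdbool.h>

/* Hypothetical model for illustration; not the ZFSin condvar structure. */
struct cv_model {
    int  waiters_count;    /* waiters currently registered on the condvar */
    bool mutex_held_by_b;  /* true while thread B owns the caller's mutex */
};

/* Thread B's pending step: register as a waiter, then drop the mutex to wait. */
static void b_register_and_release(struct cv_model *cv)
{
    cv->waiters_count++;
    cv->mutex_held_by_b = false;
}

/*
 * Old ordering: woken thread A checks the count before taking the mutex.
 * B holds the mutex but has not incremented yet, so A sees a count of 1.
 */
static bool a_is_last_unlocked(const struct cv_model *cv)
{
    return cv->waiters_count == 1;
}

/*
 * New ordering: A takes the mutex first. A cannot get it until B lets go,
 * which B only does after incrementing, so A's check sees B's registration.
 */
static bool a_is_last_locked(struct cv_model *cv)
{
    if (cv->mutex_held_by_b)
        b_register_and_release(cv);  /* models A blocking until B is done */
    return cv->waiters_count == 1;
}
```

Starting both orderings from `{ .waiters_count = 1, .mutex_held_by_b = true }`, the unlocked check reports that A is the last waiter, while the locked check does not, because B's increment is serialized ahead of it.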

imtiazdc commented 3 years ago

@lundman For some background: we ran into an issue where the dump seems to indicate that at least one thread is waiting endlessly, choking other threads that depend on it.

I spent some time thinking through this change with @arun-kv, and this PR looks good to me. Is there any specific test you want us to run, since this change is at the core of synchronization in ZFSin?

lundman commented 3 years ago

That looks quite reasonable. I suspect there is a further issue: I have had mutex_exit() panic because the mutex was not held. But it happens so infrequently that it has been hard to track down.