Closed: behlendorf closed this 1 week ago
I wonder if the actual problem with the test is that supposedly healthy devices really do not complete within the timeout? Sure, your patch should "work", but it might just hide a valid failure case where vdevs are falsely reported slow, which is exactly what the test now exercises. I wonder if we should increase the timeouts instead.
> I wonder if the actual problem with the test is that supposedly healthy devices really do not complete within the timeout?
That's exactly my suspicion. I'm game to try increasing the timeouts first and see if that resolves it. There's not going to be any one right value, so we'll have to pick something and see how it goes. We may end up just making the failure less likely rather than impossible.
Edit: updated the PR to set the slow I/O threshold to 750ms and inject 1000ms delays.
I've run this through the CI 3 times now without reproducing the failure, which is a good sign. Only after we merge this will we know for certain whether the larger values are enough to resolve it entirely.
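For context, here is a minimal sketch of how these values plug into a test case, using the test suite's `set_tunable64` and `zinject` helpers; the variable names (`$DISK`, `$TESTPOOL`, `$OLD_SLOW_IO_MS`) and the exact sequence are assumptions for illustration, not the PR's actual diff:

```sh
# Report any I/O slower than 750ms as a slow I/O event.
log_must set_tunable64 ZIO_SLOW_IO_MS 750

# Inject a 1000ms delay (one lane) on the target vdev so it is
# reliably reported slow, leaving 250ms of headroom for CI jitter.
log_must zinject -d $DISK -D 1000:1 $TESTPOOL

# ... exercise zpool status and verify the slow vdev is reported ...

# Clear the injection as soon as it is no longer needed, then
# restore the previous threshold.
log_must zinject -c all
log_must set_tunable64 ZIO_SLOW_IO_MS $OLD_SLOW_IO_MS
```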
Motivation and Context
Resolve false positives reported by the CI for the zpool_status_008_pos test case:

https://github.com/openzfs/zfs/actions/runs/11848141397/job/33027753860?pr=16755
https://github.com/openzfs/zfs/actions/runs/11848141397/job/33027755132?pr=16755
Description
~When checking that healthy vdevs[1-3] aren't shown, omit the -s flag so slow vdevs are not considered, as described by the comment. This avoids the possibility of a false positive in the CI when ZIO_SLOW_IO_MS is reduced to 160ms. Additionally, clear the fault injection as soon as it is no longer required for the test case.~
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold to 750ms to avoid false positives due to unrelated slow I/Os which may occur in the CI environment. Additionally, clear the fault injection as soon as it is no longer required for the test case.
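As an aside on the struck-out approach above: `zpool status -s` adds a per-vdev slow I/O count, so a healthy-vdev check that passes `-s` can trip on a single unrelated slow I/O in a loaded CI environment. A hedged sketch of the distinction (the pool name is hypothetical):

```sh
# With -s, slow I/O counts are displayed and can flag an otherwise
# healthy vdev; without it, only genuine fault state is reported.
zpool status -s $TESTPOOL   # includes the slow I/O column
zpool status $TESTPOOL      # healthy-vdev check, slow I/Os ignored
```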
How Has This Been Tested?
Only the test case was updated; it will be exercised by the CI.
Types of changes
Checklist:
Signed-off-by