hercules-390 / hyperion

Hercules 390

runtest4 timeout value of 4.5 seconds is too low. #240

Closed srorso closed 6 years ago

srorso commented 6 years ago

The test case runtest4 times out on rare occasions, roughly once per 100 test executions. The issue appears on multiple UNIX-like systems.

Runtest4 tests the runtest command to ensure that it waits for all active processors to stop, not just any one of them. It starts four processors, having them loop for 1, 2, 3, and 4 seconds respectively. Each processor uses SIGP RESTART to start the next processor. The loop duration is calculated by each processor when it is started.

The test case times out when the accumulated delay in starting the fourth processor exceeds 0.5 seconds. This could be a 1/6 second delay in starting each processor or a 1/2 second delay in starting any one processor. The delays appear to result from variability in thread dispatching by the host system.
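The timing argument above can be sketched as a small model. This is only an illustration, not Hercules code: the names `LOOP_SECONDS`, `finish_times`, and `times_out` are hypothetical, and the model assumes each processor signals its successor as soon as it starts, so start delays accumulate down the chain.

```python
# Hypothetical timing model of runtest4; not taken from the Hercules source.
LOOP_SECONDS = [1.0, 2.0, 3.0, 4.0]  # loop length of each of the four processors
TIMEOUT = 4.5                        # original runtest4 timeout in seconds


def finish_times(start_delays):
    """Return the finish time of each processor.

    start_delays[i] is the extra delay before processor i begins its loop.
    Assumption: each processor restarts the next one right after it starts,
    so a delay on one processor pushes back every later processor too.
    """
    finish = []
    started_at = 0.0
    for loop, delay in zip(LOOP_SECONDS, start_delays):
        started_at += delay
        finish.append(started_at + loop)
    return finish


def times_out(start_delays, timeout=TIMEOUT):
    """True when the last processor to stop finishes after the timeout."""
    return max(finish_times(start_delays)) > timeout


print(times_out([0.0, 0.0, 0.0, 0.0]))  # no start delays
print(times_out([0.0, 0.2, 0.2, 0.2]))  # small delay on every SIGP RESTART
print(times_out([0.0, 0.0, 0.6, 0.0]))  # one large delay on a single processor
```

Under these assumptions, either several small delays or one large delay is enough to push the fourth processor past the 4.5 second limit.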

Execution of runtest4 on a Hercules with a patch for #239 confirms delays in starting subsequent processors.

The PoOp discussion of SIGP RESTART says that the function may take several seconds on real hardware, so there is no architecture requirement for changing SIGP RESTART to start the signaled processor immediately.

The implication for multi-processor test cases is that it is very difficult to determine their execution time definitively, even when a test case includes fixed-time loops as runtest4 does, because SIGP RESTART delays are variable and unpredictable.

Setting the timeout for runtest4 to 10.5 seconds, on the "worst case" assumption that the processors run their loops serially rather than concurrently, reduces the occurrence of timeouts to undetectable levels.
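One way to read the 10.5 second figure is as the sum of the four loop durations plus the same half-second of slack that the 4.5 second timeout allowed. The sketch below shows that arithmetic; the 0.5 second `MARGIN` and the function names are assumptions made for illustration, not values or code taken from the test case.

```python
# Hypothetical arithmetic for the two timeout bounds; not Hercules code.
LOOP_SECONDS = [1, 2, 3, 4]  # loop length of each processor, in seconds
MARGIN = 0.5                 # assumed slack, matching 4.5 - 4 in the old timeout


def concurrent_bound(loops=LOOP_SECONDS, margin=MARGIN):
    """Timeout if all loops overlap: only the longest loop matters."""
    return max(loops) + margin


def serial_bound(loops=LOOP_SECONDS, margin=MARGIN):
    """Timeout if the loops run one after another: the durations add up."""
    return sum(loops) + margin


print(concurrent_bound())  # 4.5
print(serial_bound())      # 10.5
```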

Fish-Git commented 6 years ago

Seems perfectly reasonable to me. Thank you, Steve!