Closed tavisrudd closed 10 years ago
Thanks @tavisrudd - I'll try and get it merged this week.
I've seen one intermittent failure (1 run out of 1000) here. It is on a branch test, and as the NOTICE
points out, these rely on non-guaranteed ordering semantics, so it's possibly not a problem, but we should keep an eye on it.
t4@guest-10-190:distributed-process-platform $ ./dist/build/SupervisorTests/SupervisorTests +RTS -N
NOTICE: Branch Tests (Relying on Non-Guaranteed Message Order) Can Fail Intermittently
Supervisor Processes:
>>>>>>>>>>>>>> [snip]
Restart Left:
Restart Left, Left To Right (Sequential) Restarts: [OK]
Restart Left, Leftmost Child Dies: [OK]
Restart Left, Left To Right Stop, Left To Right Start: [Failed]
Expected: equalTo pid://127.0.0.1:8080:0:2625
but: was pid://127.0.0.1:8080:0:2627
Restart Left, Right To Left Stop, Right To Left Start: [OK]
Restart Left, Left To Right Stop, Reverse Start: [OK]
Restart Left, Right To Left Stop, Reverse Start: [OK]
Restart Right:
Restart Right, Left To Right (Sequential) Restarts: [OK]
Restart Right, Rightmost Child Dies: [OK]
Restart Right, Left To Right Stop, Left To Right Start: [OK]
Restart Right, Right To Left Stop, Right To Left Start: [OK]
Restart Right, Left To Right Stop, Reverse Start: [OK]
Restart Right, Right To Left Stop, Reverse Start: [OK]
Restart Intensity:
Three Attempts Before Successful Restart: [OK]
Permanent Child Exceeds Restart Limits: [OK]
ToChildStart Link Setup:
Both Local Process Instances Link Appropriately: [OK]
Test Cases Total
Passed 70 70
Failed 1 1
Total 71 71
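To illustrate why a test that relies on non-guaranteed ordering can fail roughly once in a thousand runs, here is a minimal sketch in plain `Control.Concurrent`. It does not use the supervisor API; `collectReports` and the child names are hypothetical, made up for this example. Several threads report on a shared channel, and only the set of reports is guaranteed, not their arrival order.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)
import Data.List (sort)

-- | Start one thread per name (hypothetical helper, not part of the
-- library). Each thread writes its own name to a shared channel; the
-- result is the list of names in the order they happened to arrive.
collectReports :: [String] -> IO [String]
collectReports names = do
  chan <- newChan
  forM_ names $ \name -> forkIO (writeChan chan name)
  replicateM (length names) (readChan chan)

main :: IO ()
main = do
  let expected = ["child-1", "child-2", "child-3"]
  arrived <- collectReports expected
  -- Order-sensitive check: depends on scheduling, so it can fail
  -- intermittently, particularly when run with +RTS -N.
  putStrLn ("arrival order matches start order: " ++ show (arrived == expected))
  -- Order-insensitive check: holds on every run.
  putStrLn ("same set of reports: " ++ show (sort arrived == sort expected))
```

If the branch tests only need to know that the right children were restarted, comparing sorted results (or sets) would avoid the intermittent failure. If the order itself is the property under test, the flakiness is inherent and the NOTICE is the honest answer.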
I've seen the following failures a few times:
Supervisor Processes:
Stopping And Deleting Children:
Sequential Shutdown Ordering: [Failed]
expected the shutdown order to hold
NOTICE: Branch Tests (Relying on Non-Guaranteed Message Order) Can Fail Intermittently
Mon Mar 17 01:36:56 UTC 2014 [trace] MxReceived pid://127.0.0.1:8080:0:10 "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTjob1\NUL" :: (2bef152fd819b3fd,9f700da2acc86729)
Mon Mar 17 01:36:56 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:10 (DiedException "exit-from=pid://127.0.0.1:8080:0:10,reason=timing is out - job1 isn't registered yet")
Mon Mar 17 01:36:58 UTC 2014 [trace] MxProcessDied pid://
Task Execution And Prioritisation:
Each execution blocks the submitter: [OK]
Only 'max' tasks can proceed at any time: [Failed]
ERROR: thread blocked indefinitely in an MVar operation
Crashing Tasks are Reported Properly: [OK]
Test Cases Total
Passed 2 2
Failed 1 1
Total 3 3
127.0.0. Test suite TaskQueueTests: FAIL
Tue Mar 18 20:41:07 UTC 2014 [trace] MxReceived pid://127.0.0.1:8080:0:29 "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTjob2\SOH" :: (5187ee24bb3438de,9efee0a8a7e7c95)
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:18 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxSpawned pid://127.0.0.1:8080:0:30
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:29 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxSent pid://127.0.0.1:8080:0:21 pid://127.0.0.1:8080:0:16 [unencoded message] :: CallResponse (Either ExitReason [Char])
Tue Mar 18 20:41:07 UTC 2014 [trace] MxReceived pid://127.0.0.1:8080:0:21 [unencoded message] :: CallResponse (Either ExitReason [Char])
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:28 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:21 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:30 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxSpawned pid://127.0.0.1:8080:0:31
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:31 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:20 DiedNormal
Tue Mar 18 20:41:07 UTC 2014 [trace] MxProcessDied pid://127.0.0.1:8080:0:13 DiedNormal
Tue Mar 18 20:41:
Task Execution And Prioritisation:
Each execution blocks the submitter: [OK]
Only 'max' tasks can proceed at any time: [OK]
Crashing Tasks are Reported Properly: [Failed]
expected the server to report an error
On May 9, 2014, at 5:58 AM, Tim Watson notifications@github.com wrote:
I've seen one intermittent failure (1 run out of 1000) here. It is on a branch test, and as the NOTICE points out, these rely on non-guaranteed ordering semantics, so it's possibly not a problem, but we should keep an eye on it.
ERROR: thread blocked indefinitely in an MVar operation
That looks to me like a bug in the test code. There is no code blocking on MVars in the task queues after all, so my assumption is that something has crashed or ceased communicating with the coordinating thread, leaving the test case unable to proceed (and thankfully, generating a runtime deadlock warning out of the RTS). That could be (is probably!?) indicative of a bug, but we need to track down the source of the failure. I'll try and look at it this week.
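For anyone trying to reproduce that RTS message in isolation, the following is a minimal sketch of the failure mode described above, assuming the coordinating thread waits on an MVar that a worker was supposed to fill. It is not taken from the test suite; `waitForSilentWorker` is a hypothetical name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- | The coordinating thread waits on an MVar that the worker was meant
-- to fill. The worker here exits without ever signalling, standing in
-- for a process that crashed or stopped communicating (an assumption
-- about the real failure, not a diagnosis).
waitForSilentWorker :: IO (Either BlockedIndefinitelyOnMVar ())
waitForSilentWorker = do
  done <- newEmptyMVar
  _ <- forkIO (pure ())  -- worker finishes without calling putMVar
  -- No other thread can reach 'done', so once the garbage collector
  -- notices, the RTS throws BlockedIndefinitelyOnMVar to this thread.
  try (takeMVar done)

main :: IO ()
main = do
  result <- waitForSilentWorker
  case result of
    Left e   -> putStrLn ("RTS reported: " ++ show e)
    Right () -> putStrLn "worker signalled completion"
```

The `Left` branch should show the same "thread blocked indefinitely in an MVar operation" text as the failing test. Detection depends on the garbage collector proving the MVar unreachable, so in a larger program a stray reference can turn this error into a silent hang instead.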
Fixes DPP-98 on JIRA.