haskell-distributed / distributed-process-supervisor

Cloud Haskell Supervision Trees
BSD 3-Clause "New" or "Revised" License

Test suite failure #1

Closed: snoyberg closed this issue 5 years ago

snoyberg commented 9 years ago
Test suite SupervisorTests: RUNNING...
NOTICE: Branch Tests (Relying on Non-Guaranteed Message Order) Can Fail Intermittently
Supervisor Processes:
  Starting And Adding Children:
    Normal (Managed Process) Supervisor Start Stop: [OK]
    Specified By Closure:
      Add Child Without Starting: [OK]
      Start Previously Added Child: [OK]
      Start Unknown Child: [OK]
      Add Duplicate Child: [OK]
      Start Duplicate Child: [OK]
      Started Temporary Child Exits With Ignore: [OK]
      Configured Temporary Child Exits With Ignore: [OK]
      Start Bad Closure: [OK]
      Configured Bad Closure: [OK]
      Started Non-Temporary Child Exits With Ignore: [OK]
      Configured Non-Temporary Child Exits With Ignore: [OK]
    Specified By Delegate/Restarter:
      Add Child Without Starting (Chan): [OK]
      Start Previously Added Child: [OK]
      Start Unknown Child: [OK]
      Add Duplicate Child (Chan): [OK]
      Start Duplicate Child (Chan): [OK]
      Started Temporary Child Exits With Ignore (Chan): [OK]
      Started Non-Temporary Child Exits With Ignore (Chan): [OK]
  Stopping And Deleting Children:
    Delete Existing Child Fails: [OK]
    Delete Stopped Temporary Child (Doesn't Exist): [OK]
    Delete Stopped Child Succeeds: [OK]
    Restart Minus Dropped (Temp) Child: [OK]
    Sequential Shutdown Ordering: [OK]
  Stopping and Restarting Children:
    Permanent Children Always Restart (Closure): [OK]
    Permanent Children Always Restart (Chan): [OK]
    Temporary Children Never Restart (Closure): [OK]
    Temporary Children Never Restart (Chan): [OK]
    Transient Children Do Not Restart When Exiting Normally (Closure): [OK]
    Transient Children Do Not Restart When Exiting Normally (Chan): [OK]
    Transient Children Do Restart When Exiting Abnormally (Closure): [OK]
    Transient Children Do Restart When Exiting Abnormally (Chan): [OK]
    ExitShutdown Is Considered Normal (Closure): [OK]
    ExitShutdown Is Considered Normal (Chan): [OK]
    Intrinsic Children Do Restart When Exiting Abnormally (Closure): [OK]
    Intrinsic Children Do Restart When Exiting Abnormally (Chan): [OK]
    Intrinsic Children Cause Supervisor Exits When Exiting Normally (Closure): [OK]
    Intrinsic Children Cause Supervisor Exits When Exiting Normally (Chan): [OK]
    Explicit Restart Of Running Child Fails (Closure): [OK]
    Explicit Restart Of Running Child Fails (Chan): [OK]
    Explicit Restart Of Unknown Child Fails: [OK]
    Explicit Restart Whilst Child Restarting Fails (Closure): [OK]
    Explicit Restart Whilst Child Restarting Fails (Chan): [OK]
    Explicit Restart Stopped Child (Closure): [OK]
    Explicit Restart Stopped Child (Chan): [OK]
    Immediate Child Termination (Brutal Kill) (Closure): [OK]
    Immediate Child Termination (Brutal Kill) (Chan): [OK]
    Child Termination Exceeds Timeout/Delay (Becomes Brutal Kill): [OK]
    Child Termination Within Timeout/Delay: [OK]
  Branch Restarts:
    Restart All:
      Terminate Child Ignores Siblings: [OK]
      Restart All, Left To Right (Sequential) Restarts: [OK]
      Restart All, Right To Left (Sequential) Restarts: [OK]
      Restart All, Left To Right Stop, Left To Right Start: [Failed]
unexpected signal from pid://127.0.0.1:10501:0:930
      Restart All, Right To Left Stop, Right To Left Start: [OK]
      Restart All, Left To Right Stop, Reverse Start: [Failed]
unexpected signal from pid://127.0.0.1:10501:0:1093
      Restart All, Right To Left Stop, Reverse Start: [OK]
    Restart Left:
      Restart Left, Left To Right (Sequential) Restarts: [OK]
      Restart Left, Leftmost Child Dies: [OK]
      Restart Left, Left To Right Stop, Left To Right Start: [OK]
      Restart Left, Right To Left Stop, Right To Left Start: [OK]
      Restart Left, Left To Right Stop, Reverse Start: [OK]
      Restart Left, Right To Left Stop, Reverse Start: [OK]
    Restart Right:
      Restart Right, Left To Right (Sequential) Restarts: [OK]
      Restart Right, Rightmost Child Dies: [OK]
      Restart Right, Left To Right Stop, Left To Right Start: [Failed]

Expected: equalTo pid://127.0.0.1:10501:0:2854
     but: was pid://127.0.0.1:10501:0:2855
      Restart Right, Right To Left Stop, Right To Left Start: [OK]
      Restart Right, Left To Right Stop, Reverse Start: [Failed]

Expected: equalTo pid://127.0.0.1:10501:0:2972
     but: was pid://127.0.0.1:10501:0:2974
      Restart Right, Right To Left Stop, Reverse Start: [OK]
  Restart Intensity:
    Three Attempts Before Successful Restart: [OK]
    Permanent Child Exceeds Restart Limits: [OK]
  ToChildStart Link Setup:
    Both Local Process Instances Link Appropriately: [OK]

         Test Cases   Total       
 Passed  67           67          
 Failed  4            4           
 Total   71           71          
Test suite SupervisorTests: FAIL
Test suite logged to: /home/ubuntu/haskell/stackage/logs/stackage-nightly-2014-12-23/distributed-process-supervisor-0.1.1/test-run.out

hyperthunk commented 9 years ago

Interesting. I'll try to reproduce and trace the failures. There could be a timing issue in the tests, or it could be a bug; I'll spend some time going over the code.

hyperthunk commented 9 years ago

It's worth remembering the notice at the top, though: the branch tests rely on message ordering that can't be guaranteed. I'd forgotten about that bit. I should probably find a way to rewrite the tests, perhaps using the management API, which should see the process deaths in the correct order, IIRC. I'll look into that.

hyperthunk commented 9 years ago

Right, I see the problem and it's with the tests. The test code for most of the branch restarts uses a pre-configured supervisor which is started thus: sup <- Supervisor.start rs ParallelShutdown cs. As you might imagine, the ParallelShutdown option means we don't wait for the children to die in any particular order in the supervisor implementation. There are also tests that appear to expect a certain ordering which isn't guaranteed. I'm not going to have time to rewrite all of them today or tomorrow, but I am going to attempt to get it done this week.
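To make the ParallelShutdown point concrete, here is a minimal sketch that uses only GHC's base concurrency primitives rather than distributed-process. The names `parallelShutdown`, `orderedCheck` and `unorderedCheck` are hypothetical, and the staggered delays are an assumption chosen to make reordering likely. Each "child" dies in its own thread and reports on a shared channel, so the arrival order need not match the order in which the children were listed. An assertion that compares sorted lists holds however the deaths interleave; one that compares the raw lists does not.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)
import Data.List (sort)

-- Hypothetical stand-in for a ParallelShutdown: every "child" is stopped
-- concurrently and reports its own death on a shared channel.
parallelShutdown :: [Int] -> IO [Int]
parallelShutdown children = do
  deaths <- newChan
  forM_ children $ \c -> forkIO $ do
    -- Stagger the time each child takes to die (an assumption made only to
    -- make reordering likely); the scheduler may also interleave them.
    threadDelay ((length children - c) * 1000)
    writeChan deaths c
  -- Collect exactly one notification per child, in arrival order.
  replicateM (length children) (readChan deaths)

-- Fragile: assumes deaths arrive in the order the children were listed.
orderedCheck :: [Int] -> [Int] -> Bool
orderedCheck expected observed = expected == observed

-- Order-insensitive: only checks that every child died exactly once.
unorderedCheck :: [Int] -> [Int] -> Bool
unorderedCheck expected observed = sort expected == sort observed
```

In GHCi, `parallelShutdown [1, 2, 3]` will typically return the children in reverse order, which `orderedCheck [1, 2, 3]` rejects and `unorderedCheck [1, 2, 3]` accepts. An order-insensitive check only suits tests where ordering genuinely isn't part of the contract; tests of sequential shutdown semantics still need a reliable way to observe order.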

hyperthunk commented 9 years ago

Hmm, I'm now convinced that the tests are not written properly. The restartAllWithLeftToRightRestarts test is a prime example: it makes timing assumptions that just don't hold given the semantics. I think all the test cases in the suite that cover branch restarts need to be rewritten, which I'm making a start on now...
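One common way to remove timing assumptions of this kind is to replace a guessed sleep with an explicit handshake. The sketch below is a base-only illustration with hypothetical names (`restartChild`, `restartInOrder`), not the test suite's actual code: each restarted child signals through an `MVar` once it is running, and the caller blocks on that signal instead of calling `threadDelay` and hoping the child is already up.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical restart: the new child instance does start-up work of
-- unpredictable length, then announces that it is running.
restartChild :: Int -> IO Int
restartChild childId = do
  ready <- newEmptyMVar
  _ <- forkIO $ do
    threadDelay (childId * 500)  -- stand-in for variable start-up time
    putMVar ready childId        -- explicit "I am up" signal
  -- Rather than sleeping for a guessed interval and then asserting,
  -- block until the child has actually signalled.
  takeMVar ready

-- Restart children one after another, each only once the previous one is up.
restartInOrder :: [Int] -> IO [Int]
restartInOrder = mapM restartChild
```

Because `restartInOrder` waits for each child's signal before restarting the next, `restartInOrder [3, 1, 2]` returns `[3, 1, 2]` regardless of how long each start-up takes.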

hyperthunk commented 9 years ago

Sorry about the delay, @snoyberg; it's taking me a while to find time to rewrite the tests, as I'm dealing with a number of other PRs and tickets. I have not forgotten about this, though!

hyperthunk commented 5 years ago

@snoyberg, I've significantly refactored the tests to remove the potential for races. They now use the distributed-process tracing layer to capture the order of events in disparate green threads, instead of relying on the timing of monitor signals (whose arrival order the runtime doesn't guarantee). Hopefully supervisor will no longer crash, so I'm going to relax the upper bounds on dependencies a bit and submit a PR to get it integrated back into Stackage.
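The distributed-process tracing API itself isn't reproduced here. As a rough, base-only analogue of the idea, assuming a hypothetical `Trace` type and hypothetical `traceEvent`, `readTrace` and `stopSequentially` helpers: each child records its own termination into one shared, lock-protected log at the moment it happens, and the test asserts on that log rather than on when separate notifications arrive. The acknowledgement handshake in `stopSequentially` models a sequential shutdown, so the recorded order is fixed by the code rather than by the scheduler.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_)

-- Hypothetical shared trace: events are appended under a single lock, so the
-- trace order is the order in which the events were actually recorded.
type Trace = MVar [String]

traceEvent :: Trace -> String -> IO ()
traceEvent t ev = modifyMVar_ t (pure . (ev :))

-- Events are stored newest-first, so reverse them for reading.
readTrace :: Trace -> IO [String]
readTrace t = reverse <$> readMVar t

-- Hypothetical sequential stop: each child runs in its own thread, records
-- its own termination in the trace, then acknowledges to the "supervisor".
stopSequentially :: Trace -> [Int] -> IO ()
stopSequentially t children = forM_ children $ \c -> do
  ack <- newEmptyMVar
  _ <- forkIO $ do
    traceEvent t ("stopped " ++ show c)
    putMVar ack ()
  takeMVar ack  -- do not stop the next child until this one has gone
```

Running `stopSequentially` over `[1, 2, 3]` with a fresh trace and then calling `readTrace` gives `["stopped 1", "stopped 2", "stopped 3"]`, so a test can assert on the exact order without any timing assumptions.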