seL4 / sel4test

Test suite for seL4.
http://sel4.systems
Other
24 stars 60 forks source link

SCHED0021: allow for simulation tests again #106

Closed lsf37 closed 10 months ago

lsf37 commented 10 months ago

Since e7d9607a0bd reduced the chance of hitting the race condition, this test might be suitable for simulation again.

lsf37 commented 10 months ago

Please don't merge yet, I'd like to run the full simulation suite a few times to see if it's still flaky there (it may well be, I don't think it failed before purely because of this specific race in the code).

lsf37 commented 10 months ago

Unfortunately not yet:

<testcase classname="sel4test" name="SCHED0021">

  Running test SCHED0021 (Test for pre-emption during running of many threads with equal prio)

        <error>Check duration(58006150) < ((((2 * 5 * (1000UL * 1000UL)) * (4 + 1)) * (102)) / (100))(51000000) failed. at line 1637 of file /github/workspace/projects/sel4test/apps/sel4test-tests/src/tests/scheduler.c</error>

  Test SCHED0021 failed

        <failure type="failure">result == SUCCESS at line 296 of file /github/workspace/projects/sel4test/apps/sel4test-driver/src/testtypes.c</failure>

        <error>result == SUCCESS at line 217 of file /github/workspace/projects/sel4test/apps/sel4test-driver/src/main.c</error>

    </testcase>

It takes a while for this to trigger (3 full runs of the entire simulation suite, which is 38*3=114 runs of SCHED0021), but that's still too often.

Indanz commented 10 months ago

It depends on how time is simulated. If seL4 sees the real time, none of the timing sensitive tests will be reliable, because they are affected by host OS scheduling and load.

lsf37 commented 10 months ago

Would it make sense to try something like making time slices much longer under simulation to even out effects of the host? It'll never be fully reliable, I guess, so I don't know if we'd be getting much value out of it if we just reduce the probability of a fluke by a lot.

Indanz commented 10 months ago

Would it make sense to try something like making time slices much longer under simulation to even out effects of the host?

It would, but doing that may break some tests too.

It'll never be fully reliable, I guess, so I don't know if we'd be getting much value out of it if we just reduce the probability of a fluke by a lot.

I don't think it's worth the effort. You want reliable timing tests to find timing problems. Having fluky tests is in some way worse than not having them at all, as it reduces the confidence in the tests and will cause real bugs to be ignored.

lsf37 commented 10 months ago

I don't think it's worth the effort. You want reliable timing tests to find timing problems. Having fluky tests is in some way worse than not having them at all, as it reduces the confidence in the tests and will cause real bugs to be ignored.

Ok, agreed. Let's close this and leave the test out of the simulation setting.