VoiceManagerImplTest unstable

wborn commented 1 year ago

This test failed in a GHA build: https://github.com/openhab/openhab-core/actions/runs/4501198318

TEST org.openhab.core.voice.internal.VoiceManagerImplTest#registerDialog() <<< ERROR: 
Expected: is "Recognized text"
     but: was ""
java.lang.AssertionError: 
Expected: is "Recognized text"
     but: was ""
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
    at org.openhab.core.voice.internal.VoiceManagerImplTest.registerDialog(VoiceManagerImplTest.java:624)

andrewfg commented 1 year ago

@wborn as we discovered this issue during debugging of the core.io.net tests, I felt compelled to look into it. Indeed I may look into your other 'unstable test' issues as well..

I guess that this particular issue is simply due to the two Thread.sleep() calls. Perhaps we should implement a Future or Completable of some sort? Or at least make the sleep delays longer?

https://github.com/openhab/openhab-core/blob/8494bf1199ca1341ca471808cc2f826eb3b447ae/itests/org.openhab.core.voice.tests/src/main/java/org/openhab/core/voice/internal/VoiceManagerImplTest.java#L611-L616

J-N-K commented 1 year ago

Increasing sleeps and Timeouts is what we did in the past. But I believe this is not the proper solution. If so many tests start to fail in a short time there is either something wrong in our code or in the test design.

andrewfg commented 1 year ago

either something wrong in our code or in the test design.

Agreed. And from looking at the other unstable tests, it looks like timing is the common denominator.

More importantly, if tests fail due to time sensitive code, we should expect that OH runtime installations have the same sensitivities. So perhaps we should look at how the runtime handles such errors (i.e. its 'plan b') and include a respective 'plan b test' in the test suites?

J-N-K commented 1 year ago

I'm not sure if all are real issues at runtime. Regarding the test here: I believe that the issue is that the dialog registration is delayed by 5s (or more if there are additional calls) and after that the dialog is registered and started. So there is indeed less than 1s for the whole processing which may be too short.

An idea would be to increase the log level to DEBUG and see if the processing in DialogProcessor that is needed to fill the fields in hliStub is taking place or if we fail before that. Since the logging is also consuming some time the test may fail more often with DEBUG logging enabled.

wborn commented 1 year ago

Perhaps a generic approach can be used to make tests more stable that depend on tasks being executed by an ExecutorService or ScheduledExecutorService. E.g. we could use executors that synchronously execute tasks... or have some mechanism that returns a Future on which you can wait so your tests knows that all queued executor tasks have been executed.

andrewfg commented 1 year ago

^ Some more thoughts on this..

The purpose of a test suite is to confirm the correct functionality of a piece of OH core code (call it the 'Code Under Test' CUT). Yet sometimes testing the CUT often requires checking its interaction with some artificial Test Infrastructure Code (TIC) such as a Mock, or a simulated server, etc.

This has the following consequences on the design of test assertions..

Test assertions SHALL really check the CUT -- and NOT the TIC. If an assertion checks the TIC, it should probably be eliminated from the test suite.
Test assertions SHALL check TWO possible outcomes -- namely a) assert that the CUT produces a correct response when the TIC does NOT yield an error, AND b) assert that the CUT also produces a (different) correct response when the TIC DOES yield an error.

So when testing the interaction between CUT and TIC where the TIC involves time sensitivity, we should do the following..

Run each such CUT => TIC interaction TWO times: 1) where the TIC time sensitivity error WILL definitely be triggered, and confirm that the CUT produces the expected error code / exception type / dummy result / null result. AND also 2) where the TIC time sensitivity is definitely NOT triggered, and confirm that the CUT produces the correct valid result.
If the TIC time sensitivity is (say) a wait delay / thread sleep period / idle timeout / etc. then the duration of that delay/sleep/timeout shall NOT be a hard coded attribute of the TIC !! .. rather it SHALL be an attribute of, (or determined by), the CUT!!.

andrewfg commented 1 year ago

^ And/or .. if a test suite comprises a CUT being tested against a simulated TIC web server -- to name one example -- the test assertions should check ALL of the following cases: Socket Connect Time Out (TIC offline), Socket Connect Refused (TIC refuses), Socket Read Timeout (TIC not handling request), HTTP 40x (CUT bad request), AND last-but-not-least HTTP 200 OK.

wborn commented 1 year ago

There are still issues with these tests: https://ci.openhab.org/job/PR-openHAB-Core/6164/

TEST org.openhab.core.voice.internal.VoiceManagerImplTest#startDialogWithoutPassingAnyParameters() <<< ERROR: 
Expected: is "Recognized text"
     but: was ""
java.lang.AssertionError: 
Expected: is "Recognized text"
     but: was ""
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
    at org.openhab.core.voice.internal.VoiceManagerImplTest.startDialogWithoutPassingAnyParameters(VoiceManagerImplTest.java:427)

openhab / openhab-core

VoiceManagerImplTest unstable #3477