openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.57k forks source link

[modbus] ModbusDataHandlerTest unstable #15683

Open wborn opened 10 months ago

wborn commented 10 months ago

This test failed in: https://github.com/openhab/openhab-addons/actions/runs/6377170852/job/17305335904

TEST org.openhab.binding.modbus.tests.ModbusDataHandlerTest#testReadValueTypeIllegal() <<< ERROR: 
Expected: is not null
     but: was null
java.lang.AssertionError: 
Expected: is not null
     but: was null
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
    at org.openhab.binding.modbus.tests.AbstractModbusOSGiTest.lambda$1(AbstractModbusOSGiTest.java:223)
    at org.openhab.core.test.java.JavaTest.waitForAssert(JavaTest.java:245)
    at org.openhab.core.test.java.JavaTest.waitForAssert(JavaTest.java:213)
    at org.openhab.core.test.java.JavaTest.waitForAssert(JavaTest.java:166)
    at org.openhab.binding.modbus.tests.AbstractModbusOSGiTest.linkItem(AbstractModbusOSGiTest.java:223)
    at org.openhab.binding.modbus.tests.ModbusDataHandlerTest.createDataHandler(ModbusDataHandlerTest.java:268)
    at org.openhab.binding.modbus.tests.ModbusDataHandlerTest.createDataHandler(ModbusDataHandlerTest.java:230)
    at org.openhab.binding.modbus.tests.ModbusDataHandlerTest.testInitGeneric(ModbusDataHandlerTest.java:1036)
    at org.openhab.binding.modbus.tests.ModbusDataHandlerTest.testInitGeneric(ModbusDataHandlerTest.java:999)
    at org.openhab.binding.modbus.tests.ModbusDataHandlerTest.testReadValueTypeIllegal(ModbusDataHandlerTest.java:1335)

Maybe you can look into it @ssalonen?

ssalonen commented 10 months ago

Thanks for the ping

The code fails when asserting that link that was just added, is actually available from ItemChannelLinkProvider. See below

This would imply that default timeout of 10s is not enough.

Should I simply increase the timeout? Then again, the pattern here is very typical I would assume. Has there been some other regression or change that simply surfaces here?

https://github.com/openhab/openhab-addons/blob/main/itests/org.openhab.binding.modbus.tests/src/main/java/org/openhab/binding/modbus/tests/AbstractModbusOSGiTest.java

ItemChannelLink link = new ItemChannelLink(itemName, channelUID);
        assertThat(addedLinks.contains(link), not(equalTo(true)));
        itemChannelLinkProvider.add(link);
        waitForAssert(() -> assertThat(itemChannelLinkRegistry.get(AbstractLink.getIDFor(itemName, channelUID)),
                is(notNullValue())));
wborn commented 10 months ago

I haven't studied this particular code so I do not know if increasing the timeout will help. Sometimes I also run into tests failing because some service asynchronously subscribes itself as listener to some kind of event. In that case you want to make sure the service is subscribed to the events before executing the code that sends the event.

ssalonen commented 10 months ago

The implementation seems to be very straightforward in the test, I am having hard time finding the root cause.

ManagedItemChannelLinkProvider itemChannelLinkProvider is called first with .add() and then we check the link exists with .get().

ManagedItemChannelLinkProvider::add is actually implemented in AbstractManagedProvider::add which is basically shallow wrapper to put to underlying Storage. Similarly with get

The behaviour does not even seem to care whether item exists.


Only thing I can think of is some sort of race condition if the previous test's @AfterEach AbstractModbusOSGiTest.tearDownAbstractModbusOSGiTest is run when the next test is already executing. I presume the channel links are tied into one global instance ManagedItemChannelLinkProvider... if the tearDown is in the process of clearing links we could expect this kind of failure.

Do you know @wborn if this is how the tests are run in CI?

ssalonen commented 10 months ago

Quick seach is showing that actually no other itest is tearing down the created links... supporting the speculation above

Furthermore, quick comparison to ntp.tests yielded more defensive approach with links (pay note of the comment):

 itemRegistry.add(testItem);

        // Wait for the item , linked to the NTP thing to be added to the
        // ManagedThingProvider.
        final ManagedItemChannelLinkProvider itemChannelLinkProvider = waitForAssert(() -> {
            final ManagedItemChannelLinkProvider tmp = getService(ManagedItemChannelLinkProvider.class);
            assertNotNull(tmp);
            return tmp;
        });
        itemChannelLinkProvider.add(new ItemChannelLink(TEST_ITEM_NAME, channelUID));

Based on comment, maybe we should be waiting for item to be added still? I did not find such correlation in code, as analyzed above

ssalonen commented 10 months ago

~Answering myself, tests are run in parallel in github actions: https://github.com/openhab/openhab-addons/blob/729a19a8da9c6100a7caa8a8670a989b0d2fa04c/.github/scripts/maven-build#L5C1-L5C1~ No that was just instructing maven to execute using multiple cores, not necessarily running junit in parallel mode...

As commented above, tear down after the test is inherently non-thread safe, and should be fixed.

Will publish a fix

ssalonen commented 10 months ago

Hmm @wborn I think I might have misread the CI build script, it was calling mvn with -T parameter but that does not tell anything but JUnit test execution.

I did not find anything on commanding JUnit to execute parallel. Default would be single thread non-parallel and non-concurrent.

So there goes my fix-theory...