zsmartsystems / com.zsmartsystems.zigbee

ZigBee Cluster Library Java framework supporting multiple dongles
Eclipse Public License 1.0
138 stars 87 forks source link

NPE while initializing transport layer #1416

Closed amitjoy closed 4 months ago

amitjoy commented 4 months ago

The following NPE has been noticed in the runtime which indicates that ZB transport layer hasn't been initialized properly.

java.lang.NullPointerException: null
    at com.zsmartsystems.zigbee.dongle.ember.EmberNcp.getVersion(getVersion:216)
    at com.zsmartsystems.zigbee.dongle.ember.ZigBeeDongleEzsp.initialiseEzspProtocol(initialiseEzspProtocol:1322)
    at com.zsmartsystems.zigbee.dongle.ember.ZigBeeDongleEzsp.initialize(initialize:432)
    at com.zsmartsystems.zigbee.ZigBeeNetworkManager.initialize(initialize:418)
    at com.qivicon.binding.zigbee.handler.ZigBeeCoordinatorHandler.initialiseZigBee(initialiseZigBee:526)
    at com.qivicon.binding.zigbee.handler.ZigBeeCoordinatorHandler.asyncInitialize(asyncInitialize:276)
    at com.qivicon.binding.zigbee.handler.ZigBeeCoordinatorHandler.lockedAsyncInitialize(lockedAsyncInitialize:257)
    at java.util.concurrent.Executors$RunnableAdapter.call(call:511)
    at java.util.concurrent.FutureTask.run(run:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(access$201:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(run:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(runWorker:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(run:624)
    at java.lang.Thread.run(run:750)

@cdjackson Would you please provide clarification as to the underlying reasons for this occurrence?

cdjackson commented 4 months ago

I can only assume that protocolHandler is null at this point which should not normally be the case. Please can you provide a debug log that provides some context of what is going on.

amitjoy commented 4 months ago

@cdjackson Thanks a lot for your quick reply. Unfortunately, I don't have any logs at all since it has been reported to our sentry endpoint as soon as it happened. But, we have never noticed any critical scenario earlier resulting from this though. Can we still do something for this?

amitjoy commented 4 months ago

@cdjackson I think it would make sense to introduce NPE invariants (preconditions) to denote the expectation. wdyt?

cdjackson commented 4 months ago

An NPE at his point shouldn't be possible which is why I wanted to know the context. I'm assuming that this is during initialisation, and if we look at the code we see the following -:

In initialiseEzspProtocol -:

        frameHandler.connect();

        EmberNcp ncp = getEmberNcp();

        // We MUST send the version command first.
        // Any failure to respond here indicates a failure of the ASH or EZSP layers to initialise
        ezspVersion = ncp.getVersion();
        if (ezspVersion == null) {
            logger.debug("EZSP Dongle: Version returned null. ASH/EZSP not initialised.");
            return false;
        }
    public EmberNcp getEmberNcp() {
        return new EmberNcp(frameHandler);
    }

Then in ncp.getVersion() -:

image

I've pasted this as an image so we can see the line numbers. The line where the exception occurred is line 216...

So, we can see above that frameHandler.connect(); is called just before getEmberNcp. getEmberNcp uses frameHandler to pass to the ncp, and this is locally called protocolHandler. So, for protocolHandler to be null, frameHandler would also be null, but it can't be null since it was used 2 lines before calling getVersion.

Maybe this isn't where your exception occurred? Without the log it's hard to know what is happening.

Maybe in the EmberNcp constructor we should check that protocolHandler is not null and throw an invalid argument exception there?

E.g. -:

    public EmberNcp(EzspProtocolHandler protocolHandler) {
        if (protocolHandler == null) {
            throw new InvallidArgumentException(....);
        }
        this.protocolHandler = protocolHandler;
    }

What do you think?

amitjoy commented 4 months ago

@cdjackson I was also thinking along the line. It would be better to introduce the invariant for our expectation of it to be not null. I will open the MR shortly then.

cdjackson commented 4 months ago

Thanks for the PR @amitjoy