openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.88k stars 3.59k forks source link

[mqtt.homie] Did not receive mandatory topic value: homie/xxx/$name #7252

Closed chipec84 closed 4 years ago

chipec84 commented 4 years ago

Hello,

PROBLEM homie device is showing in "inbox", after validation of the thing I get error message "Did not receive mandatory topic value: homie/relay1/$name". Thing appears in things list but channels are not available. (that seams normal as it stops reading whats next on "$name" error) after rebooting the homie device channels are available and everything is working fine. In other words my issue is that the homie device needs to be rebooted for OH item configuration.

CONFIG I am running OH 2.5.3 and mqtt-binding 2.5.3. Please note that I do not have this issue with OH 2.4.0 and mqtt-binding 2.4.0

DATA my published topics from homie device: verification ok with online homie verificator.

"homie/relay1/$homie", "3.0", qos 1, retained true "homie/relay1/$name", "relay1", qos 1, retained true "homie/relay1/$nodes", "c1,c2", qos 1, retained true "homie/relay1/$state", "init", qos 1, retained true

"homie/relay1/c1/$name", "c1", qos 1, retained true "homie/relay1/c1/$type", "switch", qos 1, retained true "homie/relay1/c1/$properties", "power", qos 1, retained true

"homie/relay1/c1/power/$name", "c1_power", qos 1, retained true "homie/relay1/c1/power/$settable", "true", qos 1, retained true "homie/relay1/c1/power/$datatype", "boolean", qos 1, retained true

"homie/relay1/c2/$name", "c2", qos 1, retained true "homie/relay1/c2/$type", "switch", qos 1, retained true "homie/relay1/c2/$properties", "power", qos 1, retained true

"homie/relay1/c2/power/$name", "c2_power", qos 1, retained true "homie/relay1/c2/power/$settable", "true", qos 1, retained true "homie/relay1/c2/power/$datatype", "boolean", qos 1, retained true

...some code for init then

"homie/relay1/c1/power", false, qos 1, retained true "homie/relay1/c2/power", false, qos 1, retained true "homie/relay1/$state", "ready", qos 1, retained true

LOGS 'mqtt:homie300:be9b4a80:relay1' changed from INITIALIZING to OFFLINE (COMMUNICATION_ERROR): java.lang.Exception: Did not receive mandatory topic value: homie/relay1/$name

andvikt commented 4 years ago

It seems OH subscribes to homie topics with QoS 0 (i see it in my broker log) but broker will deliver those topics very fast just all of them at once, so plenty of them won't be delivered actually. I tried slow down sending topics from my homie implementation and it helped, but thaat not the caase I think. I think, OH must subscribe to homie topics with QoS 1.

openha commented 4 years ago

Just to add some additional information with regards to the error in the issue title. Here are the relevant lines from the mosquitto log:

Add a new MQTT broker, Thing is discovered in the Inbox

1585662631: Received SUBSCRIBE from d29c0b95-6a52-426f-9180-01b36277c074 1585662631: +/+/$homie (QoS 1) 1585662631: d29c0b95-6a52-426f-9180-01b36277c074 1 +/+/$homie 1585662631: Sending SUBACK to d29c0b95-6a52-426f-9180-01b36277c074 1585662631: Sending PUBLISH to d29c0b95-6a52-426f-9180-01b36277c074 (d0, q0, r1, m0, 'homie/GFBlinds/$homie', ... (3 bytes)) 1585662631: Received SUBSCRIBE from d29c0b95-6a52-426f-9180-01b36277c074 1585662631: homie/GFBlinds/$name (QoS 1) 1585662631: d29c0b95-6a52-426f-9180-01b36277c074 1 homie/GFBlinds/$name 1585662631: Sending SUBACK to d29c0b95-6a52-426f-9180-01b36277c074 1585662631: Sending PUBLISH to d29c0b95-6a52-426f-9180-01b36277c074 (d0, q0, r1, m0, 'homie/GFBlinds/$name', ... (12 bytes))

Add the device from the inbox. Since the subscription and publish to openhab was already done, the value is not resent. I would guess the binding expects to receive the value.

1585662647: Received SUBSCRIBE from d29c0b95-6a52-426f-9180-01b36277c074 1585662647: homie/GFBlinds/$homie (QoS 0) 1585662647: d29c0b95-6a52-426f-9180-01b36277c074 0 homie/GFBlinds/$homie 1585662647: Sending SUBACK to d29c0b95-6a52-426f-9180-01b36277c074 1585662647: Sending PUBLISH to d29c0b95-6a52-426f-9180-01b36277c074 (d0, q0, r1, m0, 'homie/GFBlinds/$homie', ... (3 bytes)) 1585662647: Received SUBSCRIBE from d29c0b95-6a52-426f-9180-01b36277c074 1585662647: homie/GFBlinds/$state (QoS 0) 1585662647: d29c0b95-6a52-426f-9180-01b36277c074 0 homie/GFBlinds/$state 1585662647: Sending SUBACK to d29c0b95-6a52-426f-9180-01b36277c074 1585662647: Sending PUBLISH to d29c0b95-6a52-426f-9180-01b36277c074 (d0, q0, r1, m0, 'homie/GFBlinds/$state', ... (5 bytes)) 1585662647: Received SUBSCRIBE from d29c0b95-6a52-426f-9180-01b36277c074 1585662647: homie/GFBlinds/$nodes (QoS 0) 1585662647: d29c0b95-6a52-426f-9180-01b36277c074 0 homie/GFBlinds/$nodes 1585662647: Sending SUBACK to d29c0b95-6a52-426f-9180-01b36277c074 1585662647: Sending PUBLISH to d29c0b95-6a52-426f-9180-01b36277c074 (d0, q0, r1, m0, 'homie/GFBlinds/$nodes', ... (39 bytes))

Re:second comment: In addition, I have also observed that not all of the properties are always added or are incomplete (e.g. name missing). If the device re-publishes the topics, this is usually populated correctly. Most likely this is not related issue to the original issue posted in the first comment.

jochen314 commented 4 years ago

I know we do have a basic problem with the mqtt protocol. Wenn the client receives a message, the 'retain' flag is only set, if the message was trasmitted to the client after a reconnect as a retained message. The first transmission of the message does not have the retained flag. We have only one connection from openhab to the broker. The discovery will have subscribe to a $name topic and get id delivered. But without the 'retained' indication. After adding the component another subscription is issued for $name. But for the broker the subscription is from the same client and it will not retransmit the retained messages. Therefor they will never arrive. I currently only see the solution to have induvidual connections to the broker. Which would mean a lot of code change and maintaining connection data (different client ids) in many places.....

chipec84 commented 4 years ago

It is working fine in v2.4.0, what is different in the code for v2.5.3?

openha commented 4 years ago

I currently only see the solution to have induvidual connections to the broker. Which would mean a lot of code change and maintaining connection data (different client ids) in many places.....

Would unsubscribing from the $name topic immediately after discovery work?

jochen314 commented 4 years ago

@chipec84 openhab-core changed the mqtt client library @openha we do unsubscribe. my guess is, that there is more context stored on the connection.....

andvikt commented 4 years ago

After adding the component another subscription is issued for $name. But for the broker the subscription is from the same client and it will not retransmit the retained messages. Therefor they will never arrive.

But how does it know the name of the thing when it is discovered? ) It seems that the most easy way is to have a homie namespace as a config parameter of a broker and to subscribe to all messages in it.

For ex, we have device with main topic myhome/mydevice/$homie. So why dont just subscribe to all myhome/# topics? With that approach we can store all the messages from myhome/# in some db and process them later without any need in retained resendings from the broker.

jochen314 commented 4 years ago

With that approach we can store all the messages from myhome/# in some db and process them later without any need in retained resendings from the broker. But then we would treat all messages as retained, which is also wrong....

openha commented 4 years ago

@chipec84 openhab-core changed the mqtt client library @openha we do unsubscribe. my guess is, that there is more context stored on the connection.....

I rechecked and there is no unsubscribe entry for $name in the mosquitto log after discovery. I could find the unsubscribe code in the source, but looking at the mqtt logs, it seems it is not called.

RayBa82 commented 4 years ago

I know we do have a basic problem with the mqtt protocol. Wenn the client receives a message, the 'retain' flag is only set, if the message was trasmitted to the client after a reconnect as a retained message. The first transmission of the message does not have the retained flag. We have only one connection from openhab to the broker. The discovery will have subscribe to a $name topic and get id delivered. But without the 'retained' indication. After adding the component another subscription is issued for $name. But for the broker the subscription is from the same client and it will not retransmit the retained messages. Therefor they will never arrive. I currently only see the solution to have induvidual connections to the broker. Which would mean a lot of code change and maintaining connection data (different client ids) in many places.....

That does absolutely not sound logical when the same approach worked with another mqtt client. All clients should handle the protocol in the same way or the client is just broken. Does openhab use a clean session when it connects to the broker?

chipec84 commented 4 years ago

hello, Anything new on this issue?

labodj commented 4 years ago

I have the same issue here, 6 devices with 10 channels each, can't configure them.

sebastianohl commented 4 years ago

Here the same. It is working if I reset my device but stops with the failure of missing name after some time. It got a lot better after I put a delay between sending the individual mqtt messages on the homie device(but still occurs sometimes) . However, this is only possible because I wrote the firmware myself. If this were commercial devices, this would not be an option.

chipec84 commented 4 years ago

What delay did you put in between messages?

mjcumming commented 4 years ago

@sebastianohl a delay shouldn't be needed.

I believe the error is as @jochen314 describes. When the OpenHAB MQTT Client connects to the broker, it gets all existing MQTT Messages marked retain and those messages are marked with the retain flag True. Messages after the OpenHAB Client connects are received with the retain flag False even if it is set to True. At least this my understanding of [MQTT-3.3.1-9]

When a Homie device is added from the Inbox, why not create a new MQTT Client connection to the broker, receive all of the retained messages, create the OpenHAB Thing and then close the connection once the device is created? Use the original Client to handle all other MQTT message.

This would be easier than maintaining a database of MQTT Homie messages and searching it for messages related to a Homie device to get the settings.

sebastianohl commented 4 years ago

For sure putting a delay between the messages and give openhab time to process it is no solution. However, it could be a dirty fix for users who generate their own firmware.

I put 500ms between my MQTT messages and restart my EPS32 devices every 24h.

baku104788 commented 4 years ago

For sure putting a delay between the messages and give openhab time to process it is no solution. However, it could be a dirty fix for users who generate their own firmware.

I put 500ms between my MQTT messages and restart my EPS32 devices every 24h.

Another easy 'dirty' fix is to simply republish the $name value on a regular interval as if it is a heartbeat. Once OH gets the $name value again everything is good.

labodj commented 4 years ago

@baku104788 I think this approach doesn't work with OpenHab 2.5.5 I intercept /homie/deviceName/$stats/uptime with node-red, for every message I receive in that topic i republish a message with payload=deviceName in /homie/deviceName/$name (every 65 seconds)

Openhab is ignoring it, it always shows java.lang.Exception: Did not receive mandatory topic value: homie/deviceName/$name for some device.

Anyway even if the error spawns if the device is cofigured just one time I can control it, but IMHO this is a big flaw in OpenHab MQTT implementation...

mjcumming commented 4 years ago

On 2.5.4 I can confirm that republishing $name works as @baku104788 described.

Interestingly, after publishing $name again BUT no other Homie messages ie nodes or properties OH correctly creates the Thing for the device and the correct Channels.

So, this doesn't make any sense to blame this on the retain flag and MQTT. I do not know Java but looking through the code there is a wait for an MQTT message with $name in it. Rather than waiting, the discovery code should check if it already has a $name message from that device and only wait if it does not.

bodiroga commented 4 years ago

OMG, I think that I have figured out what the problem was! Sh*t, it was so simple...

PR on the way! :+1:

sebastianohl commented 4 years ago

Great! Looking forward to test it

bodiroga commented 4 years ago

Pull request accepted, awesome! Let's see if we can get a stable Homie (and MQTT in general) implementation in openHAB for 2.5.6, this project deserves it! :+1:

Alpha200 commented 4 years ago

I've upgraded OpenHAB to 2.5.6 today, but I still see the "Did not receive mandatory topic value" for some of my Things. I've checked if the topics are missing via mosquitto_sub but they are definitely there. Is there anything left to do, after upgrading to 2.5.6?

labodj commented 4 years ago

It's happening to me too

mjcumming commented 4 years ago

@Alpha200 and @labodj what devices are you connecting with (i.e. which Homie implementation)?

labodj commented 4 years ago

This one I have 6 devices, with up to 10 homie nodes each (459 total topics), some of them are recognized, some other not. It's a random behavior, if I disable and re-enable MQTT broker inside openhab some random device it's well configured, some other give me "Did not receive mandatory topic value" error. Device that are erroring out change randomly turning on and off openhab MQTT broker.

The needed topics ($name in this case) are all present and retained.

RayBa82 commented 4 years ago

I have the same issues with 15 homie devices and they all have a lot of properties, some about 20-30.

Alpha200 commented 4 years ago

I'm using several self implemented Homie Devices. They are all based on Homie 4.0.0, but I think that should not make any difference. Currently I'm using VerneMQ as my MQTT broker, but I had the issues with Mosquitto as well. I'm sure that the implementations are 100% correct.

I've exactly the same issues as @labodj has stated above.

mjcumming commented 4 years ago

@bodiroga I am wondering if a better solution would be to have OH create a new connection to the broker using the $home/deviceid to subscribe when OH sees a $state ready message. That way OH will receive all topics on the broker related to that device. OH could then close that connection and use the existing connection for publishing/subscribing?

bodiroga commented 4 years ago

Oh, boy, this is a nightmare... :cry:

Let's see if I can have a look at it over the following days, I'm so sorry guys! @J-N-K, @cpmeister or @cweitkamp, can you reopen this issue, please?

bodiroga commented 4 years ago

@mjcumming, have you been enable to see the issue in the latest openHAB release? I have started a fresh openHAB container (2.5.6) and I have tried a Python script based on your Homie4 implementation, using the following code:

import time

from homie.device_dimmer import Device_Dimmer

mqtt_settings = {
    'MQTT_BROKER' : 'xxx.xxx.xxx.xxx',
    'MQTT_PORT' : 1883
}

class My_Dimmer(Device_Dimmer):

    def set_dimmer(self,percent):
        print('Received MQTT message to set the dimmer to {}. Must replace this method'.format(percent))
        super().set_dimmer(percent)

class DimmerFactory(object):

    def __init__(self):
        self.dimmers = []

    def create_dimmers(self, number):
        for n in range(number):
            self.dimmers.append(My_Dimmer(name='Test Dimmer {}'.format(n+1), mqtt_settings=mqtt_settings))

    def update_dimmers(self, value):
        for dimmer in self.dimmers:
            dimmer.update_dimmer(value)

if __name__ == "__main__":
    try:

        d_factory = DimmerFactory()
        d_factory.create_dimmers(20)

        while True:
            d_factory.update_dimmers(0)
            time.sleep(2)
            d_factory.update_dimmers(50)
            time.sleep(2)
            d_factory.update_dimmers(100)
            time.sleep(2)

    except (KeyboardInterrupt, SystemExit):
        print("Quitting.")  

After importing the 20 homie devices, all of them are online and even after multiple restarts, I can't see the "Did not receive the mandatory topic value: homie/xxx/$name" message. Perhaps this dimmer devices are very simple? As far as I know, they have a single node with just one property: dimmer.

Is any of you able to create a test environment where the error is reproducible using @mjcumming's library? That would be really helpful :+1: Also, some questions for you (@Alpha200, @labodj, @RayBa82):

Thanks!

mjcumming commented 4 years ago

@bodiroga with your fix, I do not see the error very often. It only happens on OH startup. Start up the above script and then start OH. That should reproduce it.

labodj commented 4 years ago

@bodiroga

RayBa82 commented 4 years ago

I am running openHAB on a Intel NUC with i5 processor in a docker environment. I test it always on a clean openhab installation. I deploy my configuration with ansible so this is done in a minute :-)

mjcumming commented 4 years ago

@labodj do you only see this error on OpenHAB startup or is it on your Homie device startup?

labodj commented 4 years ago

@mjcumming my homie devices are always up and should never reboot unless OTA firmware update or power outage, I notice the problem as soon as I enter openHab, restarted or not.

I'm uploading a video of the problem.

labodj commented 4 years ago

@mjcumming

Problem example video

cweitkamp commented 4 years ago

can you reopen this issue, please?

Done.

Alpha200 commented 4 years ago

I'm using openHAB in a Docker environment as well.

Currently it is only one device that has this error sometimes. It has 7 nodes with one property each. All other nodes have fewer properties an nodes.

I've not tried to delete and an readd the devices. Disabling and enabling the Things has no effect though.

The problem occurs only on restarts for me as well. This is especially painful, because I'm using a continuous deployment setup that recreates the Docker container every night to keep it up to date.

mjcumming commented 4 years ago

@labodj would you confirm which OH version you are on?

labodj commented 4 years ago

@mjcumming 2.5.6

mjcumming commented 4 years ago

I'm using openHAB in a Docker environment as well.

Currently it is only one device that has this error sometimes. It has 7 nodes with one property each. All other nodes have fewer properties an nodes.

I've not tried to delete and an readd the devices. Disabling and enabling the Things has no effect though.

The problem occurs only on restarts for me as well. This is especially painful, because I'm using a continuous deployment setup that recreates the Docker container every night to keep it up to date.

@Alpha200, the "thing" that @labodj is talking about is the MQTT broker thing. Not the actual Homie device thing.

For those with the error, need to make sure your are on the 2.5.6 release of OH.

mjcumming commented 4 years ago

@labodj if you restart the Homie device(s), this brings them back online?

Alpha200 commented 4 years ago

@Alpha200, the "thing" that @labodj is talking about is the MQTT broker thing. Not the actual Homie device thing.

I tried it with both, the broker and the device things, but for me, only a full restart of openHAB resolves it eventually.

For those with the error, need to make sure your are on the 2.5.6 release of OH.

As I said, I'm using the 2.5.6 tag of the official Docker image and the UI overview shows "openHAB 2.5.6 Release Build".

On 2.5.4, a restart of the devices did the trick, but now on 2.5.6 I restarted the problematic device several times, but only after a restart of openHAB it worked. Once the devices are are loaded successfully, they will work without problems.

RayBa82 commented 4 years ago

For me it is always the same.

In a clean openhab install i need two starts of the homie devices.

  1. Start mosquitto
  2. Start openhab
  3. start homie devices -> Not all homie devices are recognized with the "did not receive..." message
  4. stop homie devices
  5. start homie devices -> All devices are online

I did not check the behaviour with only restarting openhab in 2.5.6 but in 2.5.5 it was similar, with the difference that it always needed only one start of the homie devices to get them online.

labodj commented 4 years ago

@labodj if you restart the Homie device(s), this brings them back online?

Not automatically, not always, it's random, I made a new video, it will explain the situation

https://youtu.be/qtkfyHEjpBo

mjcumming commented 4 years ago

@labodj the video really helped understand.

@bodiroga when enabling and disabling the MQTT Broker thing I can cause the following dump in the log

`2020-06-25 10:33:47.905 [WARN ] [.transport.mqtt.MqttBrokerConnection] - Failed subscribing to topic homie/switch-2273856/$state

java.util.concurrent.CompletionException: java.util.NoSuchElementException

at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_252]

at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_252]

at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_252]

at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_252]

at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]

at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_252]

at com.hivemq.client.internal.mqtt.mqtt3.Mqtt3AsyncClientView.lambda$handleSubAck$0(Mqtt3AsyncClientView.java:64) ~[?:?]

at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) [?:1.8.0_252]

at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) [?:1.8.0_252]

at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) [?:1.8.0_252]

at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) [?:1.8.0_252]

at com.hivemq.client.rx.FlowableWithSingle$SingleFutureSubscriber.onComplete(FlowableWithSingle.java:385) [bundleFile:?]

at com.hivemq.client.internal.rx.operators.FlowableWithSingleCombine$SplitSubscriber.onComplete(FlowableWithSingleCombine.java:183) [bundleFile:?]

at io.reactivex.internal.operators.flowable.FlowableObserveOn$BaseObserveOnSubscriber.checkTerminated(FlowableObserveOn.java:199) [bundleFile:?]

at io.reactivex.internal.operators.flowable.FlowableObserveOn$ObserveOnConditionalSubscriber.runAsync(FlowableObserveOn.java:641) [bundleFile:?]

at io.reactivex.internal.operators.flowable.FlowableObserveOn$BaseObserveOnSubscriber.run(FlowableObserveOn.java:176) [bundleFile:?]

at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66) [bundleFile:?]

at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57) [bundleFile:?]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]`
bodiroga commented 4 years ago

Thanks for all the information guys, I will open the Eclipse IDE and try to found the cause of the error :wink:

bodiroga commented 4 years ago

Guys, one question before I leave this for today:

The Communication Error is always about the $name topic or do you also see thing status such as: Status: OFFLINE - COMMUNICATION_ERROR Did not receive all required topics?

From what I have seen, it seems that we have 2 different issues, and the root cause of one of them is probably https://github.com/openhab/openhab-addons/pull/7760/files . I was too optimistic with the 1000 milliseconds timeout :facepalm: But that short timeout will be responsible for the "Did not receive all required topics" error, not the one about the $name topic.

I have seen that type of error in my setup (with a big -423 topics- 'virtual' homie device created with @mjcumming's library), but have you? Or are all your problems related to $name?

Thanks for your help and best regards,

Aitor

Alpha200 commented 4 years ago

Yes, I have only seen error messages related to the $name topic.