rvdbreemen / OTGW-firmware

An ESP8266 devkit firmware for the Nodoshop version of the Opentherm Gateway (OTGW)
MIT License

Implement `availability` topic #125

Closed. marklagendijk closed this issue 2 years ago

marklagendijk commented 2 years ago

Currently HomeAssistant doesn't see any of the OTGW sensors etc. after a restart of HomeAssistant. This is because HomeAssistant checks the MQTT broker when starting up, and any devices that are no longer present are marked as 'Unavailable'.

The solution for this is the MQTT retain flag. When an MQTT message is sent with retain, the broker will remember the message and send it to any new subscribers.

Note: I think the retain flag should be used both for the HomeAssistant discovery messages and the device state messages.
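To make the retain behaviour described above concrete, here is a minimal sketch of a broker's retained-message store. It is a toy in-memory model, not a real MQTT broker and not code from this firmware; the class name `RetainedStore` and the topic names used with it are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a broker's retained-message store (illustrative only).
class RetainedStore {
public:
    // Non-retained messages reach current subscribers only, so nothing is stored.
    // A retained message replaces the stored one for that topic; a retained
    // message with an empty payload clears it.
    void publish(const std::string& topic, const std::string& payload, bool retain) {
        if (!retain) return;
        if (payload.empty()) {
            retained_.erase(topic);
        } else {
            retained_[topic] = payload;
        }
    }

    // What a client receives immediately when it subscribes to `topic`.
    // Returns false if the broker has no retained message for that topic.
    bool subscribe(const std::string& topic, std::string& out) const {
        auto it = retained_.find(topic);
        if (it == retained_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> retained_;
};
```

In this model, a subscriber that connects after a non-retained publish gets nothing, while one that connects after a retained publish immediately gets the last stored payload. That is the situation HomeAssistant is in after a restart.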

rvdbreemen commented 2 years ago

@marklagendijk the retain flag is used, but that's not enough. The firmware actually detects the reboot of HA, but sometimes that does not work. Could you go into the settings of the firmware and change the setting for HA detection?

If that works for you, please let me know.
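One common way to detect a HomeAssistant restart is to listen for its MQTT birth message. The sketch below assumes HomeAssistant's default birth topic `homeassistant/status` with payload `online`; the handler and the `DiscoveryState` struct are hypothetical and are not taken from this firmware, which may do this differently.

```cpp
#include <cassert>
#include <string>

// Assumed HomeAssistant default birth/last-will topic.
static const char* const kHaStatusTopic = "homeassistant/status";

// Hypothetical state: set when the discovery configs should be sent again.
struct DiscoveryState {
    bool resendPending = false;
};

// Hypothetical MQTT message handler: when HomeAssistant announces that it is
// online again, flag the discovery messages for re-sending.
void onMqttMessage(DiscoveryState& state,
                   const std::string& topic,
                   const std::string& payload) {
    if (topic != kHaStatusTopic) return;   // not a HomeAssistant status message
    if (payload == "online") {
        state.resendPending = true;
    }
}
```

The main loop would then check `resendPending`, publish the discovery messages again, and clear the flag.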

marklagendijk commented 2 years ago

@rvdbreemen I did have that feature turned on.

Why isn't using the retain flag enough? Do you know how other integrations solve this issue? I would expect that having the retain flag on all messages (both the autodiscovery ones, and the actual state ones) should be enough.

When I have time I will test which messages are currently retained.

rvdbreemen commented 2 years ago

@marklagendijk I don't know why the retain flag on the integration is not enough, but it is what we have found so far. If you have a clue why, then please tell me.

On the actual state ones, it's deliberate that there is no state retain, because all values are transient values. Normally the state values recover within seconds, or at most a couple of minutes.

If you have a better solution that works, then please share it or do a PR, and I will look at the suggested changes.

marklagendijk commented 2 years ago

@rvdbreemen I researched the topic a bit. I still don't know why just retaining the auto discovery messages does not work. However, I think I found a way to solve it, and improve something else at the same time.

I'm afraid that it won't be feasible for me to create a PR for this. The changes themselves should be 'simple enough', but I don't feel up for it, because of unfamiliarity with this project / kind of project / programming language (C?).

rvdbreemen commented 2 years ago

Hi @marklagendijk

Thanks for checking this out and suggesting a solution. The OTGW firmware actually does already do this, including the LWT.

Just go and look at line 244 of MQTTstuff.ino. That's where the last will topic is set.

You can find the online status on line 265.

There is also code to detect if HA goes offline and comes back online.
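For readers unfamiliar with the last will (LWT) mechanism mentioned above, here is a minimal sketch of how a broker handles it. This is a toy model, not the firmware's code and not a real broker; the class `WillBroker`, the client id, and the topic `otgw/status` are invented for illustration, and the will is assumed to be published as a retained message.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model of a broker's last-will handling (illustrative only).
class WillBroker {
public:
    // A client registers its will (topic + payload) when it connects.
    void connect(const std::string& clientId,
                 const std::string& willTopic,
                 const std::string& willPayload) {
        wills_[clientId] = std::make_pair(willTopic, willPayload);
    }

    // The client itself publishes a retained message, e.g. "online".
    void publishRetained(const std::string& topic, const std::string& payload) {
        retained_[topic] = payload;
    }

    // Clean disconnect: the will is discarded and never published.
    void disconnectClean(const std::string& clientId) {
        wills_.erase(clientId);
    }

    // Connection lost: the broker publishes the will on the client's behalf
    // (assumed retained in this sketch).
    void connectionLost(const std::string& clientId) {
        auto it = wills_.find(clientId);
        if (it == wills_.end()) return;
        retained_[it->second.first] = it->second.second;
        wills_.erase(it);
    }

    // Retained payload for a topic, or an empty string if there is none.
    std::string retained(const std::string& topic) const {
        auto it = retained_.find(topic);
        return it == retained_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::pair<std::string, std::string> > wills_;
    std::map<std::string, std::string> retained_;
};
```

The device publishes "online" after connecting; if it drops off the network, the broker replaces that with the "offline" will, so subscribers always see the current availability.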

So I am not sure why you see unavailable topics. Are you sure it's the availability topic that is causing the issue in the first place?

Have you checked your MQTT broker to see what's going on?

Hope to hear from you, Robert

marklagendijk commented 2 years ago

Hey Robert, Thanks for your response. Based on the code I am indeed surprised that it does not work yet. My guess would be that there must be a bug somewhere.

When I get the time I will try to debug the issue using an MQTT client, by verifying all MQTT messages (both content and retention) during several scenarios. I can't promise yet when this will happen, hopefully within a week or two.

Mark

rvdbreemen commented 2 years ago

Hey Mark, Thanks for your help. Please just take your time. You could also join my Discord community for real-time chat, but if you prefer this slow chat, that's just fine with me.

Hopefully you can find what's going on, and if so, I will happily fix it. Robert

rvdbreemen commented 2 years ago

I fixed a bug in 0.9.5 that caused status to be a Boolean instead of Online/Offline.

So maybe you can verify if that fixed your reported issue.

marklagendijk commented 2 years ago

Given that the code was already doing the right things, it makes sense that it was something little like this. I'll upgrade to the latest firmware and report if the issue ever comes back, but I don't think it will.

rvdbreemen commented 2 years ago

@marklagendijk yes, I agree. What happened was that there was some logic that would detect the "device" going offline, and it would work with "true" and "false" instead of "online" and "offline".

I fixed that now, so now all LWT logic will use the right way of signalling the online status of the device.
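The kind of mismatch described here can be sketched as follows. This is not the firmware's code: both functions are hypothetical, and the consumer side assumes HomeAssistant's default availability payloads `online` and `offline`.

```cpp
#include <cassert>
#include <string>

enum class Availability { Available, Unavailable, Unrecognized };

// Device side (hypothetical helper): map the internal boolean to the
// payload strings the consumer expects, instead of "true"/"false".
std::string availabilityPayload(bool online) {
    return online ? "online" : "offline";
}

// Consumer side (hypothetical model): only the two expected payloads are
// understood; anything else, such as "true", is not recognized.
Availability interpret(const std::string& payload) {
    if (payload == "online") return Availability::Available;
    if (payload == "offline") return Availability::Unavailable;
    return Availability::Unrecognized;
}
```

With this model, `interpret("true")` yields `Unrecognized`, while `interpret(availabilityPayload(true))` yields `Available`.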

marklagendijk commented 2 years ago

Yeah, these bugs are the hardest to find, because the code looks sensible, but there is a protocol-specific detail that is wrong.