quazzie / tellstick-plugin-mqtt-hass

Plugin for tellstick, connect to homeassistant via mqtt with autodiscovery.

Does not re-connect when MQTT broker restarts #30

Open crashmatt opened 2 years ago

crashmatt commented 2 years ago

The broker connection stays broken after the broker restarts.

I can force a reconnect by opening the plugin configuration, touching a field and then saving.
Touching a field may not be required, and a reboot may also trigger a reconnect. Neither has been tested.

crashmatt commented 2 years ago

I suspect a Lua script could act as a watchdog for this service.

The Client plugin class has methods that look like they should be exposed to Lua, but I cannot find one that plainly exposes the connection state. Detecting whether it is connected may be difficult.

There is no mechanism I can find in client.py that watches over the connection state.
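
A minimal sketch of what exposing the connection state could look like, assuming the plugin uses paho-mqtt 1.x, whose `on_connect(client, userdata, flags, rc)` and `on_disconnect(client, userdata, rc)` callbacks report connects and disconnects. The `ConnectionState` class and its `is_connected` method are hypothetical names for illustration; nothing here is taken from client.py.

```python
import threading


class ConnectionState(object):
    """Hypothetical helper: remembers whether the MQTT client is connected."""

    def __init__(self):
        # paho runs callbacks on its network thread, so guard the flag.
        self._lock = threading.Lock()
        self._connected = False

    def on_connect(self, client, userdata, flags, rc):
        # paho 1.x passes rc == 0 when the broker accepted the connection.
        with self._lock:
            self._connected = (rc == 0)

    def on_disconnect(self, client, userdata, rc):
        # Called for clean (rc == 0) and unexpected disconnects alike.
        with self._lock:
            self._connected = False

    def is_connected(self):
        with self._lock:
            return self._connected
```

With a real paho 1.x client the callbacks would be attached as `client.on_connect = state.on_connect` and `client.on_disconnect = state.on_disconnect`; a watchdog could then poll `state.is_connected()`.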

I would make adjustments to the plugin myself, but I have had no success setting up a working build environment for these plugins.

quazzie commented 2 years ago

Uhm, Lua? You mean we should add a separate Lua plugin to watch this? There should be no need for a watchdog on the MQTT connection; according to the documentation, the paho client should reconnect by itself. But I have just noticed this problem myself: if the connection is lost, it does not reconnect. I'll have a look at what I can do when I get some free time.

crashmatt commented 2 years ago

Yes. Run a small lua script on a timer to check for connection and restart if required.

This lua script interacts with your plugin to send the mqtt debug message.

The only flaw in this plan is that Client does not have the correct methods.

    -- hass mqtt plugin has to be installed
    local mqtt = nil

    function onInit()
        mqtt = require 'HASSMQTT.Client'
        if mqtt == nil then
            print "No mqtt client"
        else
            print "mqtt client ok"
        end
    end

    function onDeviceStateChanged(device, state, stateValue)
        dev_name = device:name()
        if dev_name == "ClockTick" then
            print "sent debug message"
            mqtt:_debug("bahhh")
        end
    end

crashmatt commented 2 years ago

but fixing the behavior in paho would be better

crashmatt commented 2 years ago

A lua script fix while paho is broken.

  1. A fake Nexa switch device is added as "HassTellstickMQTTWatchdog"
  2. A Hass automation sets this device through MQTT once every 10 seconds
  3. The lua script checks if the watchdog event has happened with 30s minimum interval
  4. If the watchdog signal is not received, the Lua script sets the "hostname" of the MQTT client. This triggers a disconnect followed by a connect. I did not find a better way to force a reconnect.

-- hass mqtt plugin has to be installed
local mqtt = require 'HASSMQTT.Client'
local deviceManager = require "telldus.DeviceManager"
local running_timer = false
local watchdog_count = 0
local watchdog_timeout_seconds = 30 -- Delay in seconds

function init()
    if mqtt == nil then
        print "No mqtt client"
    else
        print "mqtt client ok"
    end
end

function onInit()
    init()
end

function onDeviceStateChanged(device, state, stateValue)
    if mqtt == nil then
        return
    end

    dev_name = device:name()
    -- Must match the TellStick device name exactly; note this string is spelled "Tellsick"
    if dev_name == "HassTellsickMQTTWatchdog" then
        if device:state() == 1 then
            watchdog_count = watchdog_count + 1
            print "HassTellstickMQTTWatchdog signal received"
        end
    end

    if not running_timer then
        running_timer = true
        watchdog_count = 0
        sleep(watchdog_timeout_seconds*1000)

        if watchdog_count == 0 then
            print("HassTellstickMQTTWatchdog timeout")
            mqtt:configWasUpdated('hostname', '<HASS_ADDRESS>')
        else
            print(string.format("HassTellstickMQTTWatchdog count %d", watchdog_count))
        end
        running_timer = false
    end
end

henripalmroth commented 2 years ago

Looks interesting. Could you share also the HA automation part?

pierrebengtsson commented 2 years ago

I'm experiencing the same issue. We have a lot of power outages in the winter, and when my znet starts up before my HA instance, the MQTT connection fails and won't reconnect until I power cycle my znet. @crashmatt could you share your HA automation? If I create an automation that sets the fictional device to "on" every 10 seconds, the only message I get from the Lua script is "HassTellstickMQTTWatchdog timeout".

crashmatt commented 2 years ago

I have modified the lua a bit since I last posted. It also relies on a "ClockTick" device set once a minute. We can probably find a better solution to that.

The HA script is simple. It just sets a switch every 10 seconds. You need to create this "virtual" watchdog switch for the TellStick by creating a Nexa switch and then not assigning it to a real switch.

    alias: TellstickPingRepeat
    description: ''
    trigger:
      - platform: time_pattern
        seconds: /10
    condition: []
    action:
      - type: turn_on
        device_id: 702668760f977dea1be89b83a55adfb0
        entity_id: switch.hasstellsickmqttwatchdog
        domain: switch
    mode: single
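
For reference, Home Assistant's `time_pattern` trigger treats a value prefixed with `/` as "whenever the value is divisible by this number", so `seconds: /10` fires six times a minute. The small sketch below only illustrates that rule; `matching_seconds` is a made-up helper, not Home Assistant code.

```python
def matching_seconds(pattern):
    """Return the seconds (0-59) matched by a pattern such as "/10" or "30"."""
    if pattern.startswith("/"):
        step = int(pattern[1:])
        # "/N" matches every second value that is divisible by N.
        return [s for s in range(60) if s % step == 0]
    # A plain number matches only that exact second.
    return [int(pattern)]


print(matching_seconds("/10"))  # [0, 10, 20, 30, 40, 50]
```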

crashmatt commented 2 years ago

Important update: the watchdog from hass is sent to the TellStick, and what I didn't know before is that the TellStick also transmits the virtual switch code.

I am using a Nexa switch as the virtual device. Since the watchdog transmits every 10 seconds, a physical Nexa switch that is being retrained is likely to learn the watchdog code as well. For a few months this has been driving me crazy, with light switches always switching themselves back on.

The solution is to always have the watchdog send an off state. That way a switch that is in training always unlearns the watchdog. Another solution might be to pick a device on a different protocol. I have not tested this yet.

More patchwork required...

fredrike commented 1 year ago

I've had this issue for quite some time, don't know a good solution though.

https://github.com/quazzie/tellstick-plugin-mqtt-hass/issues/9

It would be great if @crashmatt could share the lua and ha config (with formatting) for a watchdog.

crashmatt commented 1 year ago

Fredrik, there are a few different parts to this.

  1. A virtual switch item on tellstick to receive the heartbeats
  2. A heartbeat automation from hass to tellstick so the tellstick knows it is healthy
  3. A watchdog monitor on tellstick to reboot if the heartbeat is not ok

Do this in order. Check that the heartbeats are being received at the TellStick before adding the watchdog monitor. Otherwise your TellStick will be continuously rebooting and you are in for a frustrating day.
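
The heartbeat and watchdog roles described above can be sketched roughly as follows. This is an illustration in Python 3 rather than the Lua that runs on the TellStick; `HeartbeatWatchdog`, `beat`, `check` and the injected `clock` are hypothetical names, and `on_timeout` stands in for whatever recovery action is chosen (reconnect or reboot).

```python
import time


class HeartbeatWatchdog(object):
    """Hypothetical monitor: fires on_timeout when heartbeats stop arriving."""

    def __init__(self, timeout_seconds, on_timeout, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._on_timeout = on_timeout
        self._clock = clock
        self._last_beat = clock()

    def beat(self):
        # Call this whenever the heartbeat (virtual switch event) arrives.
        self._last_beat = self._clock()

    def check(self):
        """Call periodically. Returns True while heartbeats are recent."""
        now = self._clock()
        if now - self._last_beat < self._timeout:
            return True
        self._on_timeout()
        # Restart the window so the recovery action gets time to take
        # effect before the watchdog can fire again.
        self._last_beat = now
        return False
```

Restarting the window after a timeout is a deliberate choice here: it limits the recovery action to once per timeout period, which is one way to avoid the continuous-reboot loop mentioned above.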

Step 1 [image: image.png]

Let me know if you need more guidance and I will attempt to document it better

tiehfood commented 1 year ago

Might have a look at my comments on the mentioned ticket. Maybe that's a thing?

fredrike commented 1 year ago

Here are my current configurations.

  1. Created a switch in Telldus Live (called MQTT-watchdog)
  2. Changed id for the new switch to switch.tellstick_mqtt_watchdog in HA
  3. Built the following automation in HA:
      alias: HassTellstickMQTTWatchdog
      description: ""
      trigger:
        - platform: time_pattern
          seconds: /10
      condition: []
      action:
        - service: switch.turn_on
          data: {}
          target:
            entity_id: switch.tellstick_mqtt_watchdog
      mode: single
  4. Built the following Lua script on my Telldus TellStick (accessed through the local IP):

    -- hass mqtt plugin has to be installed
    local mqtt = require 'HASSMQTT.Client'
    local deviceManager = require "telldus.DeviceManager"
    local running_timer = false
    local watchdog_count = 0
    local watchdog_timeout_seconds = 30 -- Delay in seconds
    
    function init()
       if mqtt == nil then
          print "No mqtt client"
       else
          print "mqtt client ok"
       end
    end
    
    function onInit()
       init()
    end
    
    function onDeviceStateChanged(device, state, stateValue)
       if mqtt == nil then
          return
       end
    
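       dev_name = device:name()
       -- Must match the name of the watchdog switch on the TellStick exactly (step 1 calls it MQTT-watchdog)
       if dev_name == "HassTellsickMQTTWatchdog" then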
          if device:state() == 1 then
             watchdog_count = watchdog_count + 1
             print "HassTellstickMQTTWatchdog signal received"
          end
       end
    
       if not running_timer then
          running_timer = true
          watchdog_count = 0
          sleep(watchdog_timeout_seconds*1000)
    
          if watchdog_count == 0 then
             print("HassTellstickMQTTWatchdog timeout")
             mqtt:connect()
          else
             print(string.format("HassTellstickMQTTWatchdog count %d", watchdog_count))
          end
          running_timer = false
       end
    end

I've not had any issues with MQTT since I started running this, but I can't say that it is just because of this script (I might be lucky too).

tiehfood commented 1 year ago

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 3591, in _thread_main
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 1779, in loop_forever
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 1044, in reconnect
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 3685, in _create_socket_connection
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
timeout: timed out

This is the error that paho throws when the MQTT server is restarted or shut down.

same problem here: eclipse/paho.mqtt.python#636
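
As far as the traceback shows, the timeout escapes `loop_forever` through `reconnect`, which ends the network thread, so nothing retries afterwards. A generic sketch of the guard that would keep retrying is below. It is neither paho's code nor a patch to it; `reconnect_with_backoff` and its parameters are made-up names, and `reconnect` and `sleep` are passed in so the logic can be tried without a broker.

```python
import socket


def reconnect_with_backoff(reconnect, sleep, min_delay=1, max_delay=120,
                           max_attempts=None):
    """Call reconnect() until it succeeds; return the number of attempts.

    Re-raises the last error once max_attempts (if given) is reached.
    """
    delay = min_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            reconnect()
            return attempt
        except (socket.timeout, OSError):
            # Broker unreachable: swallow the error instead of letting it
            # end the calling thread, then wait and try again.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

In real use `sleep` would be `time.sleep` and `reconnect` would be the client's reconnect call; the doubling delay keeps a long broker outage from turning into a tight retry loop.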

crashmatt commented 1 year ago

It will be interesting to see how stable your system is. I don't know how finely this setup is.

A bad MQTT connection can be simulated by stopping the watchdog transmission. I did this a few times and checked that the system came back together. It sometimes takes a while to heal and become stable again.

/Matt

tiehfood commented 1 year ago

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip

@crashmatt, @fredrike you may want to try this version. Reconnecting seems to work better with paho < 1.6.0, so this is just the current version 0.90.4 repacked with paho 1.5.1 from version 0.90.0. For me it is far more stable on reconnects, and no exception has been thrown so far.

P.S. The files are signed and unmodified from this repo; otherwise it would not be possible to load them in Telldus. So you can trust the content of the ZIP 😉

crashmatt commented 1 year ago

I have no idea how to use those files. I presume they modify the tellstick plugin?

tiehfood commented 1 year ago

Just install the zip file as a plugin (don't extract it), the same way you install the official plugin from the releases page. And yes, it just replaces the paho version (from 1.6.1 down to 1.5.1).

sampod commented 1 year ago

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip

This seems to be working well. I installed this and tried a couple of times restarting my mqtt server and power cycling my network switch and the mqtt connection was restored correctly.

hauard commented 9 months ago

I forgot to check which version I had first, but I tried the zip file and my problem still persists. Lately the addon has disconnected just seconds after connecting, making the znet dumb as f**k

Going to try the lua script now, fingers crossed X

hauard commented 9 months ago

Looks like the client disconnects just seconds after connecting either way. I made myself a virtual switch that a Lua script listens to and then connects the MQTT client, which makes it easier to investigate. Earlier I had to log in and bump the addon by removing and re-adding a number in the addon config.

I have several other clients that connect to the broker without issues. I tried increasing and decreasing the keepalive ping on the broker, but no luck.

Is there any way to enable logging on the znet, to see what's going on there?

grEvenX commented 9 months ago

My problems with disconnects only happen after a while. After years of issues where I had to restart the znet manually from time to time, I've now connected it to a power switch that I automatically power cycle every night. Now my setup is finally stable 🙈