norberts1 / hometop_HT3

Pimp your heater.
GNU General Public License v3.0

Reconnect to MQTT broker if offline #3

Closed FredericMa closed 6 years ago

FredericMa commented 6 years ago

Hi @norberts1 ,

First of all, thanks for the big update and all the improvements!

I have a suggestion: at the moment it looks like everything stops working once the MQTT broker goes offline. In my case, the broker is running on another server. I noticed that, when the broker goes offline, the application stops working, generates an error and doesn't reconnect to the broker.

I also noticed what I think is a timing issue when connecting to the remote broker. When I restarted the Raspberry Pi, the software wasn't able to connect to the broker. Maybe because the network is not yet fully initialized? I've added a 10-second sleep (probably way too high) to the mqtt_init function in the mqtt_client_if.py file and this solved the issue.
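
As an alternative to a fixed sleep, a startup wait could poll the broker's TCP port until it answers or a deadline passes. This is only a minimal sketch, not the project's code: the helper name `wait_for_broker` and its parameters are assumptions made for illustration.

```python
import socket
import time


def wait_for_broker(host, port=1883, timeout=60.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect only shows that the port is open,
            # not that an MQTT broker is answering behind it.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # network not up yet, host unreachable or port closed
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Unlike a fixed sleep, this returns as soon as the port is reachable, so a fast network start costs no extra waiting.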

Are there any plans to implement a reconnect to the broker in case of disconnection? I think it would also be good if the software submits all topics again to the broker after successfully reconnecting.

Greetings, Frederic

norberts1 commented 6 years ago

Hi Frederic, I checked this behaviour in my environment (mosquitto broker on another machine) and found some of those problems.

  1. I'll change the waiting time for connecting to the broker after startup to 60 seconds.
  2. The mqtt-client collects the topics that could not be sent until the mqtt-broker is online again. The client then sends those collected topics to the mqtt-broker, but only those. So you are right, it would be good to send all topics to the mqtt-broker after reconnecting to it. But this is not the best handling, because the mqtt-broker should be running 24/7. The Collgate startup script waits for the network, so this is not the reason for those problems. But I'm thinking about how to make it more stable.
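
The collecting behaviour described in item 2 can be pictured roughly as follows. This is a hedged sketch and not the code in `mqtt_client_if.py`: the class `PendingTopics` and the `publish(topic, payload)` callable are hypothetical, and it assumes that only the latest value per topic is kept while the broker is offline.

```python
class PendingTopics:
    """Keeps the most recent unsent payload per topic."""

    def __init__(self, publish):
        # `publish(topic, payload)` is a hypothetical callable that
        # returns True when the message was handed to the broker.
        self._publish = publish
        self._pending = {}

    def send(self, topic, payload):
        """Try to publish; buffer the payload if the broker is unreachable."""
        if self._publish(topic, payload):
            self._pending.pop(topic, None)  # a fresh value supersedes a stale one
            return True
        self._pending[topic] = payload  # overwrite: only the last value is kept
        return False

    def flush(self):
        """Retry every buffered topic; return the number delivered."""
        delivered = 0
        for topic, payload in list(self._pending.items()):
            if self._publish(topic, payload):
                del self._pending[topic]
                delivered += 1
        return delivered
```

Calling `flush()` after a reconnect sends only the topics that failed while the broker was away, which matches the "but only those" limitation mentioned above.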

I couldn't reproduce the crash or the stopped services you had in my environment. What I did was stop the mosquitto broker; after that the errors are logged to the mqtt-client logfile (topic not sendable). The mqtt-IF client and the Collgate daemon are still running. After the mqtt-broker restarts, the collected topics are sent to the mqtt-broker and the system continues its work. Some more details (e.g. log information) are required to find those problems in your environment.

Greetings, Norbert

FredericMa commented 6 years ago

Hi Norbert,

  1. Sounds like a good idea. Does this also mean that topics updated during the wait time will be sent to the broker once connected?

  2. I agree with you that an mqtt-broker should be online 24/7, but I run it on a server where I also have other stuff installed, so now and then it needs a reboot for updates or changes. Since it is my home environment, I also don't really bother about 24/7 availability :-D What does "waiting for the network" actually mean? I would think it means waiting until it has received an IP from the DHCP server so it can set up connections, but this doesn't seem to be the case.

  3. The crash only seems to occur during startup in case the broker is not available. If the broker goes offline during normal operation, the reconnect works fine. I see this in the mqtt-client logfile at startup:

    18.08.2017 10:14:52 CRITICAL: cmqtt_baseclass.mqtt_init() broker not available:192.168.50.101; port:1883
    18.08.2017 10:14:52 CRITICAL: mqtt_client_if() terminated!

    and this in the collgate logfile:

    18.08.2017 10:26:56 CRITICAL: ccollgate().run();Error;mqtt_client_if-thread terminated, see mqtt-logfile for details.
    18.08.2017 10:26:56 CRITICAL: ccollgate().run();Error; terminated
    18.08.2017 10:26:56 CRITICAL: cstore2db.run();Error; thread terminated unexpected

    At this point there is no reconnect initiated anymore.

If the broker goes offline during normal operation and comes back online, I do indeed see that the program continues as expected:

    18.08.2017 10:34:41 WARNING: cmqtt_client.publish_data() error occured; topic:heating/dhw_Tmeasured; mid:70
    18.08.2017 10:34:41 WARNING: cmqtt_client.publish_data() error occured; topic:heating/dhw_Tcylinder; mid:71
    18.08.2017 10:34:50 INFO: My CONNACK received with code:0.

I hope this is some useful information!

Thanks!

Greetings, Frederic

norberts1 commented 6 years ago

Hi Frederic,

  1. -> Yes, the mqtt-client will collect the topics that can't be sent yet until it reconnects to the broker. But I think the topics aren't queued and only the last value is sent.
  2. -> Waiting for the network is implemented in the ht_collgate startup script. The init daemon handles this at startup time. But I think this is 'only' a wait for a running network daemon and not for a specific service.
  3. -> I have changed the wait timeout to 120 seconds in the module 'lib/mqtt_client_if.py'. Please update this module in your environment; hopefully it fits your requirements.

Greetings, Norbert

FredericMa commented 6 years ago

Hi Norbert,

  1. In my opinion the last known value is enough and it isn't necessary to send all previous values.
  2. This update works perfectly, thanks! One note: if the mqtt-client can't connect within 2 minutes, it still stops operation and no reconnect is initiated anymore, correct? Wouldn't it be nicer if it kept reconnecting endlessly? For example, if I have a general outage at home and power comes back up, my Raspberry Pi will be up and running after about half a minute, while my server may need more than 3 minutes to get started. In this case I will also need to reboot the Raspberry Pi, or at least restart the hometop software, once my server is up and running. Anyway, this is a case that will rarely occur, so the fix you arranged for me is very good!
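
The endless reconnect suggested in item 2 could look roughly like the loop below. This is a generic sketch under stated assumptions, not hometop_HT3 code: `connect_with_retry` is a hypothetical helper, and `connect` stands for any callable that raises `OSError` while the broker is unreachable, as a plain socket connect would.

```python
import time


def connect_with_retry(connect, first_delay=1.0, max_delay=60.0,
                       max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds; return the number of attempts made."""
    delay = first_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            connect()
            return attempt
        except OSError:
            # max_attempts=None means: keep trying forever.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, but cap the wait
```

With `max_attempts=None` the client would survive a server that boots several minutes later than the Raspberry Pi; setting a limit restores the current "give up and terminate" behaviour.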

Thanks!

Greetings, Frederic

norberts1 commented 6 years ago

Hi Frederic,

  1. -> The values are repeated by the heater system and all values are sent at mqtt-client startup time, but later on only the differences from the last value are sent. Some values are rarely updated, like the solar-pump status during wintertime, so this can become a problem. In the reconnecting state it would be possible to do something like resending that data. But I think this is not urgent.
  2. -> Yes, for some environments a 2-minute wait timeout is not enough, but it is easy to raise this value. Endless waiting is not ideal, because then it looks like 'everything is operational' when it isn't. My aim was that the connection to the server and the connection to the service (broker) must be handled differently. If there are any faults (e.g. wrong server name or IP address), the mqtt-broker is never reachable. The paho-mqtt library handles this situation like: I will connect to broker, I will connect to broker... So first I check the server availability for x seconds, and after that I call the paho library to connect to the broker. Any errors are reported, and critical ones raise an exception that terminates the program.

Greetings, Norbert
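
The "send only differences, resend on reconnect" idea from item 1 might be sketched like this. It is an illustration under assumptions, not the project's implementation: `DeltaPublisher` and the `publish(topic, payload)` callable are hypothetical names.

```python
_MISSING = object()  # sentinel: topic has never been seen


class DeltaPublisher:
    """Publishes a topic only when its value changed; can replay everything."""

    def __init__(self, publish):
        # `publish(topic, payload)` is a hypothetical callable wrapping the real client.
        self._publish = publish
        self._last = {}

    def update(self, topic, payload):
        """Publish if the value differs from the last one seen; return True if sent."""
        if self._last.get(topic, _MISSING) == payload:
            return False  # unchanged, nothing to send
        self._last[topic] = payload
        self._publish(topic, payload)
        return True

    def resend_all(self):
        """Replay the last known value of every topic, e.g. after a reconnect."""
        for topic, payload in self._last.items():
            self._publish(topic, payload)
        return len(self._last)
```

Calling `resend_all()` from the client's connect callback (paho-mqtt's `on_connect`) would also cover rarely changing values such as the solar-pump status.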