raomin / ESPAltherma

Monitor your Daikin Altherma / ROTEX heat pump with ESP32
MIT License

Frequent disconnects from HAOS Mosquitto (client exceeded timeout) #411

MT76 closed this issue 3 months ago

MT76 commented 3 months ago

Running ESPAltherma on an M5StickC, connected to Mosquitto in HAOS. I'm seeing lots of short disconnects. The Mosquitto log says they're due to a timeout:

2024-03-16T16:01:49: New client connected from 192.168.40.130:57699 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').
2024-03-16T16:05:48: Client ESPAltherma-dev has exceeded timeout, disconnecting.
2024-03-16T16:05:49: New client connected from 192.168.40.130:57700 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').
2024-03-16T16:22:18: Client ESPAltherma-dev has exceeded timeout, disconnecting.
2024-03-16T16:22:19: New client connected from 192.168.40.130:57701 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').
2024-03-16T16:22:48: Client ESPAltherma-dev has exceeded timeout, disconnecting.
2024-03-16T16:22:49: New client connected from 192.168.40.130:57702 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').
2024-03-15T16:46:48: Client ESPAltherma-dev has exceeded timeout, disconnecting.
2024-03-15T16:46:49: New client connected from 192.168.40.130:57703 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').
2024-03-15T16:50:48: Client ESPAltherma-dev has exceeded timeout, disconnecting.
2024-03-15T16:50:49: New client connected from 192.168.40.130:57704 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').
2024-03-15T17:02:18: Client ESPAltherma-dev has exceeded timeout, disconnecting.
2024-03-15T17:02:19: New client connected from 192.168.40.130:57705 as ESPAltherma-dev (p2, c1, k15, u'althermamqtt').

The disconnects look random, but look at the seconds in the timestamps: they always fall on :18 or :48, i.e. 30 seconds apart, which coincides with setup.h's FREQUENCY setting of 30000. The connection is usually re-established immediately, but Althermasensors is shown as Unavailable for a second, which leaves gaps in the corresponding graphs.
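To make the pattern explicit, here is a minimal sketch that takes the disconnect timestamps from the log above and reduces their seconds modulo the polling period. It assumes FREQUENCY is expressed in milliseconds; the names `DISCONNECTS` and `phases` are made up for this illustration and are not part of ESPAltherma.

```python
from datetime import datetime

# Disconnect timestamps copied from the Mosquitto log above.
DISCONNECTS = [
    "2024-03-16T16:05:48",
    "2024-03-16T16:22:18",
    "2024-03-16T16:22:48",
    "2024-03-15T16:46:48",
    "2024-03-15T16:50:48",
    "2024-03-15T17:02:18",
]

FREQUENCY_MS = 30000  # value from setup.h; assumed here to be milliseconds


def phases(timestamps, period_s):
    """Return the distinct values of (seconds field modulo period_s)."""
    return {datetime.fromisoformat(ts).second % period_s for ts in timestamps}


if __name__ == "__main__":
    # A single value means every disconnect sits at the same point
    # of the polling cycle.
    print(phases(DISCONNECTS, FREQUENCY_MS // 1000))
```

If the disconnects were unrelated to the polling cycle, the set would be expected to contain many different offsets rather than one.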

Mosquitto's log_type all shows something interesting:

Last three lines of successful PUBLISH:

2024-03-16T00:58:12: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/log', ... (100 bytes))
2024-03-16T00:58:13: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/ATTR', ... (1375 bytes))
2024-03-16T00:58:14: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/log', ... (25 bytes))

Last three lines when a disconnect happens:

2024-03-16T00:58:42: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/log', ... (100 bytes))
2024-03-16T00:58:43: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/ATTR', ... (1379 bytes))
2024-03-16T00:59:07: Client ESPAltherma-dev has exceeded timeout, disconnecting.

See? It's missing the last /log publish that is supposed to come after the /ATTR publish. This is the case with all disconnects: every single one of them happens at exactly the same point in the publish process, right after the publish of 'espaltherma/ATTR'.
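The same check can be done mechanically over a longer log. The sketch below scans Mosquitto log text and reports, for each "exceeded timeout" line, the last topic that client published before it. The regular expressions are written against the log lines quoted above and are an assumption about the format, not an official parser; `last_topic_before_timeout` is a hypothetical helper name.

```python
import re

# The six log lines quoted above (one good cycle, one failing cycle).
LOG = """\
2024-03-16T00:58:12: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/log', ... (100 bytes))
2024-03-16T00:58:13: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/ATTR', ... (1375 bytes))
2024-03-16T00:58:14: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/log', ... (25 bytes))
2024-03-16T00:58:42: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/log', ... (100 bytes))
2024-03-16T00:58:43: Received PUBLISH from ESPAltherma-dev (d0, q0, r0, m0, 'espaltherma/ATTR', ... (1379 bytes))
2024-03-16T00:59:07: Client ESPAltherma-dev has exceeded timeout, disconnecting.
"""

# Patterns derived from the lines above (assumed, not an official grammar).
PUBLISH_RE = re.compile(r"Received PUBLISH from (\S+) \(.*?'([^']+)'")
TIMEOUT_RE = re.compile(r"Client (\S+) has exceeded timeout")


def last_topic_before_timeout(log_text):
    """Return (client, last published topic) for every timeout in the log."""
    last_topic = {}
    found = []
    for line in log_text.splitlines():
        publish = PUBLISH_RE.search(line)
        if publish:
            last_topic[publish.group(1)] = publish.group(2)
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            client = timeout.group(1)
            found.append((client, last_topic.get(client)))
    return found


if __name__ == "__main__":
    for client, topic in last_topic_before_timeout(LOG):
        print(client, "went silent after publishing to", topic)
```

Fed a full log, a list in which every entry names 'espaltherma/ATTR' would support the claim that the client stalls at the same step each time.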

Could this be caused by an OTA flash? I ask because on the first day I got it running it did not do this. Then I discovered the OTA option and reflashed it over OTA. I'm not sure, but I think that's when I started to see this behavior. I already tried a non-OTA flash, but that did not help.

raomin commented 3 months ago

You can try setting MQTT_SOCKET_TIMEOUT to a short time, e.g. 15 seconds. Add this to setup.h:

#define MQTT_SOCKET_TIMEOUT 15

raomin commented 3 months ago

Actually, it should already be 15 seconds by default in the library's header file, so it's odd that it times out at 30 seconds. Is your Wi-Fi RSSI OK (between -60 and -30 dBm)?
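A rough check of the timing, as a sketch: the k15 field in the "New client connected" lines is the keep-alive the client requested (15 s), and MQTT 3.1.1 requires the broker to drop a client that has sent nothing for one and a half times that interval. In the log_type all excerpt, the gap between the last PUBLISH (00:58:43) and the timeout (00:59:07) is 24 s, which is past that 22.5 s limit. So the 30-second rhythm probably comes from the publish cycle, not from a 30-second timeout. The helper names below are illustrative only.

```python
from datetime import datetime

KEEPALIVE_S = 15    # the "k15" field in the broker's connect log lines
GRACE_FACTOR = 1.5  # MQTT 3.1.1: disconnect after 1.5x keep-alive of silence

# Timestamps copied from the log_type all excerpt in the first comment.
LAST_PUBLISH = datetime.fromisoformat("2024-03-16T00:58:43")
DISCONNECT = datetime.fromisoformat("2024-03-16T00:59:07")


def silent_seconds(last_packet, dropped):
    """Seconds between the last packet seen and the broker's disconnect."""
    return (dropped - last_packet).total_seconds()


def exceeds_keepalive(silence_s, keepalive_s=KEEPALIVE_S):
    """True if the silence is longer than the broker has to tolerate."""
    return silence_s > keepalive_s * GRACE_FACTOR


if __name__ == "__main__":
    gap = silent_seconds(LAST_PUBLISH, DISCONNECT)
    print(f"silent for {gap:.0f} s, limit {KEEPALIVE_S * GRACE_FACTOR} s,"
          f" exceeded: {exceeds_keepalive(gap)}")
```

This only shows that the client sent nothing at all (not even a keep-alive ping) for more than 22.5 s after the ATTR publish; it does not say why.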

MT76 commented 3 months ago

Yes, it fluctuates a bit, but it's usually around -35 dBm. Flaky Wi-Fi was the first suspect, but the issue persisted even after yanking my two APs off the ceiling, running a cable from my router into the boiler room, and setting up a single AP there. Then I saw it disconnect at the same moment each time, so I pretty much ruled out Wi-Fi issues. For what it's worth, I also migrated my HAOS to a new system to rule out hardware issues on the broker side. I even set up an MQTT LXC on a completely different server to rule out problems with my HAOS install. I got exceeded timeouts in all of these setups.

MT76 commented 3 months ago

Still think it started after the first OTA update. Maybe reflashing factory firmware back on the M5Stick and then ESPAltherma again will fix this. But, priorities. Sleep > Learn Arduino > flash. =) I'll report back here.

MT76 commented 3 months ago

Flashed the factory firmware with the No OTA partition scheme and then ESPAltherma again. Unfortunately that didn't fix it; there are still lots of disconnects.

MT76 commented 3 months ago

Well, I think I have found the problem, and if so, it's a severe case of PEBKAC...

I was looking for a way to calculate COP and came across https://github.com/raomin/ESPAltherma#calculating-cop. But I'm using def/Altherma(EPRA D ETV16-ETB16-ETVZ16 D series 14-16kW).h for my heat pump, and it didn't have a definition for Voltage (N-phase). So, as an experiment, I added a definition for Voltage from another definition file, and I did get readings. They didn't make sense, so I decided not to use them in HA, but I forgot to take the entry out of my definition file. Today I was going through the file again, saw the voltage register still there, and deleted it. I uploaded the build OTA to my M5Stick and, lo and behold, it has now been running for 3 hours without a single disconnect.

So sorry for wasting your time, raomin. I'll just crawl back under my rock and feel stupid. =)

raomin commented 3 months ago

No problem @MT76, glad you fixed it. Still wondering why it caused a disconnect... maybe a reboot?