rvdbreemen / OTGW-firmware

A ESP8266 devkit firmware for the Nodoshop version of the Opentherm Gateway (OTGW)
MIT License
145 stars 34 forks source link

MQTT broker not connected #149

Closed marcoCasamento closed 1 year ago

marcoCasamento commented 1 year ago

Hi and thanks for the great FW. I use the firmware on a Nodosvop v2 with a PIC16F88 on PIC FW 5.0.1 and MCU FW 0.9.5+4eb7d24 (30-05-2022). It's configured on a MQTT server that I run in a container and seems to works just fine, but I've noticed that HomeAssistant about every 10/15 minutes say that all entities provided by the MQTT integrations are unavailable. If I connect through telnet on the MCU I see:

15:05:07.138281 (  12656| 11544) processOT   (1693): Answer Thermostat AC0000300 (9)[MsgID=  0][READ_ACK        ]>Status = Slave  [--------]
15:05:07.174146 (  11312| 10248) processOT   (1693): Thermostat        T00000300 (9)[MsgID=  0][READ_DATA       ]-Status = Master [CD---W--] <ignored>
15:05:17.409951 (  10016|  8832) processOT   (1693): Request Boiler    R80000100 (9)[MsgID=  0][READ_DATA       ]>Status = Master [C----W--]
15:05:17.420603 (  10264|  8832) handleOTGW  (1760): Serial Overrun
15:05:17.422506 (   8920|  8184) processOT   (1722): Not processed, received from OTGW => (0300) [4]
15:05:17.434596 (   9320|  8184) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.437355 (   7976|  6888) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.438049 (   7976|  6888) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.438485 (   7976|  6888) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.438929 (   7976|  6888) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.439365 (   7976|  6888) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.439809 (   7976|  6888) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.440904 (   7304|  6240) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.441753 (   7304|  6240) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.442248 (   7304|  6240) sendMQTTData( 406): Error: MQTT broker not connected.
15:05:17.442746 (   7304|  6240) processOT   (1693): Boiler            B40000100 (9)[MsgID=  0][READ_ACK        ]>Status = Slave  [--------]
15:05:17.450384 (   9320|  8184) sendMQTTData( 406): Error: MQTT broker not connected.

After a while it reconnects and things start to work again:

15:14:49.517580 (  13400| 12384) processOT   (1693): Boiler            BC0110000 (9)[MsgID= 17][READ_ACK        ]>RelModLevel = 0.00 %
15:14:49.900375 (  14744| 13680) sendMQTTData( 406): Error: MQTT broker not connected.
15:14:49.903459 (  13400| 12384) processOT   (1693): Thermostat        T00120000 (9)[MsgID= 18][READ_DATA       ] CHPressure = 0.00 bar
MQTT connected
15:14:50.047344 (  14384| 13680) addOTWGcmdto(1239): CmdQueue: Adding cmd end of queue, slot [0]
15:14:50.049229 (  13040| 12384) addOTWGcmdto(1242): CmdQueue: Insert queue in slot[0]:cmd[CS=1] (4)
15:14:50.050279 (  13040| 12384) addOTWGcmdto(1262): CmdQueue: Next free queue slot: [1]
15:14:50.057334 (  14384| 13680) addOTWGcmdto(1239): CmdQueue: Adding cmd end of queue, slot [1]
15:14:50.058848 (  13040| 12384) addOTWGcmdto(1242): CmdQueue: Insert queue in slot[1]:cmd[CS=45] (5)
15:14:50.059862 (  13040| 12384) addOTWGcmdto(1262): CmdQueue: Next free queue slot: [2]

I've already checked MQTT logs (seems fine but I'm not an expert) and WiFi connection that seems good, probably best proof is the solid telnet connection while printing error messages. What can be the problem ? How can I troubleshoot it ?

UPDATE It turns out I looked at the wrong place for MQTT logs. MQTT server indeed logs some relevant errors:

2022-11-30T15:05:17.445710307Z 1669820717: Client OTGWAC:0B:FB:CF:A2:05 disconnected due to protocol error.

2022-11-30T15:05:59.460947971Z 1669820759: New connection from 192.168.1.177:58478 on port 1883.

2022-11-30T15:05:59.461634145Z 1669820759: New client connected from 192.168.1.177:58478 as OTGWAC:0B:FB:CF:A2:05 (p2, c1, k15, u'hauser').

2022-11-30T15:14:08.038595128Z 1669821248: Client OTGWAC:0B:FB:CF:A2:05 disconnected due to protocol error.

2022-11-30T15:14:50.056192241Z 1669821290: New connection from 192.168.1.177:50281 on port 1883.

2022-11-30T15:14:50.058293671Z 1669821290: New client connected from 192.168.1.177:50281 as OTGWAC:0B:FB:CF:A2:05 (p2, c1, k15, u'hauser').

2022-11-30T15:22:57.967104200Z 1669821777: Client OTGWAC:0B:FB:CF:A2:05 disconnected due to malformed packet.

2022-11-30T15:23:39.998089398Z 1669821819: New connection from 192.168.1.177:53478 on port 1883.

2022-11-30T15:23:39.998986189Z 1669821819: New client connected from 192.168.1.177:53478 as OTGWAC:0B:FB:CF:A2:05 (p2, c1, k15, u'hauser').
rvdbreemen commented 1 year ago

Somehow it looks like the wire protocol is broken for some reason in your case. I have not seen this myself, but I will go and look in my MQTT logs to see if that is really the case. Has you situation resolved it self? Or still seeing this strange behaviour?

marcoCasamento commented 1 year ago

Yes, sorry, this must be closed!

I’ve had the same doubt that there were some connection issues.

I’ve put a ping running for a few hours and it reported less than 1% of loss packets, but still SOME of them were losts. I’ve changed the ESP board with no luck, then I’ve tried to change firmware and I had the same problem.

Finally, last weekend I’ve moved the container running mosquitto to a different server and I had 24+ hours without problem! So the problem is NOT in the firmware and this must be closed. Sorry, I should have put an update on the issue.

Just yesterday evening I’ve also discovered that my linux server was using two ip addresses (one static and one by dhcp client). Removing the address from dhcp for some unknown reason put the lost packet to 0.

From: Robert van den Breemen @.> Sent: martedì 20 dicembre 2022 14:07 To: rvdbreemen/OTGW-firmware @.> Cc: Marco Casamento @.>; Author @.> Subject: Re: [rvdbreemen/OTGW-firmware] MQTT broker not connected (Issue #149)

Somehow it looks like the wire protocol is broken for some reason in your case. I have not seen this myself, but I will go and look in my MQTT logs to see if that is really the case. Has you situation resolved it self? Or still seeing this strange behaviour?

— Reply to this email directly, view it on GitHub https://github.com/rvdbreemen/OTGW-firmware/issues/149#issuecomment-1359324375 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AABW2Y7CMDKIRU3AKO2IPALWOGVNZANCNFSM6AAAAAASPXYU7M . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AABW2Y2NF2BAWUEP6IXKE7TWOGVNZA5CNFSM6AAAAAASPXYU7OWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSRAWSNO.gif Message ID: @. @.> >

rvdbreemen commented 1 year ago

Thanks for letting me know.