homieiot / homie-esp8266

πŸ’‘ ESP8266 framework for Homie, a lightweight MQTT convention for the IoT
http://homieiot.github.io/homie-esp8266
MIT License
1.36k stars 308 forks source link

Connection issues with Mosquitto MQTT broker. #640

Open nemidiy opened 4 years ago

nemidiy commented 4 years ago

Hi all, as the subject says I have been having issues connecting with the MQTT broker, even using the most basic sketch, what happens is that nodemcu will connect to WIFI just fine but then will struggle connecting to the broker forever blinking the blue led on the board.

Here is the sketch:

#include <Homie.h>

HomieNode sensorNode("sensor", "Sensor", "sensor");

unsigned long lastSent = 0;

void loopHandler() {
  unsigned long m = millis();
  if ( m - lastSent >= 10 * 1000UL || lastSent == 0) {
    Homie.getLogger() << "Fake value: " << m << endl;
    sensorNode.setProperty("value").send(String(m));
    lastSent = millis();
  }
}

void setup() {

  Serial.begin(115200);
  Serial << endl << endl;

  Homie_setFirmware("test", "1.0.0");
  Homie.setLoopFunction(loopHandler);

  sensorNode.advertise("value").setName("Value").setDatatype("double");

  Homie.setup();
}

void loop(){
    Homie.loop();
}

Here is platformio ini config :

[env:nodemcuv2]
platform = espressif8266
board = nodemcuv2
framework = arduino
build_flags =
    -D PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY
upload_speed = 115200
lib_ldf_mode = deep
lib_deps =
    ArduinoJson@6.13.0
    https://github.com/homieiot/homie-esp8266.git#develop-v3

Here is what I see in the serial monitor:

πŸ’‘ Firmware test (1.0.0)
πŸ”Œ Booting into normal mode πŸ”Œ
{} Stored configuration
  β€’ Hardware device ID: cc50e33cf888
  β€’ Device ID: valvebox_02
  β€’ Name: Valve Box
  β€’ Device Stats Interval: 10 sec
  β€’ Wi-Fi: 
    β—¦ SSID: differential
    β—¦ Password not shown
  β€’ MQTT: 
    β—¦ Host: 192.168.86.24
    β—¦ Port: 1883
    β—¦ Base topic: devices/
    β—¦ Auth? no
  β€’ OTA: 
    β—¦ Enabled? yes
↕ Attempting to connect to Wi-Fi...
βœ– Wi-Fi disconnected, reason: 203
Triggering WIFI_DISCONNECTED event...
↕ Attempting to connect to Wi-Fi...
βœ” Wi-Fi connected, IP: 192.168.86.111
Triggering WIFI_CONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ– MQTT disconnected, reason: 0
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ– MQTT disconnected, reason: 0
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...
βœ– MQTT disconnected, reason: 0
Triggering MQTT_DISCONNECTED event...
↕ Attempting to connect to MQTT...
Sending initial information...

This is what I see in the mosquitto log

1574110838: New client connected from 192.168.86.111 as Homie-valvebox_02 (c1, k15).
1574110879: New connection from 192.168.86.111 on port 1883.
1574110879: Client Homie-valvebox_02 already connected, closing old connection.
1574110879: Client Homie-valvebox_02 disconnected.
1574110879: New client connected from 192.168.86.111 as Homie-valvebox_02 (c1, k15).
1574110901: Client Homie-valvebox_02 has exceeded timeout, disconnecting.
1574110901: Socket error on client Homie-valvebox_02, disconnecting.
1574110902: New connection from 192.168.86.111 on port 1883.
1574110902: New client connected from 192.168.86.111 as Homie-valvebox_02 (c1, k15).
1574110923: Client Homie-valvebox_02 has exceeded timeout, disconnecting.
1574110923: Socket error on client Homie-valvebox_02, disconnecting.
1574110924: New connection from 192.168.86.111 on port 1883.
1574110924: New client connected from 192.168.86.111 as Homie-valvebox_02 (c1, k15).
1574110954: Client Homie-valvebox_02 has exceeded timeout, disconnecting.
1574110954: Socket error on client Homie-valvebox_02, disconnecting.
1574110958: New connection from 192.168.86.111 on port 1883.
1574110958: New client connected from 192.168.86.111 as Homie-valvebox_02 (c1, k15).
1574110979: Client Homie-valvebox_02 has exceeded timeout, disconnecting.

This are the messages I get if subscribe to devices/#

devices/valvebox_02/$state init
devices/valvebox_02/$state lost
devices/valvebox_02/$state init
devices/valvebox_02/$state lost
devices/valvebox_02/$state init
devices/valvebox_02/$state lost
devices/valvebox_02/$state lost
devices/valvebox_02/$state lost
devices/valvebox_02/$state init
devices/valvebox_02/$state lost

Last but not least the broker config:

# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

pid_file /var/run/mosquitto.pid

persistence true
persistence_location /var/lib/mosquitto/

log_dest file /var/log/mosquitto/mosquitto.log

include_dir /etc/mosquitto/conf.d

and BTW I have nothing set under /etc/mosquitto/conf.d

I am running ubuntu 18.04, and the mosquito version is 1.4.15-2 (straight out from the repos)

Here is what fixes it for me, not saying it's the right way though, but it does the trick :

void loop(){
    Homie.loop();
    delay(100);
}

That delay call will have the nodemcu connected right away, not disconnect, and working just fine. My guess is there is somewhere some race condition that is causing this. I am not experienced in firmware development and I still need to learn how to use a debugger for ESP2866 so that I try and dig further, without the debugger it pretty much feels like slashing my wrists with a cookie.

Any pointers will be appreciated. Thanks.

kleini commented 4 years ago

Please add the output of all versions of the used libraries. You're advertising a double value and sending a String value. This should not be a problem as there is no MQTT connection anyways but this is what hit my eyes directly.

luebbe commented 4 years ago

Don't you always publish strings to mqtt? Isn't the advertised "double" just a hint for a parser on the receiving side?

luebbe commented 4 years ago

BTW I switched all my development to homie develop-v3 a while ago. The develop branch should be considered dead. It's still an open item to merge the important stuff from develop to develop-v3 and close the old branch.

nemidiy commented 4 years ago

About setting the datatype to double and then sending a string, I was under the impression that the setDataType method was just about declaring the attribute and that had no influence in how the value get's sent out. I will fix that, anyways as you said, it should not be the problem since otherwise with the delay it would not work and it does. That same use case where the data type is set to double but then a string is sent out is present on the examples, so if it is a problem I can fix those as well and send a PR. IE:

https://github.com/homieiot/homie-esp8266/blob/develop-v3/examples/TemperatureSensor/TemperatureSensor.ino

Here are my libs :

ArduinoJson
===========
#ID: 64
An elegant and efficient JSON library for embedded systems

Version: 6.13.0
Homepage: https://arduinojson.org/?utm_source=meta&utm_medium=library.json
Keywords: web, json, http, rest
Compatible frameworks: arduino
Compatible platforms: atmelavr, atmelsam, espressif32, espressif8266, gd32v, infineonxmc, intel_arc32, kendryte210, microchippic32, nordicnrf51, nordicnrf52, ststm32, ststm8, teensy, timsp430
Authors: Benoit Blanchon

AsyncMqttClient
===============
#ID: 346
An Arduino for ESP8266 / ESP32 asynchronous MQTT client implementation

Version: 0.8.2
Keywords: iot, home, automation, async, mqtt, client, esp8266
Compatible frameworks: arduino
Compatible platforms: espressif8266, espressif32
Authors: Marvin ROGER

AsyncTCP
========
#ID: 1826
Asynchronous TCP Library for ESP32

Version: 1.1.1
License: LGPL-3.0
Keywords: async, tcp
Compatible frameworks: arduino
Compatible platforms: espressif32
Authors: Hristo Gochkov

Bounce2
=======
#ID: 1106
Debouncing library for Arduino or Wiring

Version: 2.52
Keywords: bounce, signal, input, ouput
Compatible frameworks: arduino
Compatible platforms: atmelavr, atmelsam, espressif32, espressif8266, intel_arc32, microchippic32, nordicnrf51, nordicnrf52, ststm32, teensy, timsp430
Authors: Thomas O Fredericks

ESP Async WebServer
===================
#ID: 306
Asynchronous HTTP and WebSocket Server Library for ESP8266 and ESP32

Version: 1.2.3
License: LGPL-3.0
Keywords: async, websocket, http, webserver
Compatible frameworks: arduino
Compatible platforms: espressif32, espressif8266
Authors: Hristo Gochkov

ESPAsyncTCP
===========
#ID: 305
Asynchronous TCP Library for ESP8266

Version: 1.2.2
License: LGPL-3.0
Keywords: async, tcp
Compatible frameworks: arduino
Compatible platforms: espressif8266
Authors: Hristo Gochkov

Homie
=====
ESP8266 framework for Homie, a lightweight MQTT convention for the IoT

Version: 91e9b63
Homepage: https://homieiot.github.io/homie-esp8266/
License: MIT
Keywords: iot, home, automation, mqtt, esp8266, async, sensor
Compatible frameworks: arduino
Compatible platforms: espressif8266
Authors: Marvin Roger
Source: git+https://github.com/homieiot/homie-esp8266.git#develop-v3
nemidiy commented 4 years ago

Ok ... I found what is happening, although missing the why.

Given the mosquitto logs showing the problems when connecting, I ran wireshark to see what was going on the TCP level. That showed that when trying to establish the conn to 1883 on the TCP level the mcu would send the SYN, then the server would reply with the ACK,SYN and then re-transmissions started (in the server->mcu direction). Here is a screenshot : https://imgur.com/a/9BCjm0O

That got me suspicious on the interaction mcu <-> wifi router.

I have a google wifi system at home, this guy : https://www.amazon.com/Google-WiFi-system-3-Pack-replacement/dp/B01MAW2294/ref=sr_1_3?crid=2WKDFM25EMNIB&keywords=google+wifi+3+pack&qid=1574182630&sprefix=google+wifi%2Caps%2C278&sr=8-3

The main pod is connected to my ISP modem/router (that had wifi disabled) Those two are right next to eachother and the mcu is right next to those two (same desk). To rule out problems between nodemcu and the google wifi router I enabled wifi also in the ISP router and had nodemcu connect to that wifi network instead .... guess what ? voilΓ ! no more re-transmissions, and the mcu connects right away no more issues.

Now the question is why nodemcu is not getting along with google wifi routers ? So far I have had nothing but good things to say about it. Since I got those three pods my wifi has been incredibly stable and no other device in the house has issues with it.

Can you think of anything that might be off ?

luebbe commented 4 years ago

About setting the datatype to double and then sending a string, I was under the impression that the setDataType method was just about declaring the attribute and that had no influence in how the value get's sent out.

exactly. It's just a hint for somebody on the receiving side, that it should parse the data as float.

ElizabethMorves commented 4 years ago

Wifi may be set up to client isolation.There may be duplicate ip addresses in the network. or the names of the mqtt client.

qwandor commented 4 years ago

I'm having what appears to be the same issue on an ESP32. If I advertise one or two properties it works fine, but as soon as I advertise more than two it gets into a loop of connecting, starting to send things, then disconnecting again. Adding a delay(5) (or higher) into the loop() function seems to work around the issue and I can then advertise as many properties as I like with no issues. I'm trying to connect to a Mosquitto server over SSL. I wonder if there is a buffer at some point in the network stack that is getting filled up quicker than it can be sent out over the network?

I've added some extra logging output to BootNormal.cpp to print the packet ID returned by Interface::get().getMqttClient().publish(...) at each step in case that is relevant, though it seems it may not be the issue.. Logs from the device look like this: ↕ Attempting to connect to Wi-Fi... βœ” Wi-Fi connected, IP: 192.168.86.67 Triggering WIFI_CONNECTED event... ↕ Attempting to connect to MQTT... Sending initial information... stats interval packet ID 9 FW name packet ID 10 FW version packet ID 11 FW checksum packet ID 12 implementation packet ID 13 config packet ID 14 OTA packet ID 16 Node name packet ID 17 Node type packet ID 18 Node properties packet ID 19 Node settable packet ID 21 Node datatype packet ID 22 Node settable packet ID 24 Node datatype packet ID 25 Node name packet ID 27 Node type packet ID 28 βœ– MQTT disconnected, reason: 0 Triggering MQTT_DISCONNECTED event... ↕ Attempting to connect to MQTT... Node properties packet ID 29 ↕ Attempting to connect to MQTT... Sending initial information... stats interval packet ID 9 FW name packet ID 10 FW version packet ID 11 FW checksum packet ID 12 implementation packet ID 13 config packet ID 14 OTA packet ID 16 Node name packet ID 17 Node type packet ID 18 Node properties packet ID 19 Node settable packet ID 21 Node datatype packet ID 22 Node settable packet ID 24 Node datatype packet ID 25 Node name packet ID 27 Node type packet ID 28 βœ– MQTT disconnected, reason: 0 Triggering MQTT_DISCONNECTED event... ↕ Attempting to connect to MQTT... Node properties packet ID 29 ↕ Attempting to connect to MQTT... Sending initial information... stats interval packet ID 9 FW name packet ID 10 FW version packet ID 11 FW checksum packet ID 12 implementation packet ID 13 config packet ID 14 OTA packet ID 16 Node name packet ID 17 Node type packet ID 18 Node properties packet ID 19 Node settable packet ID 21 Node datatype packet ID 22 Node settable packet ID 24 Node datatype packet ID 25 Node name packet ID 27 Node type packet ID 28 βœ– MQTT disconnected, reason: 0 Triggering MQTT_DISCONNECTED event... ↕ Attempting to connect to MQTT... Node properties packet ID 29

And from the Mosquitto server: 1586113850: New connection from 82.34.54.32 on port 8883. 1586113852: New client connected from 82.34.54.32 as Homie-m5stickc-test (c1, k15, u'homie'). 1586113852: Socket error on client Homie-m5stickc-test, disconnecting. 1586113853: New connection from 82.34.54.32 on port 8883. 1586113855: New client connected from 82.34.54.32 as Homie-m5stickc-test (c1, k15, u'homie'). 1586113856: Socket error on client Homie-m5stickc-test, disconnecting. 1586113857: New connection from 82.34.54.32 on port 8883. 1586113859: New client connected from 82.34.54.32 as Homie-m5stickc-test (c1, k15, u'homie'). 1586113859: Socket error on client Homie-m5stickc-test, disconnecting. 1586113860: New connection from 82.34.54.32 on port 8883. 1586113862: New client connected from 82.34.54.32 as Homie-m5stickc-test (c1, k15, u'homie'). 1586113863: Socket error on client Homie-m5stickc-test, disconnecting.

bertmelis commented 4 years ago

If you're using an ESP8266: It's a problem of the ESPAsyncTCP lib. TLS is done by AxTLS (it's the only one supported by ESPAsyncTCP). AxTLS uses a shared buffer for incoming and outgoing data. So it easily gets messed up when using MQTT: pubacks, subacks, pings... Chances for messing up are practically 100%. An option is to make Homie wait to communicate with the broker untill the previous message is acked. This is rather complicated. Waiting for ESPAsyncTCP to be upgraded is...

qwandor commented 4 years ago

I'm using a ESP32, with https://github.com/tve/AsyncTCP#mbed-tls-try2 and https://github.com/tve/async-mqtt-client. Adding some more logging to AsyncTCP.cpp, it looks like AsyncClient::add is calling _close() because tcp_ssl_write is returning -1. I haven't managed to get any further yet as I haven't worked out how to get debug logs out of mbedtls.

kleini commented 4 years ago

I solved this problem for me by using some patch for async-mqtt-client, that uses then AsyncTCPbuffer for sending large amounts of MQTT messages. This makes all my devices with a lot of properties very stable. Furthermore I used a patched ESPAsyncTCP, but I don't remember the exact background here. For an example how to use my patched variants please see my implementation for a Sonoff 4CH Pro: https://github.com/kleini/sonoff-4ch/blob/master/platformio.ini Getting the Homie stuff working with SSL is a priority for me because MQTT password are transferred otherwise readable for everyone. But getting the correct patches into all these library is often a very time consuming thing.

qwandor commented 3 years ago

@kleini is that still working for you? I'm now trying to get this working on an ESP8266 (D1 Mini) and I'm having a similar problem again. It looks like the SSL handshake is timing out for some reason. I tried copying your config to use the exact same library versions and other config for my code, but it doesn't seem to help.

kleini commented 3 years ago

@qwandor I did not follow development for nearly a whole year now. Maybe something broke due to new releases and new versions, that I did not pin in my implementations. Have to check, whether I am able to upgrade to latest stuff.

kleini commented 3 years ago

@qwandor I recompile yesterday one of my firmwares and it still works without any problems with SSL on MQTT connection. Next steps for me will be to upgrade used components to latest versions as they are all over a year old.

qwandor commented 3 years ago

@kleini Thanks! I tried again with your pinned versions and this time I got it working most of the time, though it's still a bit flaky. I think some network issues on my side were making things fail before, I guess there's some problem with retries or fragmentation not working properly. I still didn't manage to get it working with the latest ESPAsyncTCP or async-mqtt-client though.

kleini commented 3 years ago

@qwandor I use patched versions of ESPAsyncTCP and async-mqtt-client, especially because those not properly working network connections cause problems in the code of these libraries, especially on async-mqtt-client, some people are already working on fixes.

battlecow commented 3 years ago

Realizing this is an old thread just wanted to post for future travelers that the delay(100) after the Homie.loop() helped to significantly stabilize my esp8266 with 3 nodes where before it would disconnect and reconnect constantly.

luebbe commented 3 years ago

@battlecow could you please try with the develop version of asyncmqtt client and without your 100ms delay? There have been a lot of improvements recently in terms of stability

battlecow commented 3 years ago

Testing again, for posterity here are the libraries im running with:

SDK:2.2.2-dev(38a443e)/Core:2.7.3-3-g2843a5ac=20703003/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-30-g92add50/BearSSL:5c771be
Using library homie-esp8266 at version 3.0.0 -- 9cd83972f27b394eab8a5e3e2baef20eea1b5408
Using library async-mqtt-client at version 0.8.2 -- 2991968a97193aaa6402d146490b93ea671c7e02
Using library ESPAsyncTCP at version 1.2.2 
Using library Ticker at version 1.0 
Using library ESP8266WiFi at version 1.0 
Using library ArduinoJson at version 6.11.5
Using library Bounce2 at version 2.53

I'll drop an edit in here after some time.

luebbe commented 3 years ago

This is the "official" version pulled in by homie:

Using library async-mqtt-client at version 0.8.2 -- 2991968a97193aaa6402d146490b93ea671c7e02

can you try to add the following to your platformio.ini

lib_deps = https://github.com/marvinroger/async-mqtt-client.git#develop

and check, if it pulls in the develop version (can't check myself right now)

battlecow commented 3 years ago

I'm using the Arduino IDE and specifically cloned the repo within the library path so the directory is set to that specific SHA.

luebbe commented 3 years ago

Asyncmqtt 0.8.2 still has these problems when a lot of data has to be transferred. Many nodes -> a lot of data on startup...

battlecow commented 3 years ago

Yea I would say it isn't any less stable than the previous attempts, but I still get a lot of disconnect/reconnects which is problematic as having 3 nodes makes it take a bit to reach ready status.

luebbe commented 3 years ago

Have you ever tried vs code with the platformio plugin? It's a lot more efficient and fun to work with than the Arduino IDE. (At least for me)