nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License

Something is wrong with MQTT #3356

Closed chathurangawijetunge closed 2 years ago

chathurangawijetunge commented 3 years ago

NodeMCU 3.0.0.0 built on nodemcu-build.com provided by frightanic.com
branch: dev
commit: 0fb2a121c18553d71955888f5e719289acfb6a75
release:
release DTS: 202012252235
SSL: false
build type: integer
LFS: 0x40000 bytes total capacity
modules: file,gpio,mqtt,net,node,rtctime,sjson,sntp,tmr,uart,wifi
build 2020-12-27 02:08 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)

Even with the standard example MQTT code, it acts as if the connection is okay but does not receive messages on subscribed topics. Publishing also appears to work normally, but the published messages won't reach the broker, and no offline event is triggered. This happens after running for a long time (over 12 hours).

chathurangawijetunge commented 3 years ago

URL="broker"

m = mqtt.Client(node.chipid(),30,"user","pwd")
m:lwt("test/lwt","offline",0,1)   
m:on("connect", function(client) print ("connected") end)
m:on("connfail", function(client, reason) print ("connection failed", reason) end)
m:on("offline", function(client) print ("offline") start_mqtt() end)

m:on("message", function(client, topic, data)
  print(topic .. ":" )
  if data ~= nil then
    print(data)
  end
end)

m:on("overflow", function(client, topic, data)
  print(topic .. " partial overflowed message: " .. data )
end)

function start_mqtt()
 tmr.create():alarm(3000,0, function()
   m:connect(URL, 1883, false, function(client)
     print("connected")
     client:publish("test/lwt","online",0,1)
     client:subscribe("test/lwt",0,nil) 
     client:subscribe("test",0,nil)    
   end,
   function(client, reason)
     print("failed reason: " .. reason)
     start_mqtt()
   end)
 end)
end

start_mqtt()

This simple code connects to the broker, and if the broker disconnects it will reconnect. But after wifi.sta.disconnect() followed by wifi.sta.connect(), it shows as connected when it is not.
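One way to probe the Wi-Fi case described above is to tie the MQTT session to station events instead of relying on the "offline" callback alone. The following is only a sketch under assumptions: `bind_mqtt_to_wifi` is a hypothetical helper, not part of NodeMCU, and it has not been shown to fix this issue. It expects an object shaped like NodeMCU's `wifi.eventmon`, the mqtt client, and a function that schedules a reconnect (`start_mqtt` in the post above).

```lua
-- Hypothetical helper (not a NodeMCU API): close the MQTT session when the
-- station loses its association and reconnect once an IP address is assigned.
-- eventmon:  object shaped like wifi.eventmon (register, STA_* constants)
-- client:    the mqtt client (m in the post above)
-- reconnect: function that schedules a new connect (start_mqtt above)
function bind_mqtt_to_wifi(eventmon, client, reconnect)
  eventmon.register(eventmon.STA_DISCONNECTED, function()
    -- Drop the possibly stale session. pcall guards against close()
    -- raising an error when the client is not connected.
    pcall(function() client:close() end)
  end)

  eventmon.register(eventmon.STA_GOT_IP, function()
    reconnect()
  end)
end

-- On the device (assumes the wifi module is part of the build):
-- bind_mqtt_to_wifi(wifi.eventmon, m, start_mqtt)
```

If the "offline" handler also calls `start_mqtt`, closing the client may schedule a second reconnect, so in practice only one of the two paths should trigger the connect.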

nwf commented 3 years ago

Many things are wrong with MQTT (#2987, #3068, doubtless many more). My https://github.com/nwf/nodemcu-firmware/tree/dev-active branch has some fixes and refactorings that may help, but many, many things remain wrong with MQTT even after all that work and it's just been too depressing to even contemplate fixing and nobody seems really bothered by it.

Please attempt packet capture and investigate what's going on at the network level, together with transcripts from your demo program and other MQTT clients of the broker. That is, it would be most helpful to have narrative logs, with packet traces and debug information of the form "NodeMCU Device Under Test (DUT) connects and sends X Y Z to broker; broker establishes subscriptions and sends A B C to DUT; a client publishes M to Q and the broker forwards that to DUT, which acknowledges; 11 hours pass with no network traffic beyond MQTT PING and PONG between broker and DUT; a client publishes N to Q; the broker sends this to DUT, which fails to acknowledge and reports internally [...]". I'm aware that this is a huge amount of work, but someone's going to have to do it, and so far nobody, including me, has really been champing at the bit.

(ETA: Making things even more depressing... Even if we get MQTT right, it's likely that there are nigh unsolvable issues below, given, for example, #3040. It's not clear that there's a better solution at present than to give up and admit that NodeMCU is not a high-reliability platform except in very constrained circumstances; in general, your application and remote endpoints should conspire to actively keep and feed watchdog timers that cause reboots rather than trying to fix anything without.)
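The watchdog idea in the note above could be sketched roughly as follows. This is a minimal illustration, not an established NodeMCU pattern: `new_watchdog`, the 120-second timeout, and the "test/heartbeat" topic are all hypothetical. The clock and the reboot action are passed in as functions so the logic can be exercised off-device.

```lua
-- Hypothetical application-level watchdog (names are illustrative).
-- timeout_s: seconds without a feed before rebooting
-- now:       function returning the current time in seconds
-- reboot:    function that restarts the device
function new_watchdog(timeout_s, now, reboot)
  local last_fed = now()
  local wd = {}

  -- Call whenever end-to-end liveness is proven, for example when a
  -- heartbeat message published by a remote endpoint arrives.
  function wd.feed()
    last_fed = now()
  end

  -- Call periodically; triggers the reboot once the feed is overdue.
  function wd.check()
    if now() - last_fed > timeout_s then
      reboot()
      return true
    end
    return false
  end

  return wd
end

-- Wiring on NodeMCU (assumes the tmr and node modules are in the build).
if tmr and node then
  APP_WATCHDOG = new_watchdog(120, function() return tmr.time() end, node.restart)
  tmr.create():alarm(10000, tmr.ALARM_AUTO, APP_WATCHDOG.check)
  -- In the mqtt "message" handler, assuming a remote endpoint publishes to
  -- the hypothetical topic "test/heartbeat":
  --   if topic == "test/heartbeat" then APP_WATCHDOG.feed() end
end
```

The point of feeding from a message that a remote endpoint publishes, rather than from a local timer, is that the device then only stays up while the whole path (Wi-Fi, TCP, MQTT, broker) is demonstrably delivering traffic.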

marcelstoer commented 3 years ago

OT but we gotta discuss this somewhere...

@nwf what is the best way out of this misery? The "upstream" https://github.com/tuanpmt/esp_mqtt has been unmaintained since 2017. Hence, we can't turn to it for fixes to port. Options:

My https://github.com/nwf/nodemcu-firmware/tree/dev-active branch has some fixes and refactorings that may help

Can we at least merge those?

HHHartmann commented 3 years ago

This has tests, as opposed to tuanpmt/ESP8266MQTTClient, which might lead to better quality.

Can we at least merge those?

Sounds reasonable

chathurangawijetunge commented 3 years ago

I think I have found a small workaround. Adding a dedicated timer for the connection solves my issue for the time being.

URL="broker"

m = mqtt.Client(node.chipid(),30,"user","pwd")
m:lwt("test/lwt","offline",0,1)   
--m:on("connect", function(client) print ("connected") end)
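-- start_mqtt is defined further down, so wrap it in a closure that looks it
-- up at call time; passing start_mqtt directly here would pass nil.
m:on("offline", function() start_mqtt() end)
m:on("connfail", function() start_mqtt() end)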

m:on("message", function(client, topic, data)
  print(topic .. ":" )
  if data ~= nil then
    print(data)
  end
end)

--m:on("overflow", function(client, topic, data)
--  print(topic .. " partial overflowed message: " .. data )
--end)

Mqtt_Conn_tmr=tmr.create()

function start_mqtt()
 Mqtt_Conn_tmr:alarm(3000,0, function()
   m:connect(URL, 1883, false, function(client)
     print("connected")
     client:publish("test/lwt","online",0,1)
     client:subscribe("test/lwt",0,nil) 
     client:subscribe("test",0,nil)    
   end,
   function(client, reason)
     print("failed reason: " .. reason)
     start_mqtt()
   end)
 end)
end

start_mqtt()

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.