Closed bertmelis closed 3 years ago
A sample sketch with the development code:
RAM: [==== ] 36.5% (used 29936 bytes from 81920 bytes)
Flash: [=== ] 28.1% (used 292984 bytes from 1044464 bytes)
Master branch:
RAM: [==== ] 36.7% (used 30104 bytes from 81920 bytes)
Flash: [=== ] 27.6% (used 288428 bytes from 1044464 bytes)
A similar library whose owner is quite patronizing (API differs a bit, so sketch is not 100% the same):
RAM: [==== ] 36.5% (used 29880 bytes from 81920 bytes)
Flash: [=== ] 28.2% (used 294636 bytes from 1044464 bytes)
I added https://github.com/marvinroger/async-mqtt-client.git#develop to the platformio.ini of my https://github.com/luebbe/homie-node-collection compiled and uploaded the sensor-nodes demo to a d1 mini.
This one contains a lot of nodes and many mqtt messages (84 topics) have to be sent upon startup. I'm running mosquitto on a raspberry Pi 1 and MQTT Explorer on Windows for monitoring.
The only notable difference when the device is reset is that it seems to take longer until all the initial messages have been received by MQTT explorer. I guess this is to be expected since they are queued up. Perhaps before messages were lost when the buffer was full.
I have also added "freeheap" to $stats in my fork of homie-esp2866, so I can monitor if anything runs away there.
The graph looks like this (around 15 minutes of history)
There is a significant drop directly after the start (expected) and then it is stable at around 30KB. You can also see the drop at the five minute intervals when sensor data is sent.
It is also interesting to see that when the $stats are sent, they now trickle in over the course of 1-3 seconds. With the previous version they came in the same second.
In which qos are you sending?
One notable difference in sending qos > 0 is that "the next message" is only sent when the previous has been acknowledged. I didn't want the logic to be too complicated to ensure proper message ordering (as per mqtt spec). In qos 0 this should not matter, but I might be wrong. I changed the logic a few times.
Homie default. I didn't change anything. MQTT Explorer says qos 0?
Homie default. I didn't change anything. MQTT Explorer says qos 0?
qos subscribing might differ from qos publishing.
Stats are sent qos 1 (https://github.com/homieiot/homie-esp8266/blob/9cd83972f27b394eab8a5e3e2baef20eea1b5408/src/Homie/Boot/BootNormal.cpp#L180). So yes, this is slower than before.
The homie convention states https://homieiot.github.io/specification/#qos-and-retained-messages that it uses qos 1, but I have to check the source to confirm. I don't know from where MQTT Explorer gets the qos 0 info.
qos subscribing might differ from qos publishing.
guess so
Stats are sent qos 1 (https://github.com/homieiot/homie-esp8266/blob/9cd83972f27b394eab8a5e3e2baef20eea1b5408/src/Homie/Boot/BootNormal.cpp#L180). So yes, this is slower than before.
Fine by me. Better slow & certain than quick & dirty
I'll leave my dev board and the monitor running at least until I go to sleep, but I'm already pretty confident that there's no memory leak.
I'll anyhow add speed up to the wishlist.
Just FYI, the sensor-demo has 84 subtopics. This is the second largest amount of my homie based applications.
For me this results in a crash. Please see attached serial output from my relays-device. I am using SSL encrypted connection. crash.log
Do you run your ESP on 160MHz? A soft watchdog reset on the sys stack during connecting is probably caused by the handshake taking too much time. Off course I can be wrong.
Edit: for the record, I am setting up TLS on my broker to test myself.
gotta look into that. But still, it's a watchdog reset, so something is taking too long. I workaround could be to not send immediately after connecting but on the first onPoll. You could try with commenting out line 382. I don't know how long a broker waits to receive a CONNECT before disconnecting.
I think, the connection to the MQTT broker does not work at all:
mosquitto | 1612816096: New connection from 192.168.168.196 on port 1883.
mosquitto | 1612816099: Client <unknown> has exceeded timeout, disconnecting.
mosquitto | 1612816099: Socket error on client <unknown>, disconnecting.
And with some debug enabled, I see
✔ Wi-Fi connected, IP: 192.168.168.196
Triggering WIFI_CONNECTED event...
↕ Attempting to connect to MQTT...
CONNECTING
TCP conn, MQTT CONNECT
new front #1
snd #1: 165/100
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Soft WDT reset
So, is somehow the connect string not sent completely? Will try to sniff network traffic.
It's been a while since I've used TLS on the esp8266. How did you solve this error:
ESPAsyncTCP/src/ESPAsyncTCP.cpp:1325:31: error: no matching function for call to 'AsyncClient::_recv(tcp_pcb*&, pbuf*&, int)'
c->_recv(pcb, p->pb, 0);
Still running fine after 3,5 hours. FreeHeap still at 30k, goes down to 29,4k .. 29,6k when values are sent and always goes back to the same max value. No SSL.
It's been a while since I've used TLS on the esp8266. How did you solve this error:
ESPAsyncTCP/src/ESPAsyncTCP.cpp:1325:31: error: no matching function for call to 'AsyncClient::_recv(tcp_pcb*&, pbuf*&, int)' c->_recv(pcb, p->pb, 0);
https://github.com/me-no-dev/ESPAsyncTCP/pull/151 <- that requires a fix from @mhightower83, which is not merged for a 3/4 year now.
Sorry, enough for me for today.
Shoot me, with logging enable it gives this output:
CONNECTING <-- connect call by the user
TCP conn, MQTT CONNECT <-- TCP is connected (TLS handshake done)
new front #1 <-- CONNECT packet is added to the queue
snd #1: 101/40 <-- TCP sends 101 bytes whereas the CONNECT was only 40.
So the TCP lib is not returning the actual written payload when using encryption???
Should TCP library normally return exact the amount of bytes, that have been sent? I saw the similar behaviour some comments above.
It does, but only the non-tls. I guess it returns the encrypted TCP payload size. If you ask me, this a bad API design.
@luebbe To speed up Homie you can use set_tcp_nodelay
to true
if you use mosquitto as broker. But I added "inflight messages > 1" to the TODO.
I'm thinking about a workaround for TLS. It wont be pretty as it is impossible to know how much data actually has been sent: We can wait to send for the entire TCP buffer to be empty. Then copy a fixed amount to LWIP and "ack" our own data.
EDIT: Thinking about it... I might keep 2 sets of counters: one from tcp and one of our own and keep them in sync. I will work it out in my head while clearing the snow from the driveway.
I've added the hacky solution for TLS. I've tested it with the example in the repo + added a 2000 byte message for chuncked sending.
Did it like described above.
It works!
works.log
Reading your pull request, I see the _acked
is already increased when sending the TLS data as it is unclear how log the data will be. The returned written length seems to be somewhere deep inside axTLS of ESPAsyncTCP. Pretty hard to just read and understand that code.
Yes, there is room for improvement. I didn't want to touch the existing code too much. _acked is keeping track of the acked data in the mqtt client. But since we're copying the data to TCP, we can already consider it acked. On non-tls versions, _acked is only incremented when the real zck comes in.
I also realized I can use client->space() instead of the fixed number.
I'm going to give it a rest for a week or two (covid restrictions a eased a bit, whoohoo). I'll keep an eye on the issue and PR tracker and read whatever is going on. I'll keep supporting!
Meanwhile I'll think about possible improvements:
Awesome work so far, I must say! Thank you very much @bertmelis !
I'll go through old issues backwards and check if I can reproduce them. If not I'll tag them with "to be closed" and we can review together. Won't test TLS related stuff though. I didn't want to jump through that hoop yet.
@luebbe I can do the TLS related stuff.
I can do the TLS related stuff.
There might be a serious issue with TLS though. The last thing I was testing was to destroy the client. On non-TLS is was not problem. On TLS it crashed. Now because of a bug in the ESPAsyncTCP library, I had to have the tcp client on the heap to be able to destroy and recreate on each (re)connect. Should destroying the mqttclient's crash be related to the destruction of the tcp client, it will also crash when the mqtt client disconnects. I didn't test this.
It is solveable because the bug is related to acking data and we bypass this when using TLS. However, it involves a lot of preprocesser stuf...
Somebody pointed me to the fact that on both ESP8266 and ESP32, data is copied to LWIP instead of the data being referenced. So there never were dangling pointers. This means I can simplify the develop
code as I don't need to keep data only for acking (It is still needed for retries etc).
see: https://git.savannah.nongnu.org/cgit/lwip.git/tree/src/core/tcp_out.c?h=STABLE-2_1_2_RELEASE#n422 https://github.com/d-a-v/esp82xx-nonos-linklayer/blob/7421258237b7c8f61629226961af498a0a6e0096/glue-lwip/arduino/lwipopts.h#L1665 https://github.com/espressif/esp-idf/blob/6e776946d01ec0d081d09000c36d23ec1d318c06/components/lwip/port/esp32/include/lwipopts.h#L449 The COPY flag is set by default with the configuration LWIP is built for ESP8266 and ESP32 and cannot be changed. If we would keep the data, it will in fact be kept twice.
Hence, I'll simplify the code and keep only the data that has to be acked by the MQTT protocol.
Thanks @bertmelis
My test rig runs stable with develop, but there's no heavy load.
Two weeks ago I wrote:
I'll go through old issues backwards and check if I can reproduce them. If not I'll tag them with "to be closed" and we can review together.
I did this and tagged a handful of tickets. If someone could review these: https://github.com/marvinroger/async-mqtt-client/labels/to%20be%20closed and close where appropriate?
:+1: I'm currently trying to implement the "don't wait for confirmation before sending the next". This does require some next level thinking. Will take a few days depending on how much time I can spend.
For me everything works as well. Even OTA over MQTT over TLS improved and works better now.
lovely to hear :smile:
I also want to report - I have develop branch in 5 devices in two different locations, all of them connected to my MQTT VPN. So far so good - uptime is about 4 days now. VPN network interface statistics on my PC is: Everything seems to be working smoothly - OTA from PIO, OTA via web page, web page access, ping, ... So I can say great job and big thank You...
Hi Bert Thanks for doing this. Only issue I've found so far is the char* payload onMessage is often over length. eg 43 chars instead of 32:
Received MQTT topic: homeassistant/localtime
payload* length: 43
2021-02-24T20:28:20.236189+00:0�?w␂d␂�� @
This is from the master branch:
Received MQTT topic: homeassistant/localtime
payload* length: 32
2021-02-24T20:33:05.467993+00:00
All the best
Strange... I didn't touch that part of the code (not that I can remember, that is :wink:)
However, I did spot some other bugs. On disconnect with a session present, it might block and crash by the watchdog... Fixing it as we speak.
I will look into your issue also!
Thank you Bert - apologies for no test script (recovering from Covid, so energy levels are low) but it is a very trivial MQTT payload that doesn't change between async-mqtt-client builds. I'm only publishing and subscribing, so not testing all the features!
no worries, although it would be easier if you could test again with logging enabled. A question though: it is always over length or only sometimes?
It appears to always be the same over length (for this particular payload). Forgive my laziness but how do I enable logging? I see log_i and log_w but how to turn it on?
Do you use Arduino IDE or something else (VSCode/Platformio)?
I'm using platformio for the first time via VSCode on Ubuntu.
my platformio.ini:
[env:esp8266]
platform = espressif8266
board = d1_mini
build_flags =
-Tesp8266.flash.4m1m.ld
-D PIO_FRAMEWORK_ARDUINO_LWIP2_HIGHER_BANDWIDTH
;-D ASYNC_TCP_SSL_ENABLED
-D DEBUG_ESP_PORT=Serial
-D DEBUG_ASYNC_MQTT_CLIENT
monitor_filters = esp8266_exception_decoder
monitor_speed = 115200
Debug output goes to Serial
.
FYI I've started testing on the ESP32 - will report back any issues here
my platformio.ini: Thank you.
I hope this helps (have deleted some chars as they were the bulk of the mqtt broker password!) A different wrong length with this build.
Subscribing to: homeassistant/localtime... SUBSCRIBE
new back #8
snd #8: 30/30
success
PUBLISH
new back #3
ack 30 (30)
data rcv (5)
rcv SUBACK
SUBACK
SUB released
rmv #8
snd #3: 35/35
p #3 rel
ack 35 (35)
data rcv (59)
rcv PUBLISH
Received MQTT topic: homeassistant/localtime
payload* length: 61
str::payload length: 61
2021-02-24T21:56:09.727588+00:00alt����reqũ�� <!!!part of mqtt password!!!>
setLocaltime(): 2021-02-24T21:56:09.727588+00:00alt����reqũ�� <!!!part of mqtt password!!!> len: 61
pm open,type:2 0
PING
new back #12
snd #12: 2/2
I updated the develop
code again to simplify and iron out some bugs.
@OldGreyCells You receive data with a total length 59 (that is: including MQTT headers), how do you manage to get a payload of 61 out of it. Could you share your callback code? Mind that the payload is binary and may not include a 0-terminator for c-strings. So you cannot do strlen(payload)
.
Mind that the payload is binary and may not include a 0-terminator for c-strings. So you cannot do
strlen(payload)
.
Exactly my thoughts.
I updated the
develop
code again to simplify and iron out some bugs.@OldGreyCells You receive data with a total length 59 (that is: including MQTT headers), how do you manage to get a payload of 61 out of it. Could you share your callback code? Mind that the payload is binary and may not include a 0-terminator for c-strings. So you cannot do
strlen(payload)
.
Aha! That may well be the cause - I am doing a strlen(payload). My apologies if I've wasted your time - I've been a little* sub-par.
*understatement - this is my 'distraction'.
I'm been doing some work on the https://github.com/marvinroger/async-mqtt-client/tree/develop branch lately.
It should solve lots of the current issues. The main improvement is that outgoing messages are queued and the memory is only freed after the TCP ack allowes to do so.
:exclamation: : although the API remained unchanged, the exlamation mark denotes a change in behaviour.
I'm looking forward to feedback. I do like positive feedback :smile:, but I'm more interested in the negative ones. I still have to go through the list of mandatory statements of the MQTT specification. TLS is also untested but there has been no change in that part of the code.