khoih-prog / AsyncMQTT_Generic

Arduino library providing an asynchronous MQTT client for ESP8266, ESP32, Portenta_H7, STM32 and RP2040W. It has been ported to support ESP32, WT32_ETH01 (ESP32 + LAN8720), ESP8266, Portenta_H7 (Ethernet or WiFi), STM32 (LAN8742A or LAN8720 Ethernet), Teensy 4.1 using QNEthernet, and RASPBERRY_PI_PICO_W with CYW43439 WiFi. TLS/SSL is currently supported on ESP32 only.
MIT License

Combining AsyncMQTT_Generic and Portenta_H7_AsyncWebServer fails #30

javos65 closed this issue 1 year ago

javos65 commented 1 year ago

Hi, back after some tests.

I combined two examples, one-to-one: Portenta_H7_AsyncWebServer plus AsyncMQTT_Generic. Same issue as with the PubSubClient library: Mbed OS crashes within 1-2 minutes after webserver HTML requests. Problem: the two async libraries cannot coexist.

I can't post the issue in the Portenta_H7_AsyncWebServer repo, as it was archived after our last mail exchange.

Jay

  - Arduino IDE 1.8.18 and Arduino IDE 2.0
  - Portenta H7 rev2
  - lib Portenta_H7_AsyncWebServer 1.4.2
  - lib Portenta_H7_AsyncTCP 1.4.0
  - lib AsyncMQTT_Generic 1.8.0

khoih-prog commented 1 year ago

Hi @javos65

Good that you've done some tests.

I'm afraid there is some issue with the Portenta_H7 core, mbed, the libraries, or a combination of them.

Try the new examples for ESP32 and RP2040W:

  1. AsyncWebServer_MQTT_RP2040W for RP2040W
  2. AsyncWebServer_MQTT for ESP32

As I don't have a working Portenta_H7 anymore, I don't think I can help much here.

After testing these examples, if they work, you can post the issue on the Arduino mbed core to ask for help. The issue might lie very deep inside the Portenta_H7 core or libraries, e.g. in multi-core processing and management.

ESP32 and RP2040W are also multi-core MCUs, but they still work OK.

Good Luck,

javos65 commented 1 year ago

Thank you. I'll look into more details, do more testing, and post it at Mbed support. My impression is that it's indeed an mbed OS issue, maybe related to the Murata WiFi module drivers' support for multiple clients. Closing this case.

khoih-prog commented 1 year ago

Also try previous core versions (v2.5.4 or earlier) to see whether recent cores break something.

javos65 commented 1 year ago

Tested it on various mbed core versions: 3.5.4, 3.5.1, 3.3.0, 2.8.0 and 2.5.2. All fail after web requests. See: https://github.com/javos65/AsyncWebServer_plus_MQTT

khoih-prog commented 1 year ago

Please also test the Async_AdvancedWebServer_SendChunked example on its own.

If it doesn't work with previous versions, it could be a severe issue with the core mods, etc., since it was tested extensively when Portenta_H7_AsyncWebServer v1.4.2 was created.

javos65 commented 1 year ago

Tested the SendChunked version: it fails as well. Tested a lean version with all web pages coded inline: it fails as well, but takes longer.

(I reinstalled the toolchain and updated the firmware prior to this testing, just to be sure.)

khoih-prog commented 1 year ago

Hi @salasidis

@javos65 reported: "Tested the SendChunked version: it fails as well. Tested a lean version with all web pages coded inline: it fails as well, but takes longer. (I reinstalled the toolchain and updated the firmware prior to this testing, just to be sure.)"

If you have spare time, could you please check why code that we tested and found OK before, such as Async_AdvancedWebServer_SendChunked, now suddenly crashes? Very weird, as we tested extensively and it was OK then. Is it related to the recent core mods?

Could the recent core mods have something to do with the issue you posted recently, "Async Web Server - becomes unresponsive after 1-4 days of use"?

I don't have a working Portenta_H7 now, so I can't tell what's wrong. Can you help shed some light?

salasidis commented 1 year ago

I have not tried MQTT yet, however; it is on my list of things to add to the project.

My crashing has been going on for a very long time, even before these mods, and I have been unable to figure it out, so I don't think it is related to the mods (I made the mods to see if they could solve some of the intermittent crashing issues).

I then thought it was due to insufficient stack size on the LWIP thread, but even after increasing it (by recompiling libmbed.a), it still crashed. It is possible that some lower-level issue in the main Ethernet library is causing these failures with Portenta / mbed (I ran my unit with no Ethernet, simply collecting sensor data and logging to SD, and it ran with no failures).

I know that the LWIP thread uses 5-8 KB of stack space in my case, but core 3.5.4 only allocates 1200 bytes. I have recompiled libmbed.a to give it more space; maybe this becomes more important when MQTT is used?

As far as immediate crashes go, I am still sending a 100+ KB web page with no issue. I am running 3.5.4 and have updated all the libraries. Do you have an example that fails, so I can reproduce it? I can try running it on one of the Portentas. Does it happen only if MQTT is installed?

I have a J-Link and could single-step through the code; if it crashes within 1-2 minutes, that may make it easier to debug (my crash takes 3-7 days to occur). I would also compile it with the larger LWIP stack.

khoih-prog commented 1 year ago

Hi @salasidis

Sorry for not being clear. The crashing code is from the Portenta_H7_AsyncWebServer library, without MQTT, as @javos65 specified.

@javos65

Can you copy the issue to Portenta_H7_AsyncWebServer, which was recently un-archived?

salasidis commented 1 year ago

I am sending 1 packet every 3 seconds, loading a 100 KB web page 1-2 times/day, and doing an NTP time request every hour. I also do a Modbus poll every 5 seconds or so.

With all that, the crash happens every 3-7 days. It happens at random times, or sometimes when the computer displaying the web page wakes from a locked screen that was switched off. I never saw any stack overruns, and there are no bursts of retransmissions (verified with Wireshark).

I can run the example when it's available and let you know (I have a third, new and unused Portenta that I will use for testing this).
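To replay the long-running load pattern salasidis describes (one packet every 3 seconds, a Modbus poll every 5 seconds, an NTP request every hour, a web page load 1-2 times per day) on a test board, a non-blocking interval scheduler is one option. The sketch below is a minimal, hypothetical illustration and is not code from either library: `PeriodicTask`, `isDue`, `pollTasks`, `makeLoadProfile` and `simulateDay` are invented names, and the 12-hour web-page interval is an assumption standing in for "1-2 times/day". It uses the wraparound-safe unsigned-subtraction idiom commonly used with Arduino's `millis()`, and includes an off-board simulation of one day so the schedule can be checked without hardware.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical periodic task used to replay a load profile (not part of any library here).
struct PeriodicTask {
  const char* name;      // label for logging
  uint32_t interval_ms;  // how often the task should fire
  uint32_t last_run_ms;  // timestamp of the last run
  unsigned long runs;    // how many times it has fired so far
};

// Wraparound-safe "has the interval elapsed?" check: unsigned subtraction
// still gives the right elapsed time after a 32-bit millisecond counter wraps.
bool isDue(const PeriodicTask& task, uint32_t now_ms) {
  return static_cast<uint32_t>(now_ms - task.last_run_ms) >= task.interval_ms;
}

// Meant to be called repeatedly from loop(); fires every task whose interval has elapsed.
void pollTasks(std::vector<PeriodicTask>& tasks, uint32_t now_ms) {
  for (PeriodicTask& task : tasks) {
    if (isDue(task, now_ms)) {
      task.last_run_ms = now_ms;
      ++task.runs;
      // On the board, the real work (publish, Modbus poll, NTP request, page load) would go here.
    }
  }
}

// Intervals from the workload described in the thread; the 12 h web-page
// interval is an assumption for "1-2 times/day".
std::vector<PeriodicTask> makeLoadProfile() {
  return {
      {"packet", 3000u, 0u, 0ul},
      {"modbus", 5000u, 0u, 0ul},
      {"ntp", 3600000u, 0u, 0ul},
      {"webpage", 43200000u, 0u, 0ul},
  };
}

// Off-board check: replay one simulated day (86,400,000 ms) in 100 ms ticks.
std::vector<PeriodicTask> simulateDay() {
  std::vector<PeriodicTask> tasks = makeLoadProfile();
  for (uint32_t now_ms = 0; now_ms <= 86400000u; now_ms += 100u) {
    pollTasks(tasks, now_ms);
  }
  return tasks;
}
```

On the board itself, the simulation loop would be replaced by calling `pollTasks(tasks, millis())` from `loop()`, keeping `loop()` non-blocking so the async web server and MQTT callbacks are not starved. Over one simulated day the schedule fires the packet task 28,800 times, the Modbus poll 17,280 times, the NTP request 24 times and the web-page load twice.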