Open sysizlayan opened 6 years ago
I can confirm the same behaviour. We were noticing our esp32s go into a zombie state if we keep it running for a day or two. Found the issue to be pubsub client connect hangs whenever the wifi connection is lost. I was about to create a new issue but found it is already created :)
To be more specific, it hangs exactly on client connect method call _client->connect(this->domain, this->port);
or _client->connect(this->ip, this->port)
.
or
Thinking about it, may be not related to pubsubclient library. We may need to fix it at esp32 wifi client level.
I did some alteration to my code, to listen to WiFiEvents and handling disconnect there. This resolved the issue. Once the WiFi connection is back, pubsubclient resumes the connection and able to connect to server.
Sample WiFiEvent handling for reference.
#include <WiFi.h>
const char* ssid="ssid"
const char* password="password"
static volatile bool wifi_connected = false;
void WiFiEvent(WiFiEvent_t event)
{
switch (event)
{
case SYSTEM_EVENT_AP_START:
//can set ap hostname here
WiFi.softAPsetHostname(_node_name.c_str());
//enable ap ipv6 here
WiFi.softAPenableIpV6();
break;
case SYSTEM_EVENT_STA_START:
//set sta hostname here
WiFi.setHostname(_node_name.c_str());
break;
case SYSTEM_EVENT_STA_CONNECTED:
//enable sta ipv6 here
WiFi.enableIpV6();
break;
case SYSTEM_EVENT_AP_STA_GOT_IP6:
//both interfaces get the same event
Serial.print("STA IPv6: ");
Serial.println(WiFi.localIPv6());
Serial.print("AP IPv6: ");
Serial.println(WiFi.softAPIPv6());
break;
case SYSTEM_EVENT_STA_GOT_IP:
Serial.println("WiFi connected");
Serial.println("IP address: ");
Serial.println(WiFi.localIP());
wifi_connected = true;
break;
case SYSTEM_EVENT_STA_DISCONNECTED:
wifi_connected = false;
Serial.println("STA Disconnected");
delay(1000);
WiFi.begin(ssid, password);
break;
}
}
void setup()
{
WiFi.onEvent(WiFiEvent);
WiFi.begin(ssid, password);
}
void loop() {
if(wifi_connected) {
wifi_connected_loop();
} else {
wifi_not_connected_loop();
}
}
void wifi_connected_loop() {
//Do something here
}
void wifi_not_connected_loop() {
//Do something else here
}
This issue can be closed and may be the sample code can be updated with WiFiEvents rather than blocking while loop.
@tsathishkumar that looks really good - I wasn't aware of the WifiEvents stuff. If you wanted to open a pull-request with a suitable update to the example that would be excellent.
Sure, I'll do that.
My problem cannot be solved by event action because event is fired after connect function of the library is called. Only way to handle that way may be deleting the main loop thread and creating that again, it is no different than restarting the esp32. Do you think is it because of the client class written in esp32/arduino core?
Actually it can be solved with events. I had tested it. Because once the event thread successfully establishes the connection, other thread will continue working fine.
Can you test and update here please?
On Mon, Jun 18, 2018, 8:52 PM sysizlayan notifications@github.com wrote:
My problem cannot be solved by event action because event is fired after connect function of the library is called. Only way to handle that way may be deleting the main loop thread and creating that again, it is no different than restarting the esp32. Do you think is it because of the client class written in esp32/arduino core?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/knolleary/pubsubclient/issues/444#issuecomment-398092342, or mute the thread https://github.com/notifications/unsubscribe-auth/ABaM5FUlvhrI1ve8SA7tbMom_s7dumPiks5t98W6gaJpZM4UN7wG .
@tsathishkumar I think we can safely remove: delay(1000); WiFi.begin(ssid, password); from SYSTEM_EVENT_STA_DISCONNECTED event. WiFi.begin is required only if ssid & password needs to be changed.
@tanmaysathe But, does it automatically try to connect again?
Once wifi connection is made, the chip stores SSID & password in the flash. It automatically reconnects to last used access point without any need of calling WiFi.begin function thereafter. https://arduino-https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/readme.html#quick-start
I remember reading an issue where the auto reconnect was not working. Not sure if that was resolved. Found a reference in ESP32 site.
http://esp-idf.readthedocs.io/en/latest/api-guides/wifi.html
s6.2: In the scenario described above, the application event callback function relays
to the application task. We recommend that esp_wifi_connect() be called to reconnect the Wi-Fi, close all sockets and re-create them if necessary. Refer to .
So, the recommendation is to close all open sockets, reconnect WiFi and recreate the sockets.
Autoreconnect can be configured in esp32 arduino core but the support for persistent connection(Where ssid and password is saved to non-volatile memory) is still and issue. For autoconnect, one needs to first start the WiFi connection with ssid and password, those values are saved in the memory. If the connection is broken, it tries to reconnect again automatically until restart.
My issue is still going by the way @tsathishkumar. To be more clear, I first call the "connect" function of pubsubclient. Then the event is fired as STA_DISCONNECTED. During this process, the connection trial does not stop, only the event function is called. Even if I call the WiFi.begin again, when the event function returns, my code waits on the porint where I call connect. The socket or the client does not give any timeout error.
Even if I call the WiFi.begin again, when the event function returns, my code waits on the porint where I call connect. The socket or the client does not give any timeout error.
You mean, you got the SYSTEM_EVENT_STA_CONNECTED event and still the pubsub client was waiting on previous call? Because when I tested the pubsub client returned after the successful connection.
I mean exactly that. In normal operation, the connect function should return a timeout and keep going but the system stucks in there. Imagine a case where the Wifi does not exists anymore, the program will remain in there forever.
Because of that it is an issue, the client does not return any timeout.
In my case, when the WiFi connected again, the function returned. I will do another test sometime later and update here.
I understand and it may be possible but it shows only that there is a while loop somewhere. If the network is not available anymore, the code will remain there forever. I am currently detecting that using a watchdog timer and only recover way is restart.
What I want to understand, is the loop causing this in pubsubclient or in wifiClient?
It is in wifi client as far as I know. @knolleary please correct me if I'm wrong.
Im using this library in my CC3200 energia and my code is getting stuck at if (!client.connect(flash_buffer, "xxx", "xxx")) function, i have done all changes given above but still no effect my device is getting hang at that particular point if disconnected
if (!client.connected()) { //Serial.println("Disconnected. Reconnecting...."); led_indication_data = 0x01; master_led_config();
if (!client.connect(flash_buffer, "xxx", "xxx"))
{
led_indication_data = 0x01;
master_led_config();
//Serial.println("Connection failed");
}
else
{
led_indication_data = 0x02;
master_led_config();
//Serial.println("Connection success");
if (client.subscribe(inTopic))
{
//Serial.println("Subscription successfull");
}
}
}
There is one more case here.
Any workaround for this ?
Can this be the problem?
PubSub uses the connect methods without timeout from the WiFi library. Internally the client uses -1
if (domain != NULL) { result = _client->connect(this->domain, this->port); } else { result = _client->connect(this->ip, this-> }
maybe adding the timeout here resolves the problem
if (domain != NULL) { result = _client->connect(this->domain, this->port, 2000); } else { result = _client->connect(this->ip, this->port, 2000); }
or replace the 2000 with an constant in the header
As I just saw this only possible for the ESP32, and not for the ESP8266.
Hi, I was testing a case where network is lost on my esp32 project. While I was testing I simply turn off the router then open it after some time.
The problem occurs when a mqtt client tries to connect. I turned off the network just before the mqtt client tries to connect. The blocked segment is given below:
The program remains stuck until the watchdog is triggered. I cannot solve the problem.
Please check if there is a timeout or not for that. Thanks in advance :)