martin-ger / esp_wifi_repeater

A full functional WiFi Repeater (correctly: a WiFi NAT Router)
MIT License

Crash / hanging #144

Open destroyedlolo opened 6 years ago

destroyedlolo commented 6 years ago

Hello,

I'm using an ESP-01 with esp_wifi_repeater in order to extend my WiFi to my garden. The architecture is quite simple:

1/ the head of my network is a BananaPI running Gentoo, which hosts the Mosquitto MQTT broker
2/ the home WiFi is provided by my Internet provider's box (a French Freebox v5)
3/ in the middle, I have this ESP-01, which also publishes some MQTT figures in order to monitor its health
4/ in my garden, I have an ESP8266-201 which connects to the ESP-01 in order to publish my chicken coop figures. It wakes up from deep sleep every 5 minutes for that and falls back to my ISP box if the ESP-01 can't be reached.

As I have many subscribers/publishers on the BananaPI's Mosquitto, I'm pretty sure the problem isn't there.

From time to time, the ESP-201 can't connect to the ESP-01 at all or, if it can, it can't establish its MQTT connection. The ESP-01 "logs" show some connection attempts followed by an immediate disconnect.

From time to time, the ESP-01 stops publishing its figures. It recovers by itself.

Very rarely, the ESP-201 can't reach the ESP-01 network and the ESP-01 doesn't publish anything anymore: the only solution is a power cycle.

I don't think it's a power issue, because I've put a BIG capacitor on the ESP-01 pins, and it is powered through a 5V-to-3V DC-DC converter, itself powered by my home automation network: the other probes don't complain about power stability.

Lastly, I've soldered the XPD_DCDC pin to the RST pin.

So:

1/ is there a way to have a watchdog reboot the ESP-01/esp_wifi_repeater when it hangs?
2/ is there a way to automatically reboot the ESP-01/esp_wifi_repeater if it encounters a network issue?
3/ is anyone else facing this kind of issue?

Thanks

Laurent

martin-ger commented 6 years ago

I don't know what is happening there. Do you have any serial output?

But I have now added two watchdog timers:

set ap_watchdog secs
set client_watchdog secs

The ap_watchdog observes the uplink AP interface, the client_watchdog the interface to the connected STAs.

If there is no packet received on these interfaces within the secs interval, the watchdog resets the repeater. Default is "none" and they can be set independently.

In your case you would probably set the client_watchdog to some value above the 5 minutes. When the client fails to connect, there is most probably no traffic on this interface, so a reset happens. The next connect request should work again.
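To illustrate why the client_watchdog value should sit above the 5-minute wake-up period, here is a minimal Python sketch of a packet-inactivity watchdog. It is only a model of the behaviour described above, not the firmware's code: the class `InactivityWatchdog` and the helper `first_reset` are hypothetical names, and the one-second tick is an assumption.

```python
from typing import Optional


class InactivityWatchdog:
    """Illustrative model of a packet-inactivity watchdog (hypothetical, not firmware code)."""

    def __init__(self, timeout_secs: Optional[int]) -> None:
        # None models the default "none" setting: the watchdog is disabled.
        self.timeout_secs = timeout_secs
        self.remaining = timeout_secs

    def packet_received(self) -> None:
        # Any packet on the watched interface rearms the timer.
        self.remaining = self.timeout_secs

    def tick(self) -> bool:
        """Advance one second; return True when a reset should be triggered."""
        if self.timeout_secs is None:
            return False
        self.remaining -= 1
        return self.remaining <= 0


def first_reset(timeout_secs, packet_period_secs, duration_secs):
    """Return the second at which the watchdog would first fire, or None."""
    wd = InactivityWatchdog(timeout_secs)
    for second in range(1, duration_secs + 1):
        # packet_period_secs == 0 models a client that never gets through.
        if packet_period_secs and second % packet_period_secs == 0:
            wd.packet_received()
        if wd.tick():
            return second
    return None


if __name__ == "__main__":
    print(first_reset(400, 300, 3600))  # healthy client, timeout above the period
    print(first_reset(200, 300, 3600))  # timeout below the period
    print(first_reset(400, 0, 3600))    # client never connects
```

In this model, a timeout shorter than the client's 300-second period fires even though the client is healthy, while a timeout above it only fires once the client stops getting through.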

destroyedlolo commented 6 years ago

Hello, sorry for this late reply, I was on vacation.

Unfortunately, neither the repeater nor my probe is close to a machine able to collect serial output :(

I'll try your new watchdog and will keep you updated.

destroyedlolo commented 6 years ago

Hi Martin,

My gateway has been running for 3 weeks now without issue (and without upgrading). I suspect my problem was caused by the ESP-01 antenna not being correctly aligned with the ESP-201 one. I think this was causing incomplete frame transmissions or the like, and then probably a resource leak on the gateway side.

But these are only guesses.

destroyedlolo commented 6 years ago

Well, I was a bit optimistic: the problem came back yesterday after 4 weeks of continuous running. So I'll update my ESP and try your watchdogs.

destroyedlolo commented 6 years ago

Hi Martin,

I'm still facing this issue, and neither the automatic reset discussed above nor a "manual" reset (sent via an MQTT command) is enough. The only way to recover is to power cycle the repeater (no need to do anything on the other devices).

When the issue arises, connections to the repeater become erratic and I also see disconnections on my Internet box WiFi. After restarting the repeater, everything returns to normal.

I think the problem is caused by a poor WiFi signal from one of my remote devices, as I can see in the MQTT flow a lot of connection attempts, sometimes without the matching disconnect:

20180725 16:06:11 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:07:01 ESPRouter_Domo/Uptime 122
20180725 16:07:01 ESPRouter_Domo/Vdd 3377
20180725 16:07:01 ESPRouter_Domo/NoStations 1
20180725 16:07:42 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:07:43 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:07:53 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:07:54 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:02 ESPRouter_Domo/Uptime 183
20180725 16:08:02 ESPRouter_Domo/Vdd 3377
20180725 16:08:02 ESPRouter_Domo/NoStations 1
20180725 16:08:04 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:08:04 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:14 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:08:14 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:24 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:08:24 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:34 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:08:34 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:44 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:08:44 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:54 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:08:54 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:09:03 ESPRouter_Domo/Uptime 244
20180725 16:09:03 ESPRouter_Domo/Vdd 3377
20180725 16:09:03 ESPRouter_Domo/NoStations 1
20180725 16:09:04 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:09:04 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:09:14 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:09:18 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:09:19 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:10:04 ESPRouter_Domo/Uptime 305
20180725 16:10:04 ESPRouter_Domo/Vdd 3378
20180725 16:10:04 ESPRouter_Domo/NoStations 0

Normally this device connects every 5 minutes.
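A flow like this can be summarised with a short script. The following Python sketch pairs join and leave events per MAC address, reports how long each association lasted, and counts joins that were never followed by a leave. It assumes the four-field layout seen here (date, time, topic, payload). The function name `session_lengths` is hypothetical, and `SAMPLE` is just an excerpt of the lines above.

```python
from datetime import datetime

# Excerpt of the MQTT flow, one message per line.
SAMPLE = """\
20180725 16:07:42 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:07:43 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:07:53 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
20180725 16:07:54 ESPRouter_Domo/join 5c:cf:7f:17:55:70
20180725 16:08:02 ESPRouter_Domo/Uptime 183
20180725 16:08:04 ESPRouter_Domo/leave 5c:cf:7f:17:55:70
"""


def session_lengths(log_text):
    """Return (durations in seconds, number of joins with no matching leave)."""
    open_joins = {}   # MAC address -> timestamp of the pending join
    durations = []
    unmatched = 0
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip anything that is not "date time topic payload"
        date, time, topic, payload = parts
        event = topic.rsplit("/", 1)[-1]
        stamp = datetime.strptime(f"{date} {time}", "%Y%m%d %H:%M:%S")
        if event == "join":
            if payload in open_joins:
                unmatched += 1  # previous join never saw its leave
            open_joins[payload] = stamp
        elif event == "leave" and payload in open_joins:
            started = open_joins.pop(payload)
            durations.append(int((stamp - started).total_seconds()))
    return durations, unmatched


if __name__ == "__main__":
    print(session_lengths(SAMPLE))
```

Associations that last only about ten seconds each, repeated back to back, are the pattern visible in the flow above.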

Bye

martin-ger commented 6 years ago

The difference in the state of the chip between a power cycle and a reset is not clear to me. Understanding that would probably help to track down the problem...

destroyedlolo commented 6 years ago

Well, I think I found the problem: the filtering capacitor was too small. I put in a 1000uF one and it seems to improve the situation a bit... even though I'm still encountering crashes, just less frequently.

Last time, the ESP seems to have restarted (the LED blinked) but it wasn't connected to MY network... and I don't know where it was connected. I did several restarts by disconnecting CH_PD, but with the same result. Power cycling solved the issue again.

Do you save anything into the "RTC" memory? Because I know the only way to clear it is a power cycle.

martin-ger commented 6 years ago

No, the esp_wifi_repeater doesn't use the RTC mem.