xyzroe / XZG

Unifies the best innovations from previous Zigbee gateway projects into a single, comprehensive solution
https://xzg.xyzroe.cc/
GNU General Public License v3.0
127 stars 27 forks source link

ESP32 firmware is not updated via the web interface #42

Closed Sergey8100 closed 5 months ago

Sergey8100 commented 5 months ago

Доброго времени суток. Постоянно получаю уведомление о доступности новой прошивки, но при попытки обновления постоянно висит в статусе "Скачивание последнего релиза GitHub...". Пробовал перезагружать роутер и координатор, ничего не помогает.

Sergey8100 commented 5 months ago

Good day. I constantly receive a notification about the availability of new firmware, but when I try to update, it always hangs in the status “Downloading the latest GitHub release...”. I tried rebooting the router and coordinator, nothing helps.

xyzroe commented 5 months ago

Thanks for bug report. I already know about that bug. Working to solve it. You can try to reboot gateway and try again just after reboot. Or use local file update option.

koeberlue commented 5 months ago

As a workaround setting the timezone to Europe/Kiev and then restarting when prompted solved the issue for me.

(Maybe it was just the restart that helped)

Edit: This probably was a coincidence. See https://github.com/xyzroe/XZG/issues/42#issuecomment-2137229946 for more details

Sergey8100 commented 5 months ago

Rebooting didn't help me. I updated via file.

koeberlue commented 5 months ago

When I looked at the Status page for some time I noticed that the entry for the DNS server is changing constantly. Strangely this behavior somehow seems to be connected to the "Local time".

Every time the local time shows a full minute (e.g. Wed May 29 14:51:00 2024) the DNS server entry will show 192. 168.188.3 (which is the one my router promotes via DHCP).

Then exactly after 19 seconds (e.g. Wed May 29 14:51:19 2024) the DNS server entry will show 253.0.0.0 for the rest of the minute.

So for the firmware update to succeed I have to wait until a full minute starts, then I have 19 seconds to start the update.

Maybe someone more knowledgeable about the firmware can draw the correct conclusions from this.

xyzroe commented 5 months ago

Your router renews DHCP lease every minute. So your device drops DNS to 253.0.0.0 (bug of current version of LWIP) every minute. XZG runs mechanism to restore original DNS servers received from DHCP also every minute. You need to increase your DHCP lease time for XZG in your router.

koeberlue commented 5 months ago

Your router renews DHCP lease every minute

I am not so sure about that. Note that I am not that proficient in the DHCP protocol, but I checked my router and it should renew the leases every 9 days. I checked some Wireshark dumps on the router and on my local machine and I could not find any DHCP packages at all.

Are there any other network packages known to cause this behavior? Is there some way for me to capture network packages on the ESP?

xyzroe commented 5 months ago

Hm.. Interesting. I don't know any way to capture network packages on the ESP. I have to surf the net for more information about this problem.

koeberlue commented 5 months ago

I did some more debugging and I share your suspicion that something on my network is causing this behavior.

What caught my eye is the fact, that the DNS server is always reset to 253.0.0.0 at the beginning of the 19th second of every minute. I briefly checked the code in this repository but I could not find anything that should be doing anything special at that second.

I will dump some things I discovered, maybe this might help painting the bigger picture:

Looking through the wireshark dump from the WiFi-Extender the device was connected to, I found these packages, which were sent at the 19th seconds on both captures I made:

No. Time Source Destination Protocol Length Info
523 16:55:19,242220 fe80::c52c:4792:8724:1e13 ff02::1 ICMPv6 104 Neighbor Advertisement redacted:2df4:f9c4:ba15:279f (ovr) is at dc:a6:32:de:e5:fb
524 16:55:19,242848 fe80::dea6:32ff:fe60:a3ec ff02::1 ICMPv6 104 Neighbor Advertisement redacted:dea6:32ff:fe60:a3ec (ovr) is at dc:a6:32:60:a3:ec
525 16:55:19,251772 :: ff02::1:ff00:0 ICMPv6 96 Neighbor Solicitation for redacted:200:ff:fe00:0
526 16:55:19,252019 :: ff02::1:ff00:0 ICMPv6 96 Neighbor Solicitation for redacted:ff:fe00:0
527 16:55:19,253679 fe80::5054:ff:fe6d:72bd ff02::16 ICMPv6 108 Multicast Listener Report Message v2
528 16:55:19,256590 fe80::aaa1:59ff:fee7:43ef ff02::16 ICMPv6 108 Multicast Listener Report Message v2

So there are some IPCMPv6 broadcasts that might cause trouble.

I also realized this project is using an external network library so it dug a little bit into it and found this issue, that sounds somewhat similar: https://github.com/espressif/arduino-esp32/issues/7602 It was resolved just a few weeks ago, so maybe an update of this library might help?

@xyzroe Would it be feasible to replace some of the LOGD statements with printLogMsg? Right now the debug console is mostly flooded with binary data (from the Zigbee chip?). So I guess some network messages might not interfere with the log while they would provide us with some useful info about what is going on.

koeberlue commented 5 months ago

@xyzroe Would it be feasible to replace some of the LOGD statements with printLogMsg? Right now the debug console is mostly flooded with binary data (from the Zigbee chip?). So I guess some network messages might not interfere with the log while they would provide us with some useful info about what is going on.

Ah, you just did this an hour ago. Will check the logs. Thank you!

koeberlue commented 5 months ago

Here are the logs. I restarted the ESP to capture the start up phase.

[1218] | [ZB] Chip: CC2652P7
[1218] | [ZB] IEEE: _redacted_
[1229] | [ZB] Flash size: 704 KB
[3587] | [Time] Wed May 29 20:01:57 2024
[3594] | [DNS] Saved ETH - 192.168.188.3
[3640] | [Time] Wed May 29 20:01:57 2024
[3647] | [DNS] Saved ETH - 192.168.188.3
[3958] | [ZB] FW: 20240316
[11714] | Connected client 1 from 192.168.188.61
[...]

// Then this message repeats (I think i missed the first invocations)

[126046] | [Time] Wed May 29 20:04:00 2024
[126046] | [DNS] Restored ETH - 192.168.188.3
[126053] | [DNS] Saved ETH - 192.168.188.3
[...]
[186060] | [Time] Wed May 29 20:05:00 2024
[186061] | [DNS] Restored ETH - 192.168.188.3
[186067] | [DNS] Saved ETH - 192.168.188.3

I removed the binary dumps. The rest of the logs is untouched.

xyzroe commented 5 months ago

56

xyzroe commented 5 months ago

Could you check on the latest release?

koeberlue commented 5 months ago

Same behavior on the latest release

xyzroe commented 5 months ago

Web UI notification checks update using your browser. But ESP fails to download the update. You should check DNS on the root page before trying to update.

koeberlue commented 5 months ago

Web UI notification checks update using your browser. But ESP fails to download the update. You should check DNS on the root page before trying to update.

Yeah, that is absolutely what's happening. And this is really not such a big issue, because nothing bad happens if the download fails.

But it does become a bigger issue when installing the Zigbee OTA update, because it erases the chip before downloading. So when the download fails, the chip will be erased without a new firmware being installed. Also this process is harder to time correctly, because the download will not start immediately after pressing the install button.

So I am still wondering why the DNS config is reset every minute. I will try debugging the issue locally, but this might take some weeks to block some time off.

koeberlue commented 5 months ago

I can confirm both ESP32 and Zigbee update now work for me in 20240607.1 without any problems. Thank you very much!

Also the Status page now always shows the correct DNS server, which seems a little bit strange, since nothing in the changelog should have fixed this directly (?).

On a side note: Connecting to MQTT does no longer work using the hostname of my mosquitto instance. But it works perfectly using the IP address of the server directly. I will check if it a local issue in my home network and will create a new issue if I can pin down the reason for this.

xyzroe commented 5 months ago

I think that I know reason. The same problem with DNS. Look change log for 20240607. I fixed the DNS in memory, and use it everywhere. But before all DNS related action I run check DNS in memory with current one, and fixed if need.