Open marcone opened 8 months ago
How often does this happen? Can you plug in and capture the serial logs?
Also, have you disabled the Home Assistant "api:" on your firmware, since you aren't using Home Assistant? The device will automatically reboot if it cannot connect to Home Assistant after 15 minutes by default. https://esphome.io/components/api.html
To disable the api, you will need to compile your own firmware with ESPHome. You'll need to install ESPHome itself somewhere and then create a device .yaml for the ratgdo and compile and flash that new firmware.
Here's what the device .yaml could look like:
substitutions:
  name: ratgdo
  friendly_name: Garage
packages:
  ratgdo.esphome: github://ratgdo/esphome-ratgdo/v25iboard.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
# custom modifications using https://esphome.io/guides/configuration-types.html#remove
api: !remove
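The `!secret` entries in the wifi: section are looked up in a secrets.yaml file that sits next to the device .yaml. A minimal sketch of that file, with placeholder values you would replace with your own network details:

```yaml
# secrets.yaml (same directory as the device .yaml)
# Both values are placeholders, not real credentials.
wifi_ssid: "your-network-name"
wifi_password: "your-network-password"
```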
I did not edit any yaml files or build my own firmware. I just flashed it from https://ratgdo.github.io/esphome-ratgdo/ (choosing "security + 2.0" and "ratgdo v2.5x").
I do see messages about it rebooting in the log on the web UI, however the nonresponsive periods are much more frequent than that, e.g. just in the last half hour it was unresponsive from 8:29-8:31, 8:33-8:40, 8:43-8:44, 8:46-8:48, 8:50-8:51 and 8:54-8:55.
I'll see about getting serial logs. Might have to go buy a really long USB cable first.
This is probably because the ESPHome firmware is rebooting because it's not connected to HA.
Look at reboot_timeout on https://esphome.io/components/api.html
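If you want to keep the API component but stop the periodic reboot, the timeout can be switched off on its own. A minimal sketch, assuming a self-built firmware with a device .yaml like the one earlier in this thread: per the linked ESPHome page, reboot_timeout defaults to 15min and setting it to 0s disables it.

```yaml
# Goes in the device .yaml in place of "api: !remove".
# Keeps the native API but never reboots for lack of a client.
api:
  reboot_timeout: 0s
```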
But doesn't that reboot only happen every 15 minutes? The nonresponsive periods happen way more frequently than that, and their duration varies a lot too.
I attached the ratgdo to a Raspberry Pi so I could capture serial logs while the ratgdo was attached to the opener. I've attached two logs: "ping.log" is the log of a script that pings the ratgdo every second. When it receives a response it logs "alive", and when it doesn't receive a ping response it logs "unreachable". "serial.log" is the serial log, with each line prefixed by the timestamp of the time it was read, so it can be correlated with the ping log.
Some things that stood out to me:
Sat 23 Mar 19:58:54 PDT 2024: ^[[1;31m[E][json:041]: Could not allocate memory for JSON document! Requested 128 bytes, largest free heap block: 128 bytes^[[0m^M
Sat 23 Mar 19:58:54 PDT 2024:
Sat 23 Mar 19:58:54 PDT 2024: --------------- CUT HERE FOR EXCEPTION DECODER ---------------
Sat 23 Mar 19:58:54 PDT 2024: ^M
Sat 23 Mar 19:58:54 PDT 2024: Exception (29):^M
Sat 23 Mar 19:58:54 PDT 2024: epc1=0x4000df64 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000^M
Sat 23 Mar 19:58:54 PDT 2024: ^M
Sat 23 Mar 19:58:54 PDT 2024: >>>stack>>>^M
Sat 23 Mar 19:58:54 PDT 2024: ^M
Sat 23 Mar 19:58:54 PDT 2024: ctx: sys^M
Sat 23 Mar 19:58:54 PDT 2024: sp: 3fffec10 end: 3fffffb0 offset: 0190^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffeda0: 00000016 000000d4 00000020 40101530 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedb0: 00000005 00000000 00000002 4010179c ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedc0: 4025d6e7 3ffef3f0 00000002 4025d67c ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedd0: 00000002 4025d623 00000002 4025c778 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffede0: 4025c7a1 3fffee90 3ffef3f0 00000016 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffedf0: 4025a204 3fffee90 3ffef298 3ffeec60 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee00: 3ffeb820 3fffee90 3fffee90 40101506 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee10: 61726147 10006567 40103c1e 00000100 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee20: 3ffeb18c 7fffffff 00002200 00000001 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee30: 4025655d 00000080 3fffaff4 40239aea ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee40: ffffffe1 3ffeedac 3ffeb830 3ffef3f0 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee50: 3ffee390 00000041 00000000 4025aeff ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee60: 00000000 3fff493c ffffffe1 00000000 ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee70: 00000000 3ffef3f0 00000014 0000000f ^M
Sat 23 Mar 19:58:54 PDT 2024: 3fffee80: 40000f58 00000000 3281b0bb 00000000 ^M
...
which correspond to times that the ratgdo becomes unresponsive to pings. The duration of the unresponsiveness after a "Could not allocate memory for JSON document" exception varies. They're often pretty short (like the one above, where the unresponsiveness lasted about 10 seconds), but sometimes quite long (like the one starting at "Sat 23 Mar 20:20:40", which lasted over 5 minutes)
I have this issue too... Curious if you've found any resolution. For now I've dialed my reboot_timeout values way down to try to force the ratgdo to restart itself when it becomes unresponsive.
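For anyone wanting to try the same thing, this is roughly what lowering the timeouts could look like in a self-built device .yaml. Both the api: and wifi: components have a reboot_timeout option that defaults to 15min; the 2min values below are only illustrative and are not the values used in this comment.

```yaml
# Illustrative values only; pick whatever suits your setup.
api:
  reboot_timeout: 2min   # reboot if no API client connects for 2 minutes
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  reboot_timeout: 2min   # reboot if Wi-Fi stays disconnected for 2 minutes
```

Note that the api: timeout only helps if something (such as Home Assistant) normally stays connected to the API. In a standalone setup it would just reboot the device every 2 minutes.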
Haven't found a solution/workaround. Since it happens so frequently, starting shortly after boot, rebooting often wouldn't really help me much either, since then I'd just be waiting for it to finish its reboot and reconnect to the network.
On a whim I tried updating the firmware again (from 2024.4.2 to 2024.5.0) and it is so much worse now. The web interface isn't even usable anymore: every time I reload the page it takes over a minute to load just a partial mostly-empty page, another minute or more for the actual information to load, and the ratgdo is unresponsive to pings for most of this time.
This sounds similar to an issue I'm seeing with a couple of devices running the homekit firmware. Curious whether it becomes responsive again if you disconnect the device from the GDO. In those cases, just disconnecting from the GDO made it suddenly become fully responsive again.
Yeah, I've been having this issue too. It constantly disconnects, whereas it used to be really stable. It's not rebooting. In the logs I see a disconnect and then a reconnect a short time later.
I've also been troubleshooting this issue, and I'm glad to find this thread as I've been pulling my hair out. Has anyone made any progress resolving it?
I just set up a ratgdo today and was seeing the json error message followed by a reboot. It was crashing so often it warned it was going into safe mode. I was testing it with combinations of remotes and Home Assistant commands and it would crash after almost every open/close cycle.
Could not allocate memory for JSON document! Requested 504 bytes, largest free heap block: 504 bytes
Removing web_server: from config helped mine to stop crashing.
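With the package-based device .yaml shown earlier in this thread, that removal could be sketched as below. This assumes the ratgdo package is what defines web_server: in the first place; if your own .yaml declares it directly, deleting that section does the same thing.

```yaml
# Goes in the device .yaml, alongside the packages: entry.
# Drops the web server that the package would otherwise include.
web_server: !remove
```

Keep in mind that the web UI and the REST API are both served by the web_server component, so this is not an option for setups that poll the REST API.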
I just flashed version 2024.10.0, and so far it seems to be behaving as it should. I haven't seen any of those "Could not allocate memory for JSON document" failures yet, and instead it just reboots voluntarily every 15 minutes as described in rlowen's comment above. Would be nice if that was an option that can be turned on/off in the web interface. Alternatively, if someone could point me at a guide that explains how to build my own image, that'd be awesome.
I'm using the ratgdo 2.53i standalone, not integrated with any home automation setup. I flashed it with the ESPHome firmware and have a script that polls the REST API every few seconds to get the current state and automatically close the door when it's been accidentally left open.
What I'm noticing is that the ratgdo frequently stops responding, sometimes only briefly, sometimes for minutes at a time.
Pinging the device shows the same behavior: it'll regularly stop responding to pings for up to a few minutes, then resume, and these interruptions coincide with the REST API becoming nonresponsive. I can ping the GDO itself as well as other devices on the same Wi-Fi access point (I have a dedicated AP in the garage) without issue. (As I was typing the above, it stopped responding again and stayed unresponsive for 6 minutes.)