Open rehanson opened 8 months ago
Hey there @cottsay, mind taking a look at this issue as it has been labeled with an integration (rainforest_raven
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
rainforest_raven documentation rainforest_raven source (message by IssueLinks)
Thanks for the report, @rehanson. You mentioned that you're using a VM. How is the EMU-2 forwarded from the host to the VM? As a USB device or a serial device?
Ah, I can see from the diagnostics that it appears in the VM as a USB device.
There were some handle leaks in 2024.2.0-2024.3.0 that I believe were causing a multitude of problems, including the gaps in the history. From what I can tell, the leaked handles would accumulate to the point where Home Assistant was no longer able to maintain a connection at all, resulting in bugs like #111886.
I also observed some problems in my VM after letting 2024.3.0 run to the point where the handle leaks saturated and needed to disconnect and reconnect the device, though I haven't been able to reproduce the problem now that the handle leaks are fixed. Please keep me updated with any changes in behavior.
@cottsay I just updated to Home Assistant Core 2024.3.3 (from 2024.3.1). This update included the aioraven bump to 0.5.2. I had no issues getting stuck during the boot process after the update. And I haven't had any gaps in the history like I used to get. I've been running for ~3hrs with no issues yet. Before the update, I'd have about one "Unavailable" gap per minute.
Thank you for reporting back. Glad to hear things are looking better.
We can keep this open for another update cycle or so, but if nobody is seeing it any longer then I'm willing to bet that my initial instinct was right and that qemu was misbehaving after the guest leaked all those unclosed handles. Seems a full VM shutdown or device disconnect are the only ways to get back to a good state when that happens.
It's been a day now. Any errors in the logs? Any chance #113539 stopped occurring too?
In almost two days of running the latest update, I haven't had any gaps in my history! https://github.com/home-assistant/core/issues/113539 is still occurring however. I will add more comments over on that issue.
EDIT: There were actually four gaps. See a couple of comments down.
Oh, and I haven't really rebooted enough times to see if the boot process still gets stuck, but based on your explanation, I doubt I will encounter it again.
Now that I have looked closer at the logs, it did timeout four times (much less of an issue than before!) in the last two days. And I can see 30 second gaps in the history graph at those four points in time.
Logger: homeassistant.components.rainforest_raven.coordinator
Source: helpers/update_coordinator.py:324
integration: Rainforest RAVEn (documentation, issues)
First occurred: March 26, 2024 at 6:48:18 AM (4 occurrences)
Last logged: 2:07:33 AM
Timeout fetching rainforest_raven data
Here are those four occurrences:
2024-03-26 06:48:18.388 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-26 09:59:53.361 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-27 02:05:28.408 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-27 02:07:33.359 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
Those are probably legitimate timeouts now. There could be dropped or corrupted messages over the serial link or the EMU-2's wireless link to the meter. There's a lot that can go wrong here.
It's too bad that the data reflects the timeout with a gap like that. We don't want to present stale data, so the only thing I can think to do is to retry the query once or twice. Would it be worth it to add that logic? How much of a problem are the gaps in the data?
With how rare the timeouts are now, it's not really a problem for me.
I just ran into the stuck bootup issue again today during a reboot of my Proxmox server (Home Assistant runs as a VM on the server). I am running Core 2024.3.3
Logger: homeassistant.bootstrap
Source: bootstrap.py:601
First occurred: 1:59:34 PM (3 occurrences)
Last logged: 2:01:34 PM
Waiting on integrations to complete setup: rainforest_raven
I was only able to complete HA startup by unplugging the Rainforest EMU-2 meter.
I just ran into the stuck bootup issue again today...
Rats, thanks for the feedback. I assume this was an update from 2024.3.3 to 2024.4.0?
Nope, I hadn't yet updated to 2024.4.0 when it happened. (I am currently updating now)
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
My system still hangs up occasionally when trying to boot. It shows a message about waiting for the RAVEn integration to start.
It most recently happened on 2024.6.4
FYI @rehanson, 2024.7.3 included changes to the way rainforest_raven
handles initialization errors, and the underlying serial library was updated with changes to device closure that are also in this space.
Since I still can't reproduce these problems locally I can't confirm that they will solve or mitigate this problem, but either of those changes could have changed the behavior. Let me know if you see anything different.
Thanks @cottsay, I'll report back if I see the issue again. (And I'll try to report back in a month or so if I don't see it again)
I saw this issue again today after upgrading to hassos 13.0, I had reverted the previous update (I think to 12.3 from 12.4) due to this integration not starting. The integration ran fine with the older hassos. After I saw the hang on 13.0 I tried unplugging the rainforest device and re-plugging and that seemed to work, startup completed quickly and I was getting power data. I did a second full reboot to test and had the same symptoms. Restarting just HA after this (no reboot) and everything came up normally so this seems to only occur on full OS reboots for me.
hassos: "13.0" homeassistant: 2024.8.1 machine: raspberrypi4-64
I'm still seeing the same behavior after upgrading hassos 13.1, unplugging the rainforest device after a full OS reboot restored data, no issues with subquent HA restarts
I just ran into this again after upgrading to HA OS 13.1. The initial reboot after upgrading stalled without me being able to get to the Web UI, so I rebooted again, was able to get to the Web UI, but with the dreaded message again:
Starting Rainforest RAVEn, not everything will be available until it is finished.
It finished booting when I unplugged the EMU-2.
Core: 2024.8.2
I have also hit this, on Home Assistant 2024.10.1, running in docker, with the EMU-2 exposed into the docker container as /dev/ttyACM0. The EMU-2 is actually connected to another raspberry pi, forwarding it over the network via usbip. I went to test the power loss recovery on the remote device, and when it came back online, understandably, RAVEn wasn't connected, so I restarted the HASS container and hit this issue on startup.
just upgraded to hassos 13.2 and same issue/workaround, hung at the Rainforest integration trying to start (waited 10 minutes before unplugging and re-plugging USB, then data started flowing again)
I started running into this as well. I had to remove the device from the VM, add it again, and restart the RAVEn integration.
The problem
Home Assistant got stuck in the startup process while loading the Rainforest RAVEn integration. While it was stuck, I wasn't able to access the Overview dashboard because Home Assistant said it was still booting up (I could still navigate to menus and custom dashboards).
Sequence of events:
What version of Home Assistant Core has the issue?
core-2024.3.1
What was the last working version of Home Assistant Core?
core-2024.3.0
What type of installation are you running?
Home Assistant OS
Integration causing the issue
rainforest_raven
Link to integration documentation on our website
https://www.home-assistant.io/integrations/rainforest_raven/
Diagnostics information
config_entry-rainforest_raven-2a01a9ec98f5f6849b766186260f7c24(1).json
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
Even while the Rainforest RAVEn integration is running normally, there are frequent occurrences of "Unavailable" states. This causes the history graph to have many gaps, and dashboards showing the EMU-2 status will often show "Unavailable" instead of the last known value.