Open rehanson opened 3 months ago
Hey there @cottsay, mind taking a look at this issue as it has been labeled with an integration (rainforest_raven
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
rainforest_raven documentation rainforest_raven source (message by IssueLinks)
Thanks for the report, @rehanson. You mentioned that you're using a VM. How is the EMU-2 forwarded from the host to the VM? As a USB device or a serial device?
Ah, I can see from the diagnostics that it appears in the VM as a USB device.
There were some handle leaks in 2024.2.0-2024.3.0 that I believe were causing a multitude of problems, including the gaps in the history. From what I can tell, the leaked handles would accumulate to the point where Home Assistant was no longer able to maintain a connection at all, resulting in bugs like #111886.
I also observed some problems in my VM after letting 2024.3.0 run to the point where the handle leaks saturated and needed to disconnect and reconnect the device, though I haven't been able to reproduce the problem now that the handle leaks are fixed. Please keep me updated with any changes in behavior.
@cottsay I just updated to Home Assistant Core 2024.3.3 (from 2024.3.1). This update included the aioraven bump to 0.5.2. I had no issues getting stuck during the boot process after the update. And I haven't had any gaps in the history like I used to get. I've been running for ~3hrs with no issues yet. Before the update, I'd have about one "Unavailable" gap per minute.
Thank you for reporting back. Glad to hear things are looking better.
We can keep this open for another update cycle or so, but if nobody is seeing it any longer then I'm willing to bet that my initial instinct was right and that qemu was misbehaving after the guest leaked all those unclosed handles. Seems a full VM shutdown or device disconnect are the only ways to get back to a good state when that happens.
It's been a day now. Any errors in the logs? Any chance #113539 stopped occurring too?
In almost two days of running the latest update, I haven't had any gaps in my history! https://github.com/home-assistant/core/issues/113539 is still occurring however. I will add more comments over on that issue.
EDIT: There were actually four gaps. See a couple of comments down.
Oh, and I haven't really rebooted enough times to see if the boot process still gets stuck, but based on your explanation, I doubt I will encounter it again.
Now that I have looked closer at the logs, it did timeout four times (much less of an issue than before!) in the last two days. And I can see 30 second gaps in the history graph at those four points in time.
Logger: homeassistant.components.rainforest_raven.coordinator
Source: helpers/update_coordinator.py:324
integration: Rainforest RAVEn (documentation, issues)
First occurred: March 26, 2024 at 6:48:18 AM (4 occurrences)
Last logged: 2:07:33 AM
Timeout fetching rainforest_raven data
Here are those four occurrences:
2024-03-26 06:48:18.388 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-26 09:59:53.361 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-27 02:05:28.408 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-27 02:07:33.359 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
Those are probably legitimate timeouts now. There could be dropped or corrupted messages over the serial link or the EMU-2's wireless link to the meter. There's a lot that can go wrong here.
It's too bad that the data reflects the timeout with a gap like that. We don't want to present stale data, so the only thing I can think to do is to retry the query once or twice. Would it be worth it to add that logic? How much of a problem are the gaps in the data?
With how rare the timeouts are now, it's not really a problem for me.
I just ran into the stuck bootup issue again today during a reboot of my Proxmox server (Home Assistant runs as a VM on the server). I am running Core 2024.3.3
Logger: homeassistant.bootstrap
Source: bootstrap.py:601
First occurred: 1:59:34 PM (3 occurrences)
Last logged: 2:01:34 PM
Waiting on integrations to complete setup: rainforest_raven
I was only able to complete HA startup by unplugging the Rainforest EMU-2 meter.
I just ran into the stuck bootup issue again today...
Rats, thanks for the feedback. I assume this was an update from 2024.3.3 to 2024.4.0?
Nope, I hadn't yet updated to 2024.4.0 when it happened. (I am currently updating now)
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
My system still hangs up occasionally when trying to boot. It shows a message about waiting for the RAVEn integration to start.
It most recently happened on 2024.6.4
The problem
Home Assistant got stuck in the startup process while loading the Rainforest RAVEn integration. While it was stuck, I wasn't able to access the Overview dashboard because Home Assistant said it was still booting up (I could still navigate to menus and custom dashboards).
Sequence of events:
What version of Home Assistant Core has the issue?
core-2024.3.1
What was the last working version of Home Assistant Core?
core-2024.3.0
What type of installation are you running?
Home Assistant OS
Integration causing the issue
rainforest_raven
Link to integration documentation on our website
https://www.home-assistant.io/integrations/rainforest_raven/
Diagnostics information
config_entry-rainforest_raven-2a01a9ec98f5f6849b766186260f7c24(1).json
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
Even while the Rainforest RAVEn integration is running normally, there are frequent occurrences of "Unavailable" states. This causes the history graph to have many gaps, and dashboards showing the EMU-2 status will often show "Unavailable" instead of the last known value.