sfstar / hass-victron

Integration for Home Assistant to fetch data from the victron gx device via modbusTCP
Apache License 2.0

Loss of connection to GX device, entities not updating or status unknown #186

Closed: Off-Grid-Garage closed this issue 3 months ago

Off-Grid-Garage commented 3 months ago

Hello,

I've had your integration installed since November last year, and it had been running perfectly fine ever since. The problem started well over three weeks ago, when HA stopped receiving updated data from my GX device and just showed the last known value for all Victron entities. This did not follow an update or anything similar; it simply happened during the night, with no data received for a few hours. It started working again the next morning without me doing anything. It then ran OK for the next few days before it happened again. Since then the situation has got worse, and synchronisation is now lost more and more frequently.

Venus OS: v3.22
HA: Core 2024.3.1, Supervisor 2024.03.0, Operating System 12.1, Frontend 20240307.0
hass-victron: 0.1.7

So, here is what it looks like: [image] Data comes through until close to 9pm and then suddenly stops. HA displays the last known state (flat line). In the morning I tried a few things and restarted HA. That is when the entities showed unknown and no line appeared at all. At 2:30pm, it suddenly started working again.

This screenshot is from today, showing just a flat line until 1pm, when it started working again, kind of at least, as only limited data is coming through. You can see the curve is very square because the update interval varied randomly between about 1 and 20 minutes. Since 5pm, it has again received no data at all. [image]

The GX device does not show any Modbus errors unless I scan for devices. The other strange thing is that three temperature sensors connected to the GX device always keep working. These are Ruuvi Tags connected via Bluetooth to the GX and they show perfectly fine in HA. So it seems only the Victron gear is affected but not the temperature sensors! Some of the data is still coming through... [image]

In HA, I have this error showing for the hass-victron integration: [image] [image]

Not sure how to read this error and what it means. Any help would be much appreciated. I hope it's just a small thing and I can get this awesome integration working again.

Thank you very much!

sfstar commented 3 months ago

Hello and thank you for opening this issue,

I believe you might be experiencing two separate issues. Namely:

To troubleshoot further, would it be possible for you to check the status of the temperature or connected tank sensor at the time of the reported errors? This would allow us to deduce what status code 5 should be translated to, as the Victron Modbus TCP documentation unfortunately doesn't list code 5 as a valid return value.

Once the status code error is fixed, we can look into the connection loss/reporting stop that is occurring. (I believe that issue might initially be triggered by the error generated from status code 5.)

Off-Grid-Garage commented 3 months ago

Hello,

thanks for replying so quickly.

Your analysis was of great help and... spot on. It seems the problem is related to one of the Ruuvi Temperature Tags. I checked them this morning and found one of them with a low battery status: [image] [image] [image]

I removed the Battery Temp sensor from the Victron system and voilà, the data immediately poured in again. Since then, no trouble any more. Time will tell of course, but I believe this low battery warning has caused the code 5 in the Victron Modbus data. The log error in HA has also stopped reporting this issue since I removed the tag.

The temperature sensor is still working fine, and at this point it is only a message that the small button battery is getting low after 2 years. I remember having seen this message occasionally when I went into the temperature sensors (which I usually never do as they just work) but didn't pay much attention to it, as the sensor still worked without issues. This also explains why the fault was intermittent and the integration started working again in the afternoon once the battery was fully charged. Due to the temperature rise at this point, the button battery must have warmed up enough to be over the voltage warning threshold. Once it cooled down during the night, the voltage dropped and caused the issue again, stopping the data from coming in. It also explains why the problem got worse in the last few weeks: the ambient temperature has dropped due to the weather, and the button battery has got weaker over that time as well.

So, please let me know if there is any further testing I can do to help you implement a solution in your integration, if this is at all possible.

Thanks a lot, that was a quick turnaround.

sfstar commented 3 months ago

Hello,

Thanks for testing and confirming so quickly that the issue is the status of the Ruuvi temperature sensor. The fix on this end would entail a new release with an updated enum that is also able to decode and display the value 5.
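As a rough illustration of what such an enum update might look like, here is a minimal Python sketch. The class and member names are illustrative assumptions, not the integration's actual identifiers. Values 0 to 4 are assumed to follow the temperature status codes listed in Victron's Modbus register documentation, and 5 is mapped to a low sensor battery state purely on the basis of what was observed in this thread.

```python
from enum import Enum


class TemperatureStatus(Enum):
    """Hypothetical status enum; names are illustrative, not the integration's."""

    OK = 0
    DISCONNECTED = 1
    SHORT_CIRCUITED = 2
    REVERSE_POLARITY = 3
    UNKNOWN = 4
    # Assumption: 5 was observed here together with a Ruuvi low battery warning.
    SENSOR_BATTERY_LOW = 5


# Looking up a raw register value by number yields the matching member.
status = TemperatureStatus(5)
print(status.name)
```

With the extra member in place, a raw value of 5 no longer raises `ValueError` on lookup, which is the failure mode an enum without that member would hit.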

This would fix your issue immediately. However, it would not stop the integration from ceasing to report values when Victron releases another update with new values that haven't yet been added to this integration. For that, I would have to change the decoding code to handle a failure to decode gracefully and log it as a more descriptive warning, so that it doesn't impede other entities and prompts users to create an issue to get the value added to the integration. I will look into whether I can create a patch on main later this afternoon or by tomorrow, which hopefully you would be willing to test in your setup.
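The graceful handling described above could be sketched roughly as follows. This is not the integration's actual code: `GenericStatus`, `decode_enum`, the log wording and the entity name are all hypothetical, and the `NONDECODABLE` placeholder is borrowed from the state name mentioned later in this thread.

```python
import logging
from enum import Enum

_LOGGER = logging.getLogger(__name__)


class GenericStatus(Enum):
    """Hypothetical enum holding only the values known at release time."""

    OK = 0
    WARNING = 1
    ALARM = 2


def decode_enum(enum_cls, raw_value, entity_name):
    """Return the member name for raw_value, or a placeholder if it is unknown."""
    try:
        return enum_cls(raw_value).name
    except ValueError:
        # Unknown value: log it and carry on so other entities still update.
        _LOGGER.warning(
            "Value %s for entity %s is not decodable; please report it",
            raw_value,
            entity_name,
        )
        return "NONDECODABLE"


print(decode_enum(GenericStatus, 0, "sensor.example_status"))
print(decode_enum(GenericStatus, 5, "sensor.example_status"))
```

The idea is that a single unknown value degrades one entity to a placeholder state instead of raising an exception that interrupts the whole update cycle.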

Once tested and confirmed to be working this could be included in the next shippable release.

Off-Grid-Garage commented 3 months ago

More than happy to assist and support you with more testing.

Everything is working beautifully since this morning when I removed the Ruuvi Tag.

Thank you very much.

sfstar commented 3 months ago

Great to hear. I've just merged a change to the main branch that should keep a value decoding error from stopping the other entities from updating their values. It does not yet contain the fix to decode the low battery state as a low battery state. Could you perhaps test this change, to check whether future decoding issues will now be handled more gracefully?

To test the change please install the main version of the integration via the procedure described over here: https://github.com/sfstar/hass-victron/issues/182#issuecomment-1998254266

Based on my changes, your tests should show that all other entities keep being updated when the Ruuvi sensor is present with a low battery state. If so, it means that in future these errors should not fully stop the integration from working.

I will add the correct decode value for the Ruuvi sensor (so that it no longer generates an error) once the reduced impact on other entities is confirmed.

Off-Grid-Garage commented 3 months ago

Perfect!

I added the Ruuvi Tag again to the Venus OS and immediately, the data flow in HA through your integration stopped. [image]

Re-downloaded main and restarted HA; data is rolling back in even having the low battery status for this tag. [image]

Status 5 could mean not only a low battery warning in a Ruuvi Tag though, but a lot more, I guess? A general status code for a warning maybe? Hard to tell...

sfstar commented 3 months ago

Thank you for testing.

You mention data is rolling back in even having the low battery status for this tag. Do you mean that the temperature status is showing low battery? Because I'm expecting it to say NONDECODABLE instead.

Furthermore, there should be an error in the Home Assistant logs like this: "The reported value %s for entity %s isn't a decobale value. Please report this error to the integrations maintainer", where the two %s placeholders should be replaced with 5 and the name of your temperature sensor entity.

Could you perhaps check if this is the case?
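For reference, the `%s` placeholders in a message like that are filled in by Python's standard `logging` module from the arguments passed after the message string. A minimal sketch follows; the logger name, the shortened message text and the entity name are made up for illustration.

```python
import io
import logging

# Capture log output in memory so the formatted message can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
logger = logging.getLogger("victron_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.ERROR)
logger.propagate = False

# Arguments are substituted into the %s placeholders in order.
logger.error(
    "The reported value %s for entity %s isn't a decodable value.",
    5,
    "sensor.battery_temperature_status",  # hypothetical entity name
)

formatted = buffer.getvalue().strip()
print(formatted)
```

In the Home Assistant log, the raw value and the entity name would therefore appear in place of the two placeholders.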

Off-Grid-Garage commented 3 months ago

Well, after reloading the main version, everything works just fine and all Victron data is updating in HA as it should, including the temperature sensor. It still shows the sensor battery low warning in Victron: [image]

Yes, you're right, there is a new error message: [image]

[image]

iLeeeZi commented 3 months ago

I've been using the fix in PR #93 and it has been working fine. The only place I found mentioning sensor_battery_low is here: https://github.com/victronenergy/venus/issues/889#issuecomment-1011790859

sfstar commented 3 months ago

@Off-Grid-Garage Great to hear that the decoding issues are now handled as they should be (logged but not breaking the integration). I will merge #188 so that status code 5 is correctly decoded. As for the other entities, it seems that the value 0 means either "Unknown" or something like "not present".

I believe you are still using the single MPII with the Phoenix as a generator input? If so, the phase 3 registers couldn't possibly be populated with measured values. Could you check in the GX device whether you can determine what 0 means in the case of (possibly the easiest option) vebus alarm gridlost?

Off-Grid-Garage commented 3 months ago

Yes, I'm still using the single MP and the Phoenix on AC-in if more power is required.

Not sure where I can obtain this information in the GX device. I looked in the MPII section but could not see anything related to that. No alarms or VE.Bus errors. I can see that with the new version the status of the entity has changed in HA from OK to Nondecodable, hence the error in the log: [image] [image] ...

sfstar commented 3 months ago

In that case I suggest we move the decoding issues for these entities to a separate issue report, since this is going to take some time to clear up with Victron support or through further troubleshooting at our end. I want to see if I can ship the fix for the initially reported issue, alongside all the other fixes done during the past couple of days, as a new release, so that users will also benefit from the improved stability in case of decode failures.

@Off-Grid-Garage would you be willing to open another issue to track the vebus decode errors?

iLeeeZi commented 3 months ago

It is the PR #187 that causes values like 0 = OK, 0 = OFF to change to 0 = NONDECODABLE so I think some changes are required to that.

All these values were "OK" before

Screenshot_20240317_171611_Home Assistant
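A small regression check along the following lines could guard against this kind of change in behaviour, where a previously known value such as 0 starts falling through to the fallback. It is only a sketch under assumptions: `AlarmState`, `decode` and `check_decoding` are hypothetical names, and the actual cause inside PR #187 is not shown in this thread.

```python
from enum import Enum


class AlarmState(Enum):
    """Hypothetical alarm enum; names are illustrative only."""

    OK = 0
    WARNING = 1
    ALARM = 2


def decode(enum_cls, raw_value):
    """Decode a raw value, falling back only for genuinely unknown values."""
    try:
        return enum_cls(raw_value).name
    except ValueError:
        return "NONDECODABLE"


def check_decoding():
    """Return a list of (raw_value, expected, actual) mismatches."""
    expectations = {0: "OK", 1: "WARNING", 2: "ALARM", 5: "NONDECODABLE"}
    return [
        (raw, expected, decode(AlarmState, raw))
        for raw, expected in expectations.items()
        if decode(AlarmState, raw) != expected
    ]


mismatches = check_decoding()
print(mismatches)
```

An empty list means every known value, including 0, still decodes to its name, and only the unknown value takes the fallback path.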

Off-Grid-Garage commented 3 months ago

"It is the PR #187 that causes values like 0 = OK, 0 = OFF to change to 0 = NONDECODABLE so I think some changes are required to that."

Yes, thank you @iLeeeZi. I've added some of these screenshots to #189

sfstar commented 3 months ago

The release (v0.2.0) has been shipped. Closing the issue; feel free to re-open it if the problem persists with the new release.