home-assistant / core

🏡 Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

apcupsd Error doing job: Task exception was never retrieved #111831

Closed baran81 closed 5 months ago

baran81 commented 9 months ago

The problem

Hi, my UPS information stopped updating. I got this error in the log:

2024-02-29 12:20:19.867 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 256, in _handle_refresh_interval
    await self._async_refresh(log_failures=True, scheduled=True)
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 412, in _async_refresh
    self.async_update_listeners()
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 183, in async_update_listeners
    update_callback()
  File "/usr/src/homeassistant/homeassistant/components/apcupsd/sensor.py", line 511, in _handle_coordinator_update
    self._update_attrs()
  File "/usr/src/homeassistant/homeassistant/components/apcupsd/sensor.py", line 517, in _update_attrs
    self._attr_native_value, inferred_unit = infer_unit(self.coordinator.data[key])
                                                        ~~~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: 'LASTSTEST'

What version of Home Assistant Core has the issue?

core-2024.2.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

apcupsd

Link to integration documentation on our website

https://www.home-assistant.io/integrations/apcupsd

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 9 months ago

Hey there @yuxincs, mind taking a look at this issue as it has been labeled with an integration (apcupsd) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `apcupsd` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign apcupsd` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)



yuxincs commented 9 months ago

Hmmm, this indicates that the field LASTSTEST was present when the integration was set up, but after a few updates the field somehow went missing even though the connection to apcupsd was still OK. Did anything change in your UPS or APCUPSD setup?

Can you try running the following Python code several times after pip install aioapcaccess (replace <HOST> and the port with your actual APCUPSD host and port)? Also make sure you redact the results if the run is successful.

import asyncio
import aioapcaccess

async def main():
    result = await aioapcaccess.request_status(host='<HOST>', port=3551)
    print(result)

if __name__ == '__main__':
    asyncio.run(main())

I would be curious to know if APCUPSD ever really drops fields (if so, we should add some defensive logic in the integration to gracefully handle that).

Also, if you restart the HA container this should be fixed, right? (since the setup logic won't set up the sensor for LASTSTEST anymore if it's no longer there).
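To check whether the daemon really drops fields between polls, the key sets of successive results can be compared. The following is only a minimal sketch: the helper name `dropped_fields` is hypothetical, and the sample snapshots (including the LASTSTEST timestamp) are made-up stand-ins for the dictionaries that `aioapcaccess.request_status` returns.

```python
from typing import Iterable, Mapping


def dropped_fields(snapshots: Iterable[Mapping[str, str]]) -> dict[int, set[str]]:
    """Map poll index -> fields present in the first poll but missing in that poll."""
    snapshots = list(snapshots)
    if not snapshots:
        return {}
    baseline = set(snapshots[0])
    report: dict[int, set[str]] = {}
    for index, snapshot in enumerate(snapshots[1:], start=1):
        missing = baseline - set(snapshot)
        if missing:
            report[index] = missing
    return report


# Made-up sample data: the third poll no longer contains LASTSTEST.
polls = [
    {"STATUS": "ONLINE", "LASTSTEST": "2024-02-29 11:00:00 +0100"},
    {"STATUS": "ONLINE", "LASTSTEST": "2024-02-29 11:00:00 +0100"},
    {"STATUS": "ONLINE"},
]
print(dropped_fields(polls))
```

In practice each entry of `polls` would be one result collected from the script above, a few minutes apart.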

baran81 commented 9 months ago

Hi,

The Python script gives me this output:

nasadmin@ananas:/tmp$ python3 a.py
OrderedDict([('APC', '001,036,0873'), ('DATE', '2024-03-03 17:03:05 +0100'), ('HOSTNAME', 'ananas'), ('VERSION', '3.14.14 (31 May 2016) debian'), ('UPSNAME', 'ananas'), ('CABLE', 'USB Cable'), ('DRIVER', 'USB UPS Driver'), ('UPSMODE', 'Stand Alone'), ('STARTTIME', '2024-02-29 11:58:37 +0100'), ('MODEL', 'Back-UPS XS 950U'), ('STATUS', 'ONLINE'), ('LINEV', '220.0 Volts'), ('LOADPCT', '14.0 Percent'), ('BCHARGE', '100.0 Percent'), ('TIMELEFT', '4.9 Minutes'), ('MBATTCHG', '5 Percent'), ('MINTIMEL', '3 Minutes'), ('MAXTIME', '0 Seconds'), ('SENSE', 'Medium'), ('LOTRANS', '155.0 Volts'), ('HITRANS', '280.0 Volts'), ('ALARMDEL', '30 Seconds'), ('BATTV', '13.4 Volts'), ('LASTXFER', 'Unacceptable line voltage changes'), ('NUMXFERS', '0'), ('TONBATT', '0 Seconds'), ('CUMONBATT', '0 Seconds'), ('XOFFBATT', 'N/A'), ('SELFTEST', 'NO'), ('STATFLAG', '0x05000008'), ('SERIALNO', '3B1602X23792'), ('BATTDATE', '2016-01-13'), ('NOMINV', '230 Volts'), ('NOMBATTV', '12.0 Volts'), ('NOMPOWER', '480 Watts'), ('FIRMWARE', '925.T2 .I USB FW:T2'), ('END APC', '2024-03-03 17:03:05 +0100')])

I'm running apcupsd version: apcupsd/jammy,now 3.14.14-3.1build1 amd64 [installed]

It should not have been updated since 16 Feb 2022 (according to the changelog).

The output of apcaccess doesn't show "LASTSTEST":

root@ananas:~# apcaccess status
APC      : 001,036,0873
DATE     : 2024-03-03 17:08:08 +0100
HOSTNAME : ananas
VERSION  : 3.14.14 (31 May 2016) debian
UPSNAME  : ananas
CABLE    : USB Cable
DRIVER   : USB UPS Driver
UPSMODE  : Stand Alone
STARTTIME: 2024-02-29 11:58:37 +0100
MODEL    : Back-UPS XS 950U
STATUS   : ONLINE
LINEV    : 220.0 Volts
LOADPCT  : 14.0 Percent
BCHARGE  : 100.0 Percent
TIMELEFT : 4.5 Minutes
MBATTCHG : 5 Percent
MINTIMEL : 3 Minutes
MAXTIME  : 0 Seconds
SENSE    : Medium
LOTRANS  : 155.0 Volts
HITRANS  : 280.0 Volts
ALARMDEL : 30 Seconds
BATTV    : 13.4 Volts
LASTXFER : Unacceptable line voltage changes
NUMXFERS : 0
TONBATT  : 0 Seconds
CUMONBATT: 0 Seconds
XOFFBATT : N/A
SELFTEST : NO
STATFLAG : 0x05000008
SERIALNO : 3B1602X23792
BATTDATE : 2016-01-13
NOMINV   : 230 Volts
NOMBATTV : 12.0 Volts
NOMPOWER : 480 Watts
FIRMWARE : 925.T2 .I USB FW:T2
END APC  : 2024-03-03 17:08:09 +0100

Yes, restarting the HA container fixes it.

yuxincs commented 9 months ago

Can you report back if it fails again? This could be because of the "self test" functionality of the APC UPS, but I can't reproduce it on my lower-end APC UPS, which simply doesn't report this field no matter what. I wonder in what scenarios the daemon would report LASTSTEST but then drop it in subsequent reports.

My theory is that if the daemon restarts in the middle, it may clear the last self test results and therefore stop reporting them in the following pulls. 🤔 One thing worth trying would be to manually trigger a self test (apctest), see if we can get LASTSTEST reported, and then restart the daemon to see if we can make it drop the field. (Make sure you don't have any important devices connected to the UPS, just to be on the safe side.)

If we can confirm this, I can add some logic to mark the sensors as unavailable if the daemon somehow dropped the fields (currently the assumption is that it may report different values, but the fields will always stay the same).
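The idea of marking a sensor unavailable when its field disappears could look roughly like the sketch below. This is not the integration's actual code: `FieldSensor` and its attributes are hypothetical stand-ins for a sensor bound to one status field, and the sample values are made up.

```python
from typing import Mapping, Optional


class FieldSensor:
    """Hypothetical stand-in for a sensor bound to one status field."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.available = True
        self.value: Optional[str] = None

    def update(self, data: Mapping[str, str]) -> None:
        """Refresh from the latest status, tolerating a dropped field."""
        if self.key not in data:
            # The daemon stopped reporting this field: flag the sensor
            # as unavailable instead of raising KeyError.
            self.available = False
            self.value = None
            return
        self.available = True
        self.value = data[self.key]


sensor = FieldSensor("LASTSTEST")
sensor.update({"STATUS": "ONLINE", "LASTSTEST": "2024-02-29 11:00:00 +0100"})
print(sensor.available, sensor.value)
sensor.update({"STATUS": "ONLINE"})  # field dropped by the daemon
print(sensor.available, sensor.value)
```

With this shape, the sensor recovers on its own if the field shows up again in a later update.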

baran81 commented 8 months ago

Hi. I ran apctest, but it required apcupsd to be stopped. After running apctest and starting apcupsd again, I still do not see LASTSTEST.

I probably have to wait for an automatic self test of the UPS to see that label again.

26tajeen commented 8 months ago

@yuxincs Maybe related?

yuxincs commented 8 months ago

@26tajeen I don't think it's related, "[Task exception was never retrieved]" is just a generic exception that gets raised in asyncio when some exception happened in an async task and it was never properly retrieved in the main event loop. HA might improve its main executor logic to retrieve such exceptions, but it doesn't automatically imply that these exceptions are related: you have to look at the underlying exception of the task.

Here the exception happened because in APCUPSD we incorrectly assumed some key is always present in the response of the daemon, but it turns out it isn't. PR #113125 is there to gracefully handle this case.
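For anyone wondering where the generic message comes from, here is a small standalone sketch (plain asyncio, not Home Assistant code). A task fails with a KeyError, nobody awaits it or reads its exception, and asyncio reports it through the loop's exception handler once the task object is garbage collected. The `failing_update` coroutine and the custom handler are illustrative assumptions.

```python
import asyncio
import gc


async def failing_update():
    """Mimic an update that assumes a field is always present."""
    data = {"STATUS": "ONLINE"}
    return data["LASTSTEST"]  # raises KeyError


async def main():
    captured = []
    loop = asyncio.get_running_loop()
    # Record what asyncio would normally log as an error.
    loop.set_exception_handler(
        lambda loop, context: captured.append(
            (context["message"], context.get("exception"))
        )
    )

    task = asyncio.create_task(failing_update())
    await asyncio.sleep(0)  # let the task run and fail
    del task  # drop the only reference without retrieving the exception
    gc.collect()
    return captured


captured = asyncio.run(main())
for message, exception in captured:
    print(message, repr(exception))
```

The generic message is the same for every unretrieved failure; the attached exception (here the KeyError) is what identifies the actual cause.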

issue-triage-workflows[bot] commented 5 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.