Open jhemak opened 3 weeks ago
Hey there @matthewflamm, @kamiyo, mind taking a look at this issue as it has been labeled with an integration (nws
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
nws documentation nws source (message by IssueLinks)
Exact same issue here
See also #117533. This is caused by NWS server caching/flakiness resulting in old observation data. We ask for data that is at most 70 minutes old. We also cache data for up to an hour. So we treat observation data as valid for 60-130 minutes depending on timing of (in)successful API calls. Unfortunately, sometimes this results in no data being available after this time period due to API flakiness. We could consider increasing these time periods, however we will end up with stale data in that case.
@MatthewFlamm maybe we should change the error to a warning or info? Since the integration is behaving as expected and the error is upstream. That way people don't break out at seeing an error. Also maybe we can add message that says cached data is being used, and the integration will retry in XX minutes.
The homeassistant machinery considers errors upstream to be logged as errors. For example, a 502 response from NWS is clearly an upstream error amd would be logged in homeassistant as an error.
We could write our own logic for handling all types of errors for nws, but this is a lot of maintenance burden. Further if we bypass the built-in machinery for logging, we also bypass the built-in machinery for availability of the entities, among other things. More maintenance burden. I have been moving the integration to use the built-in machinery as much as possible to make the integration less brittle to changes in core.
Okay, got it. Just wondering if we could somehow mitigate users reporting this issue when there's not really anything we can do about it. I feel like it happens usually when HA is updated because that's when people usually restart their machines and then the cached data isn't there.
People don't care that the data might be a little stale as it's most likely still very accurate data, and stale data in the case of weather is better than nothing. This is very common with the NWS API, so letting it stay the way it currently is will not work. That timeout is way too short.
I've never seen these sensors go unavailable before upgrading my HA the other day. I was a bit behind and have so many things hooked into the weather forecast, I was waiting to upgrade to the new service call approach when I had some time. So this also didn't happen before changes on HAs side from what I can tell. Blaming NWS 100% is not fair.
I guess it's just more template/trigger sensors I have to make to work around weirdly opinionated functionality in HA. I'll ignore unavailable myself and keep the old data.
I'm open to suggestions on appropriate timing. The old way of the integration was not good as you could have stale data being shown for days.
I totally agree with the above comment from kevinhaas. 1 hour stale data is better than none. This started when upgrading to either core 2024.6.1 or 2024.6.2 cant remember exactly.
The timing would time out the data at 120-130 minutes assuming NWS API isn't flaky. The 60 minute side is assuming NWS API has an issue. What are the proposals here?
Looking at my sensors, the longest time I've had unavailable is just under 4 hours. What about 6 or 12 hours? I think even up to a day is acceptable for my use case.
Are these logs a symptom of the "70 min request" you mentioned? I see start time in the request, but otherwise not seeing that. Observations API returns 200 with an empty response (HA 2024.6.2).
Looks related to https://github.com/MatthewFlamm/pynws/issues/194 and https://github.com/home-assistant/core/pull/115857, but it's still repro'ing with those changes.
I added a log for the full request. Would it make sense to include that by default (url, params, headers)?
2024-06-15 11:10:15.708 DEBUG (MainThread) [pynws.raw_data] Request: https://api.weather.gov/stations/xxxx/observations/ Params: {'start': '2024-06-15T17:00:15+00:00'} Headers: {'accept': 'application/geo+json', 'User-Agent': 'pynws xxxx@xxxx.xxx'}
2024-06-15 11:10:16.418 DEBUG (MainThread) [pynws.raw_data] Request for https://api.weather.gov/stations/xxxx/observations/ returned code: 200
2024-06-15 11:10:16.419 DEBUG (MainThread) [pynws.raw_data] Request for https://api.weather.gov/stations/xxxx/observations/ returned header: <CIMultiDictProxy('Server': 'nginx/1.20.1', 'Content-Type': 'application/geo+json', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'X-Correlation-Id, X-Request-Id, X-Server-Id', 'X-Request-ID': '...', 'X-Correlation-ID': '...', 'X-Server-ID': 'vm-bldr-nids-apiapp13.ncep.noaa.gov', 'Content-Encoding': 'gzip', 'Content-Length': '94', 'Cache-Control': 'public, max-age=300, s-maxage=120', 'Expires': 'Sat, 15 Jun 2024 18:15:16 GMT', 'Date': 'Sat, 15 Jun 2024 18:10:16 GMT', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'X-Edge-Request-ID': '...', 'Vary': 'Accept,Feature-Flags,Accept-Language', 'Strict-Transport-Security': 'max-age=31536000 ; includeSubDomains ; preload')>
2024-06-15 11:10:16.420 DEBUG (MainThread) [pynws.raw_data] Request for https://api.weather.gov/stations/xxxx/observations/ returned data: {'@context': {'@version': '1.1'}, 'type': 'FeatureCollection', 'features': []}
2024-06-15 11:10:16.421 ERROR (MainThread) [homeassistant.components.nws.coordinator] Error fetching NWS observation station xxxx data: Observation received with no data.
2024-06-15 11:10:16.421 DEBUG (MainThread) [homeassistant.components.nws.coordinator] Finished fetching NWS observation station xxxx data in 0.713 seconds (success: False)
I'm running with these values now to test, seems to be working for my particular locations:
OBSERVATION_VALID_TIME = timedelta(minutes=120)
FORECAST_VALID_TIME = timedelta(minutes=120)
# A lot of stations update once hourly plus some wiggle room
UPDATE_TIME_PERIOD = timedelta(minutes=180)
Three suggestions:
Allow these valid time settings to be set by the user based on their need for up-to-date information, since some use cases are more tolerant of unknown data than others, may be better to have a bit stale data rather than no data at all.
Add the timestamp returned from the NWS API to the attribution (or return it as a separate data element), thus giving the user a way to know exactly how timely the data are. Something like:
Observations at 2024-06-15 13:53:00-04:00 from Station KPTK-Pontiac, Oakland County International Airport
Last changed
time in HA seems to give the proper age but just explicitly exposing the timestamp on the data as received might also be helpful.
Change the error message to be more correct, since the API is actually not returning an observation with no data, it's really returning an observation with data that are too old based on our requirements. Maybe a message like:
Error fetching NWS observation station xxxx data: Observation received with data from 2024-06-15 13:55:00-04:00, older than required 120 minutes.
Great suggestions. I've seen ~2 hours max for the unavailable state also, so I'll try modifying the integration like you did for now until a fix is available.
Also, when it's in the unavailable state, it seems I can't pull any forecast data either. I guess that makes sense following the existing logic, but feels weird I can't pull any forecast at all since I'm not explicitly told it's out of date or anything. The error was just something like "service call does not match entity".
The NWS system is really messed up! It has been reporting the same temp 86 degrees since Sunday 06/16/2024 for my location at it's web site for radar/temp/alerts. So most of it's reporting stations seem to be having issues with temps and rain fall reporting. Seems their IT department is out to lunch.
Mine is working fine after making the updates posted above. Sounds like another issue or just an outage for those stations. They should have something posted somewhere if there is an outage.
The NWS system is really messed up! It has been reporting the same temp 86 degrees since Sunday 06/16/2024 for my location at it's web site for radar/temp/alerts. So most of it's reporting stations seem to be having issues with temps and rain fall reporting. Seems their IT department is out to lunch.
What version of homeassistant are you running with the stuck data?
The NWS system is really messed up! It has been reporting the same temp 86 degrees since Sunday 06/16/2024 for my location at it's web site for radar/temp/alerts. So most of it's reporting stations seem to be having issues with temps and rain fall reporting. Seems their IT department is out to lunch.
What version of homeassistant are you running with the stuck data?
My local reporting station: https://www.willyweather.com/mi/midland-county/jack-barstow-airport.html config_entry-nws-534b9c928edd160589dd5eb836a30d1e.json
My HA is: Core 2024.6.3 Supervisor 2024.06.0 Operating System 12.4 Frontend 20240610.1
NWS integration: the latest from the repository
I saw some issues just now and the NWS posted a bulletin that may explain it. I'm not sure if any of these changes affect the integration or the API is just in a bad state right now, but the error I get when calling get_forecasts is:
websocket_api script: Error executing script. Unexpected error for call_service at pos 1: list index out of range
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 977, in entity_service_call
single_response = await _handle_entity_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1049, in _handle_entity_call
result = await task
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/weather/__init__.py", line 1054, in async_get_forecasts_service
native_forecast_list = await weather.async_forecast_hourly()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/weather/__init__.py", line 1231, in async_forecast_hourly
return await self._async_forecast("hourly")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/weather/__init__.py", line 1220, in _async_forecast
list[Forecast] | None, getattr(self, f"_async_forecast_{forecast_type}")()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/nws/weather.py", line 279, in _async_forecast_hourly
return self._forecast(self.nws.forecast_hourly, HOURLY)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 456, in forecast_hourly
return self._convert_forecast(forecast)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 513, in _convert_forecast
time, weather = parse_icon(forecast_entry["icon"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 181, in parse_icon
time = icon_list[5]
~~~~~~~~~^^^
IndexError: list index out of range
There was a change in the API which wasn't in the bulletin.... This needs to be fixed.
I'm hesitant to fix it immediately in pynws
as sometimes NWS will revert something like this.
I'm hesitant to fix it immediately in
pynws
as sometimes NWS will revert something like this.
Could a condition test be used, looking for empty data return, so if one get no data it tries the new fix if that actually fixes the no data, and then it can fall back to old method if api change is reverted.
Just something to think about.
I'm hesitant to fix it immediately in
pynws
as sometimes NWS will revert something like this.Could a condition test be used, looking for empty data return, so if one get no data it tries the new fix if that actually fixes the no data, and then it can fall back to old method if api change is reverted.
Just something to think about.
We are starting to mix issues in this thread. Let's drop talking about the IndexError: list index out of range
error just reported. If anyone wants to discuss it, it can go in another thread. This is already being handled in https://github.com/MatthewFlamm/pynws/issues/207 and will hopefully go into a fix in homeassistant soon. After I looked at it closer, there is an elegant solution for handling this in both cases.
For the new data issue. The issue with the data itself from NWS being several days old is a not infrequent problem that some users run into. This is one reason why the integration now filters out old data. NWS must fix this problem, we have no control over it. In some other cases, the data doesn't update on the order of hours. The current 70 minute window might be too short, and I'm thinking that something in the 2-3 hr-ish range is reasonable middle ground. I could be considered to go longer, but then I worry that we are providing data that is too stale.
As of 06/21/2024 4:15 PM Looks like the NWS has done some repair work to it's data collection system, it is now reporting data for my main weather site. Will continue to observe. But almost 2 weeks without, even their own sites seem to be working again. A happy weather watcher here again.
Update: 06/21/2024 6:51 PM Now I'm seeing the error about forecast that @kevinhaas posted about earlier in the issue thread
Logger: homeassistant.components.websocket_api.http.connection Source: components/nws/weather.py:282 integration: Home Assistant WebSocket API (documentation, issues) First occurred: 4:49:37 PM (9 occurrences) Last logged: 6:44:42 PM
[547298063360] Error handling message: Unknown error (unknown_error) RA Kish from 192.168.1.32 (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 27, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/weather/websocket_api.py", line 103, in ws_subscribe_forecast await entity.async_update_listeners({forecast_type}) File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 987, in async_update_listeners native_forecast_list: list[Forecast] | None = await getattr( ^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1236, in async_forecast_twice_daily return await self._async_forecast("twice_daily") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1220, in _async_forecast list[Forecast] | None, getattr(self, f"_asyncforecast{forecast_type}")() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/nws/weather.py", line 282, in _async_forecast_twice_daily return self._forecast(self.nws.forecast, DAYNIGHT) ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 450, in forecast return self._convert_forecast(forecast) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 513, in _convert_forecast time, weather = parse_icon(forecast_entry["icon"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 181, in parse_icon time = icon_list[5]
IndexError: list index out of range
Yup, my forecast is also stuck on June 20th and it led me here..... after all the pain converting to the new format, I honestly thought we finally had a reliable solution.
So I'm getting data when they get everything run, except forecast. But it does look like the Home Assistant WebSocket API is throwing the error for the weather. I'm getting one issued with the Firefox browser:
Logger: homeassistant.components.websocket_api.http.connection Source: components/nws/weather.py:282 integration: Home Assistant WebSocket API (documentation, issues) First occurred: June 26, 2024 at 11:55:31 PM (2 occurrences) Last logged: June 26, 2024 at 11:57:56 PM
[547650102912] Error handling message: Unknown error (unknown_error) RA Kish from 192.168.1.32 (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 27, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/weather/websocket_api.py", line 103, in ws_subscribe_forecast await entity.async_update_listeners({forecast_type}) File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 987, in async_update_listeners native_forecast_list: list[Forecast] | None = await getattr( ^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1236, in async_forecast_twice_daily return await self._async_forecast("twice_daily") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1220, in _async_forecast list[Forecast] | None, getattr(self, f"_asyncforecast{forecast_type}")() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/nws/weather.py", line 282, in _async_forecast_twice_daily return self._forecast(self.nws.forecast, DAYNIGHT) ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 450, in forecast return self._convert_forecast(forecast) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 513, in _convert_forecast time, weather = parse_icon(forecast_entry["icon"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 181, in parse_icon time = icon_list[5]
IndexError: list index out of range
The NWS integration seems a lost cause. I took direct control and I'm now doing direct, periodic calls to the NWS api for the data I consume and it works flawlessly.
Not sure why this integration has become so unreliable, but similar to @jazzmonger, I switched to using a simple REST call:
rest:
- resource: "https://api.weather.gov/gridpoints/LOX/148,50/forecast?units=us"
headers:
accept: application/geo+json
sensor:
- name: "Raw NWS Forecast"
unique_id: sensor.nws_raw
json_attributes_path: properties
json_attributes:
- periods
value_template: "{{value_json.properties.updateTime}}"
This dumps the forecast data into the attributes of a sensor, which gives me everything I need. I believe you could also use a Template Weather to get all the way back to having a weather entity.
yup. thread is here: https://community.home-assistant.io/t/help-with-formatting-payload-results-from-a-restful-service-call/742688/7
LOTS of folks are seeing the same thing. We've all given up on the integration.
The problem
Lately, I have begun to intermittently receive errors from the NWS integration as follows:
[homeassistant.components.nws.coordinator] Error fetching NWS observation station XXXX data: Observation received with no data.
This is intermittent, and after it occurs the weather sensor shows Unavailable
What version of Home Assistant Core has the issue?
core-2024.6.1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
NWS
Link to integration documentation on our website
https://www.home-assistant.io/integrations/nws/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response