home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
69.72k stars 28.88k forks source link

NWS Integration: Observation received with no data #119114

Open jhemak opened 3 weeks ago

jhemak commented 3 weeks ago

The problem

Lately, I have begun to intermittently receive errors from the NWS integration as follows:

[homeassistant.components.nws.coordinator] Error fetching NWS observation station XXXX data: Observation received with no data.

This is intermittent, and after it occurs the weather sensor shows Unavailable

What version of Home Assistant Core has the issue?

core-2024.6.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

NWS

Link to integration documentation on our website

https://www.home-assistant.io/integrations/nws/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 3 weeks ago

Hey there @matthewflamm, @kamiyo, mind taking a look at this issue as it has been labeled with an integration (nws) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `nws` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign nws` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


nws documentation nws source (message by IssueLinks)

gdgeist commented 3 weeks ago

Exact same issue here

MatthewFlamm commented 3 weeks ago

See also #117533. This is caused by NWS server caching/flakiness resulting in old observation data. We ask for data that is at most 70 minutes old. We also cache data for up to an hour. So we treat observation data as valid for 60-130 minutes depending on timing of (in)successful API calls. Unfortunately, sometimes this results in no data being available after this time period due to API flakiness. We could consider increasing these time periods, however we will end up with stale data in that case.

kamiyo commented 3 weeks ago

@MatthewFlamm maybe we should change the error to a warning or info? Since the integration is behaving as expected and the error is upstream. That way people don't break out at seeing an error. Also maybe we can add message that says cached data is being used, and the integration will retry in XX minutes.

MatthewFlamm commented 3 weeks ago

The homeassistant machinery considers errors upstream to be logged as errors. For example, a 502 response from NWS is clearly an upstream error amd would be logged in homeassistant as an error.

https://github.com/home-assistant/core/blob/6df7c34aa290616159ab16f59778bfc73b81f5e7/homeassistant/helpers/update_coordinator.py#L321-L326

We could write our own logic for handling all types of errors for nws, but this is a lot of maintenance burden. Further if we bypass the built-in machinery for logging, we also bypass the built-in machinery for availability of the entities, among other things. More maintenance burden. I have been moving the integration to use the built-in machinery as much as possible to make the integration less brittle to changes in core.

kamiyo commented 3 weeks ago

Okay, got it. Just wondering if we could somehow mitigate users reporting this issue when there's not really anything we can do about it. I feel like it happens usually when HA is updated because that's when people usually restart their machines and then the cached data isn't there.

kevinhaas commented 2 weeks ago

People don't care that the data might be a little stale as it's most likely still very accurate data, and stale data in the case of weather is better than nothing. This is very common with the NWS API, so letting it stay the way it currently is will not work. That timeout is way too short.

I've never seen these sensors go unavailable before upgrading my HA the other day. I was a bit behind and have so many things hooked into the weather forecast, I was waiting to upgrade to the new service call approach when I had some time. So this also didn't happen before changes on HAs side from what I can tell. Blaming NWS 100% is not fair.

I guess it's just more template/trigger sensors I have to make to work around weirdly opinionated functionality in HA. I'll ignore unavailable myself and keep the old data.

MatthewFlamm commented 2 weeks ago

I'm open to suggestions on appropriate timing. The old way of the integration was not good as you could have stale data being shown for days.

ccm19931993 commented 2 weeks ago

I totally agree with the above comment from kevinhaas. 1 hour stale data is better than none. This started when upgrading to either core 2024.6.1 or 2024.6.2 cant remember exactly.

MatthewFlamm commented 2 weeks ago

The timing would time out the data at 120-130 minutes assuming NWS API isn't flaky. The 60 minute side is assuming NWS API has an issue. What are the proposals here?

kamiyo commented 2 weeks ago

Looking at my sensors, the longest time I've had unavailable is just under 4 hours. What about 6 or 12 hours? I think even up to a day is acceptable for my use case.

tbaron commented 2 weeks ago

Are these logs a symptom of the "70 min request" you mentioned? I see start time in the request, but otherwise not seeing that. Observations API returns 200 with an empty response (HA 2024.6.2).

Looks related to https://github.com/MatthewFlamm/pynws/issues/194 and https://github.com/home-assistant/core/pull/115857, but it's still repro'ing with those changes.

I added a log for the full request. Would it make sense to include that by default (url, params, headers)?

2024-06-15 11:10:15.708 DEBUG (MainThread) [pynws.raw_data] Request: https://api.weather.gov/stations/xxxx/observations/ Params: {'start': '2024-06-15T17:00:15+00:00'} Headers: {'accept': 'application/geo+json', 'User-Agent': 'pynws xxxx@xxxx.xxx'}
2024-06-15 11:10:16.418 DEBUG (MainThread) [pynws.raw_data] Request for https://api.weather.gov/stations/xxxx/observations/ returned code: 200
2024-06-15 11:10:16.419 DEBUG (MainThread) [pynws.raw_data] Request for https://api.weather.gov/stations/xxxx/observations/ returned header: <CIMultiDictProxy('Server': 'nginx/1.20.1', 'Content-Type': 'application/geo+json', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'X-Correlation-Id, X-Request-Id, X-Server-Id', 'X-Request-ID': '...', 'X-Correlation-ID': '...', 'X-Server-ID': 'vm-bldr-nids-apiapp13.ncep.noaa.gov', 'Content-Encoding': 'gzip', 'Content-Length': '94', 'Cache-Control': 'public, max-age=300, s-maxage=120', 'Expires': 'Sat, 15 Jun 2024 18:15:16 GMT', 'Date': 'Sat, 15 Jun 2024 18:10:16 GMT', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'X-Edge-Request-ID': '...', 'Vary': 'Accept,Feature-Flags,Accept-Language', 'Strict-Transport-Security': 'max-age=31536000 ; includeSubDomains ; preload')>
2024-06-15 11:10:16.420 DEBUG (MainThread) [pynws.raw_data] Request for https://api.weather.gov/stations/xxxx/observations/ returned data: {'@context': {'@version': '1.1'}, 'type': 'FeatureCollection', 'features': []}
2024-06-15 11:10:16.421 ERROR (MainThread) [homeassistant.components.nws.coordinator] Error fetching NWS observation station xxxx data: Observation received with no data.
2024-06-15 11:10:16.421 DEBUG (MainThread) [homeassistant.components.nws.coordinator] Finished fetching NWS observation station xxxx data in 0.713 seconds (success: False)
mtl010957 commented 2 weeks ago

I'm running with these values now to test, seems to be working for my particular locations:

OBSERVATION_VALID_TIME = timedelta(minutes=120)
FORECAST_VALID_TIME = timedelta(minutes=120)
# A lot of stations update once hourly plus some wiggle room
UPDATE_TIME_PERIOD = timedelta(minutes=180)

Three suggestions:

  1. Allow these valid time settings to be set by the user based on their need for up-to-date information, since some use cases are more tolerant of unknown data than others, may be better to have a bit stale data rather than no data at all.

  2. Add the timestamp returned from the NWS API to the attribution (or return it as a separate data element), thus giving the user a way to know exactly how timely the data are. Something like: Observations at 2024-06-15 13:53:00-04:00 from Station KPTK-Pontiac, Oakland County International Airport Last changed time in HA seems to give the proper age but just explicitly exposing the timestamp on the data as received might also be helpful.

  3. Change the error message to be more correct, since the API is actually not returning an observation with no data, it's really returning an observation with data that are too old based on our requirements. Maybe a message like: Error fetching NWS observation station xxxx data: Observation received with data from 2024-06-15 13:55:00-04:00, older than required 120 minutes.

kevinhaas commented 2 weeks ago

Great suggestions. I've seen ~2 hours max for the unavailable state also, so I'll try modifying the integration like you did for now until a fix is available.

Also, when it's in the unavailable state, it seems I can't pull any forecast data either. I guess that makes sense following the existing logic, but feels weird I can't pull any forecast at all since I'm not explicitly told it's out of date or anything. The error was just something like "service call does not match entity".

lordwizzard commented 1 week ago

The NWS system is really messed up! It has been reporting the same temp 86 degrees since Sunday 06/16/2024 for my location at it's web site for radar/temp/alerts. So most of it's reporting stations seem to be having issues with temps and rain fall reporting. Seems their IT department is out to lunch.

kevinhaas commented 1 week ago

Mine is working fine after making the updates posted above. Sounds like another issue or just an outage for those stations. They should have something posted somewhere if there is an outage.

MatthewFlamm commented 1 week ago

The NWS system is really messed up! It has been reporting the same temp 86 degrees since Sunday 06/16/2024 for my location at it's web site for radar/temp/alerts. So most of it's reporting stations seem to be having issues with temps and rain fall reporting. Seems their IT department is out to lunch.

What version of homeassistant are you running with the stuck data?

lordwizzard commented 1 week ago

The NWS system is really messed up! It has been reporting the same temp 86 degrees since Sunday 06/16/2024 for my location at it's web site for radar/temp/alerts. So most of it's reporting stations seem to be having issues with temps and rain fall reporting. Seems their IT department is out to lunch.

What version of homeassistant are you running with the stuck data?

This is the NWS site with stuck temp: https://radar.weather.gov/?settings=v1_eyJhZ2VuZGEiOnsiaWQiOiJ3ZWF0aGVyIiwiY2VudGVyIjpbLTg0LjkyNiw0My41MTFdLCJsb2NhdGlvbiI6Wy04NC4yMSw0My42NDldLCJ6b29tIjo3LjIxODg5NTQ5MTkyNzU2M30sImFuaW1hdGluZyI6dHJ1ZSwiYmFzZSI6InN0YW5kYXJkIiwiYXJ0Y2MiOmZhbHNlLCJjb3VudHkiOmZhbHNlLCJjd2EiOmZhbHNlLCJyZmMiOmZhbHNlLCJzdGF0ZSI6ZmFsc2UsIm1lbnUiOnRydWUsInNob3J0RnVzZWRPbmx5IjpmYWxzZSwib3BhY2l0eSI6eyJhbGVydHMiOjAuOCwibG9jYWwiOjAuNiwibG9jYWxTdGF0aW9ucyI6MC44LCJuYXRpb25hbCI6MC44NH19

My local reporting station: https://www.willyweather.com/mi/midland-county/jack-barstow-airport.html config_entry-nws-534b9c928edd160589dd5eb836a30d1e.json

image

My HA is: Core 2024.6.3 Supervisor 2024.06.0 Operating System 12.4 Frontend 20240610.1

NWS integration: the latest from the repository

kevinhaas commented 1 week ago

I saw some issues just now and the NWS posted a bulletin that may explain it. I'm not sure if any of these changes affect the integration or the API is just in a bad state right now, but the error I get when calling get_forecasts is:

websocket_api script: Error executing script. Unexpected error for call_service at pos 1: list index out of range

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
    response_data = await self._async_run_long_action(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
    return await long_task
           ^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 977, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1049, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/weather/__init__.py", line 1054, in async_get_forecasts_service
    native_forecast_list = await weather.async_forecast_hourly()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/weather/__init__.py", line 1231, in async_forecast_hourly
    return await self._async_forecast("hourly")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/weather/__init__.py", line 1220, in _async_forecast
    list[Forecast] | None, getattr(self, f"_async_forecast_{forecast_type}")()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/nws/weather.py", line 279, in _async_forecast_hourly
    return self._forecast(self.nws.forecast_hourly, HOURLY)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 456, in forecast_hourly
    return self._convert_forecast(forecast)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 513, in _convert_forecast
    time, weather = parse_icon(forecast_entry["icon"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 181, in parse_icon
    time = icon_list[5]
           ~~~~~~~~~^^^
IndexError: list index out of range
MatthewFlamm commented 1 week ago

There was a change in the API which wasn't in the bulletin.... This needs to be fixed.

MatthewFlamm commented 1 week ago

I'm hesitant to fix it immediately in pynws as sometimes NWS will revert something like this.

lordwizzard commented 1 week ago

I'm hesitant to fix it immediately in pynws as sometimes NWS will revert something like this.

Could a condition test be used, looking for empty data return, so if one get no data it tries the new fix if that actually fixes the no data, and then it can fall back to old method if api change is reverted.

Just something to think about.

MatthewFlamm commented 1 week ago

I'm hesitant to fix it immediately in pynws as sometimes NWS will revert something like this.

Could a condition test be used, looking for empty data return, so if one get no data it tries the new fix if that actually fixes the no data, and then it can fall back to old method if api change is reverted.

Just something to think about.

We are starting to mix issues in this thread. Let's drop talking about the IndexError: list index out of range error just reported. If anyone wants to discuss it, it can go in another thread. This is already being handled in https://github.com/MatthewFlamm/pynws/issues/207 and will hopefully go into a fix in homeassistant soon. After I looked at it closer, there is an elegant solution for handling this in both cases.

For the new data issue. The issue with the data itself from NWS being several days old is a not infrequent problem that some users run into. This is one reason why the integration now filters out old data. NWS must fix this problem, we have no control over it. In some other cases, the data doesn't update on the order of hours. The current 70 minute window might be too short, and I'm thinking that something in the 2-3 hr-ish range is reasonable middle ground. I could be considered to go longer, but then I worry that we are providing data that is too stale.

lordwizzard commented 1 week ago

As of 06/21/2024 4:15 PM Looks like the NWS has done some repair work to it's data collection system, it is now reporting data for my main weather site. Will continue to observe. But almost 2 weeks without, even their own sites seem to be working again. A happy weather watcher here again.

Update: 06/21/2024 6:51 PM Now I'm seeing the error about forecast that @kevinhaas posted about earlier in the issue thread

Logger: homeassistant.components.websocket_api.http.connection Source: components/nws/weather.py:282 integration: Home Assistant WebSocket API (documentation, issues) First occurred: 4:49:37 PM (9 occurrences) Last logged: 6:44:42 PM

[547298063360] Error handling message: Unknown error (unknown_error) RA Kish from 192.168.1.32 (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 27, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/weather/websocket_api.py", line 103, in ws_subscribe_forecast await entity.async_update_listeners({forecast_type}) File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 987, in async_update_listeners native_forecast_list: list[Forecast] | None = await getattr( ^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1236, in async_forecast_twice_daily return await self._async_forecast("twice_daily") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1220, in _async_forecast list[Forecast] | None, getattr(self, f"_asyncforecast{forecast_type}")() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/nws/weather.py", line 282, in _async_forecast_twice_daily return self._forecast(self.nws.forecast, DAYNIGHT) ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 450, in forecast return self._convert_forecast(forecast) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 513, in _convert_forecast time, weather = parse_icon(forecast_entry["icon"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 181, in parse_icon time = icon_list[5]


IndexError: list index out of range
jazzmonger commented 1 week ago

Yup, my forecast is also stuck on June 20th and it led me here..... after all the pain converting to the new format, I honestly thought we finally had a reliable solution.

image
lordwizzard commented 6 days ago

So I'm getting data when they get everything run, except forecast. But it does look like the Home Assistant WebSocket API is throwing the error for the weather. I'm getting one issued with the Firefox browser:

Logger: homeassistant.components.websocket_api.http.connection Source: components/nws/weather.py:282 integration: Home Assistant WebSocket API (documentation, issues) First occurred: June 26, 2024 at 11:55:31 PM (2 occurrences) Last logged: June 26, 2024 at 11:57:56 PM

[547650102912] Error handling message: Unknown error (unknown_error) RA Kish from 192.168.1.32 (Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0) Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/decorators.py", line 27, in _handle_async_response await func(hass, connection, msg) File "/usr/src/homeassistant/homeassistant/components/weather/websocket_api.py", line 103, in ws_subscribe_forecast await entity.async_update_listeners({forecast_type}) File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 987, in async_update_listeners native_forecast_list: list[Forecast] | None = await getattr( ^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1236, in async_forecast_twice_daily return await self._async_forecast("twice_daily") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/weather/init.py", line 1220, in _async_forecast list[Forecast] | None, getattr(self, f"_asyncforecast{forecast_type}")() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/src/homeassistant/homeassistant/components/nws/weather.py", line 282, in _async_forecast_twice_daily return self._forecast(self.nws.forecast, DAYNIGHT) ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 450, in forecast return self._convert_forecast(forecast) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 513, in _convert_forecast time, weather = parse_icon(forecast_entry["icon"]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/pynws/simple_nws.py", line 181, in parse_icon time = icon_list[5]


IndexError: list index out of range
jazzmonger commented 6 days ago

The NWS integration seems a lost cause. I took direct control and I'm now doing direct, periodic calls to the NWS api for the data I consume and it works flawlessly.

krkeegan commented 1 day ago

Not sure why this integration has become so unreliable, but similar to @jazzmonger, I switched to using a simple REST call:

rest:
    - resource: "https://api.weather.gov/gridpoints/LOX/148,50/forecast?units=us"
      headers:
        accept: application/geo+json
      sensor:
        - name: "Raw NWS Forecast"
          unique_id: sensor.nws_raw
          json_attributes_path: properties
          json_attributes:
            - periods
          value_template: "{{value_json.properties.updateTime}}"

This dumps the forecast data into the attributes of a sensor, which gives me everything I need. I believe you could also use a Template Weather to get all the way back to having a weather entity.

jazzmonger commented 1 day ago

yup. thread is here: https://community.home-assistant.io/t/help-with-formatting-payload-results-from-a-restful-service-call/742688/7

LOTS of folks are seeing the same thing. We've all given up on the integration.