V3 api, more flexible formulas, persist storage

tsbernar commented 1 year ago

Add support for custom formulas and multiple factors. Make multiple sensors instead of extra state attributes. Make sensors SensorStateClass.MEASUREMENT for graphing and statistics support. Persist the rolling window of data to save API calls after restarts and allow for a larger lookback window.

tsbernar commented 1 year ago

Rebased with main repo and added a few more features:

- Implement new sensor type, total_rain. Takes a start and end offset and returns total rainfall between them. For example, start_hour = -72, end_hour = 0 will show the total rainfall in the last 72 hours.
- Configurable lookback days, default to 30
- better API rate limit handling with configurable settings
- backfills will now happen more slowly as permitted by limits
- limit tracking data is persisted
- add some tests
- fix merge issues
- run backfill tasks in the background, default to 10 requests on every 30s cycle, bound by per hour and per day API rate limits. Reserve enough requests to always be able to request next 24hrs of data.
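
To illustrate the total_rain idea from the list above, here is a minimal sketch, not the code in this PR. It assumes the rolling window is a mapping from hour-aligned datetimes to rainfall for that hour; the function name, the sample data and the half-open window (start excluded, end included) are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def total_rain(hourly_rain, now, start_hour, end_hour):
    """Sum rainfall for observations between two hour offsets from `now`.

    `hourly_rain` maps an hour-aligned datetime to the rainfall recorded in
    that hour. Offsets are relative to `now`, so start_hour=-72, end_hour=0
    covers the last 72 hours. The window is (start, end]: an assumption.
    """
    start = now + timedelta(hours=start_hour)
    end = now + timedelta(hours=end_hour)
    return sum(mm for ts, mm in hourly_rain.items() if start < ts <= end)


now = datetime(2023, 4, 23, 12, 0, tzinfo=timezone.utc)
# Hypothetical sample data: 1.0 of rain in each of the last 100 hours.
samples = {now - timedelta(hours=h): 1.0 for h in range(100)}
print(total_rain(samples, now, start_hour=-72, end_hour=0))  # 72.0
```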

petergridge commented 1 year ago

With the new repository I get

2023-04-23 02:53:41.114 ERROR (MainThread) [homeassistant.components.sensor] Error while setting up openweathermaphistory platform for sensor
Traceback (most recent call last):
  File "/workspaces/core/homeassistant/helpers/entity_platform.py", line 304, in _async_setup_platform
    await asyncio.shield(task)
  File "/workspaces/core/config/custom_components/openweathermaphistory/sensor.py", line 172, in async_setup_platform
    await _async_setup_v3_entities(add_entities, hass, config, units)
  File "/workspaces/core/config/custom_components/openweathermaphistory/sensor.py", line 233, in _async_setup_v3_entities
    await sensor_registry.async_load()
  File "/workspaces/core/config/custom_components/openweathermaphistory/sensor.py", line 403, in async_load
    await self._weather_history.async_load()
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 103, in async_load
    if data["hour_rolling_window"]:
KeyError: 'hour_rolling_window'

I guess your json structure has changed. Any hints on how to clear the persisted data?

tsbernar commented 1 year ago

Dang. I’ll fix that, but I’m away from my computer right now.

For now, you should be able to delete the file under .storage/openweathermaphistory.history (STORAGE_KEY in the const file)
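
The KeyError in the traceback comes from indexing a key that an older persisted file does not contain. A minimal sketch of a more tolerant load, assuming the persisted data is a JSON object; `load_history` is a hypothetical helper, not the function in weatherhistory.py:

```python
import json


def load_history(raw_json):
    """Parse persisted history, falling back to empty values for missing keys."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # dict.get avoids the KeyError raised by data["hour_rolling_window"]
    # when the file was written with an older schema.
    return {"hour_rolling_window": data.get("hour_rolling_window") or []}


print(load_history('{"some_old_key": [1, 2]}'))  # {'hour_rolling_window': []}
```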

petergridge commented 1 year ago

That helped, moving on to the next issue :) I love testing other people's code, sure beats people finding bugs in mine.

2023-04-23 03:16:30.565 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 156, in backfill_chunk
    await self._async_update_for_datetime(end_dt)
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 231, in _async_update_for_datetime
    return self.add_observation(data)
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 239, in add_observation
    rain = json_data["rain"]["1h"] if "rain" in json_data else 0
KeyError: '1h'

interesting that the data returned from the API has 'rain': {'3h': 1} not 'rain': {'1h': 1}

{'dt': 1681714800, 'sunrise': 1681720747, 'sunset': 1681761041, 'temp': 18.48, 'feels_like': 18.08, 'pressure': 1013, 'humidity': 65, 'dew_point': 11.79, 'clouds': 34, 'wind_speed': 5.33, 'wind_deg': 338, 'wind_gust': 5.73, 'weather': [{'id': 500, 'main': 'Rain', 'description': 'light rain', 'icon': '10n'}], 'rain': {'3h': 1}}

Another question, what is the behaviour if I set up two sensors with different locations? How will the persistent storage work and API counts.

petergridge commented 1 year ago

if it helps this is the URL/location I am running for:

url: https://api.openweathermap.org/data/3.0/onecall/timemachine?lat=-33.8715&lon=-33.8715&dt=1681714800&appid={API_KEY}&units=metric

weatherhist.py line 68, you have both lat and lon using latitude.
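
For reference, a small sketch of building the request URL with separate latitude and longitude values. The endpoint and parameter names are taken from the URL above; `timemachine_url` is a hypothetical helper and the coordinates are the ones used in the example configs later in this thread.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/3.0/onecall/timemachine"


def timemachine_url(latitude, longitude, dt, api_key, units="metric"):
    """Build the history URL, keeping lat and lon as distinct parameters."""
    params = {
        "lat": latitude,
        "lon": longitude,  # the reported bug passed latitude here as well
        "dt": dt,
        "appid": api_key,
        "units": units,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(timemachine_url(-33.8302547, 151.1516128, 1681714800, "API_KEY"))
```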

petergridge commented 1 year ago

https://openweathermap.org/history tells me that 3h is the rainfall for the last 3 hrs, so we need to subtract the previous 2 hours rainfall to get this hours rainfall. Why would they do this to us?!

tsbernar commented 1 year ago

Thanks! This is all really helpful debugging info.

Very annoying that they have the '3h' rain samples; I hadn't come across that yet in my location and didn't see it in the v3 docs. It looks like they mix '3h' and '1h' and then also report the same number 3 times in a row for the '3h'.

I'm not quite sure what to make of this; what do you think the "correct" total rain is in this period?

Maybe 1.0 + 1.13 + 1.0 + 1.31 ?

tsbernar commented 1 year ago

For your other comments:

- Fixed the json loading so it won't break when the schema changes
- Fixed the default longitude loading
- I like the idea of including the forecast! I think it gets trickier with integrating forecast data into your irrigation logic instead of just historical data though. For example, if the forecast shows it is likely to rain later today, maybe you skip this morning's irrigation.. but then tomorrow, if it turns out it did not actually rain, you'd want to tweak your irrigation to make up for the one you skipped in anticipation of rain. My initial thought is to expose a sensor for a historical data factor and a sensor for a forecast data factor, and then in the irrigation logic, you would use forecast rain + past rain + past irrigation to make your decision. I saw you had some other irrigation projects that I haven't yet looked at, so maybe you're already thinking about this?
- For multiple locations, you can set up a config with multiple entries like this and split your API limits across them:

sensor: 
  - platform: openweathermaphistory
    api_key: 'key'
    v3_api: True
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 30
    resources:
      - name: rainfactor_default_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 48hr_rain
        type: custom
        data:
          formula: day0rain + day1rain
  - platform: openweathermaphistory
    api_key: 'key'
    v3_api: True
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 6
    latitude:  -33.8302547
    longitude: 151.1516128
    resources:
      - name: rainfactor_aus_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom_aus
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 24hr_rain_aus
        type: custom
        data:
          formula: day0rain

The persisted data is now stored in a file whose name includes the location.
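
One way such per-location file names could be derived is sketched below. The base key is assumed from the `.storage/openweathermaphistory.history` path mentioned earlier in the thread; the suffix format and the `storage_key_for` helper are hypothetical and may differ from what the PR does.

```python
# Assumed base key, taken from the file name mentioned earlier in the thread.
STORAGE_KEY = "openweathermaphistory.history"


def storage_key_for(latitude, longitude):
    """Build a per-location storage key (hypothetical naming scheme)."""
    # Rounding to 4 decimals keeps the name short while still separating
    # locations that are more than a few metres apart.
    return f"{STORAGE_KEY}.{latitude:.4f}_{longitude:.4f}"


print(storage_key_for(-33.8302547, 151.1516128))
```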

I'm open to suggestions on how to handle multiple locations better, I only have 1 location for mine. Maybe you could override the location at each sensor instead of setting up a new platform like this?

sensor: 
  - platform: openweathermaphistory
    api_key: 'key'
    v3_api: True
    max_api_calls_per_hour: 60
    max_api_calls_per_day: 400
    lookback_days: 30
    resources:
      - name: rainfactor_default_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 48hr_rain
        type: custom
        data:
          formula: day0rain + day1rain
      - name: rainfactor_aus_location
        type: default_factor
        latitude:  -33.8302547
        longitude: 151.1516128
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom_aus
        type: custom
        latitude:  -33.8302547
        longitude: 151.1516128
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 24hr_rain_aus
        latitude:  -33.8302547
        longitude: 151.1516128
        type: custom
        data:
          formula: day0rain

Maybe we should also lower the default API rate limit settings so that 2-3 locations can be supported without having to mess with including rate limits in the config?

petergridge commented 1 year ago

For the '3h' issue I would simply divide the value by 3; that should be accurate enough, and given that they provide 3 periods with the same data, it logically makes sense.

Adding the forecast as a new sensor makes sense. I was not planning to use it in my factor calculation, but it opens up a lot of opportunities for the future.

I think the first option for multiple sensors is best as it matches the way HA supports sensors.

- My preference is defaulting to 5 days of data to support the UI and calculation model, limiting the startup load to only 120 calls for each sensor, and then letting it build up naturally to a longer 30-day limit.
- Provide a service to download additional days of history so advanced users can get data faster if required.
- The end user can then be responsible for not overdoing the calls.

petergridge commented 1 year ago

I can see that there is a lot happening; the calls are made regularly, but no sensor is created in HA. Are you seeing the same at your end?

tsbernar commented 1 year ago

The sensors are working on my end; could you share the config you’re using?

petergridge commented 1 year ago

I copied your GIT repository, I can try downloading again.

I'm using the docker dev container and Visual Studio Code as my environment.

tsbernar commented 1 year ago

Oh I was talking about the config for your sensor so I can try to replicate on my end

petergridge commented 1 year ago

Ah, sorry, here is the yaml

sensor:
  - platform: openweathermaphistory
    name: 'rainfactor new'
    api_key: 6e5dd5b87a55018adee10ab2c7ed6f96
    v3_api: True
    lookback_days: 5

tsbernar commented 1 year ago

Got it, so you’ll need to add individual sensors under the resources list. (Borrowed the config naming from https://www.home-assistant.io/integrations/systemmonitor/)

The way it works now is you have one “platform” per lat/lon location, and then each “platform” can have multiple sensors under its “resources” list. Maybe we should just add the default sensor if none are specified to shrink down the minimal config?

Something like this should work to just give you the default sensor on your default location:

sensor:
  - platform: openweathermaphistory
    api_key: 6e5dd5b87a55018adee10ab2c7ed6f96
    lookback_days: 5
    resources:
      - name: new_rainfactor_sensor
        type: default_factor

Here’s a full example with 2 locations and multiple sensors at each:

sensor: 
  - platform: openweathermaphistory
    api_key: 'key'
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 30
    resources:
      - name: rainfactor_default_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 48hr_rain
        type: custom
        data:
          formula: day0rain + day1rain
  - platform: openweathermaphistory
    api_key: 'key'
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 6
    latitude:  -33.8302547
    longitude: 151.1516128
    resources:
      - name: rainfactor_aus_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom_aus
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 24hr_rain_aus
        type: custom
        data:
          formula: day0rain
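
As a worked example of the custom formula used in the configs above, the sketch below evaluates `max((0.5 - day0rain - day1rain/2 - ... - day4rain/16) / 0.5, 0)` in plain Python. It only mirrors the arithmetic of that formula string; how the integration itself parses and evaluates formulas is not shown here, and `rain_factor` is a hypothetical name.

```python
def rain_factor(daily_rain, water_target=0.5):
    """Evaluate the halving-weights formula for up to five days of rain.

    daily_rain[0] is day0rain (most recent), daily_rain[1] is day1rain, etc.
    Each older day counts half as much as the day after it.
    """
    weighted = sum(rain / 2**age for age, rain in enumerate(daily_rain[:5]))
    # 1.0 means "water the full amount", 0 means "skip watering".
    return max((water_target - weighted) / water_target, 0)


print(rain_factor([0, 0, 0, 0, 0]))        # 1.0: no rain, full watering
print(rain_factor([0.25, 0.25, 0, 0, 0]))  # 0.25: (0.5 - 0.375) / 0.5
print(rain_factor([1.0, 0, 0, 0, 0]))      # 0: rain exceeds the target
```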

tsbernar commented 1 year ago

The reason for splitting it this way is to allow all the sensors at the same location to share the same set of data / api calls. Though we could also just achieve that on the backend if you think it's more desirable to just have one “platform” configured and specify different locations on the sensor level in the resources list.

tsbernar commented 1 year ago

Responding to a few other comments:

For the '3h' issue I would simply divide the value by 3; that should be accurate enough, and given that they provide 3 periods with the same data, it logically makes sense.

Makes sense to me; I've added this as well as a warning log message if we see anything else unexpected in there. Hopefully, "1h" and "3h" is all we'll see.

My preference is defaulting to 5 days of data to support the UI and calculation model, limiting the startup load to only 120 calls for each sensor, and then letting it build up naturally to a longer 30-day limit. Provide a service to download additional days of history so advanced users can get data faster if required. The end user can then be responsible for not overdoing the calls.

I've made a change to the default API rate limits that should roughly accomplish this, though without a separate service. The default lookback is still 30 days, which is the maximum amount of data that we will keep in the rolling window and persistent store, but we will only backfill the first 5 shortly after startup. The way the backfilling works now is:

Every 30s SCAN_INTERVAL:

1) We check if our (30-day default) lookback window is full. If it's not full, we check if we have available API limits for the current hour and the current day; if we do, we will send off a background task to backfill up to 10 hours (or less if constrained by the API limits).
2) We check if we need to do a live update for the current hour.

Step 1 always reserves enough limits so that we will be able to do the live updates once per hour.

The current limits are set to allow a backfill of 5 days in the first hour after a restart. In practice, this happens in the first 6 mins of the hour at a rate of 10 hours backfilled every 30s interval, then no backfilling for the rest of the hour until our initial requests roll off. The remaining 25 days of the full lookback window will then be slowly filled in over the next couple of days as daily and hourly limits permit. The default limits allow for up to 3 locations at a time without getting into paid API requests, assuming 0 persisted data at the start and all need a full backfill. If you already have a location configured, adding more should be okay, as the existing locations will only be using 24 requests per day once the backfills are complete.
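
The budgeting described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `backfill_budget` and its parameters are hypothetical, and the limits of 121 per hour and 1000 per day are made-up values chosen so that the numbers match the description (5 days x 24 hours = 120 requests, 10 per 30 s cycle, so 12 cycles or 6 minutes).

```python
def backfill_budget(used_this_hour, used_today, max_per_hour, max_per_day,
                    hours_left_today, chunk_size=10):
    """Number of requests one 30 s cycle may spend on backfilling."""
    # Reserve one request this hour for the live update, and one per
    # remaining hour of the day in the daily budget.
    hourly_free = max_per_hour - used_this_hour - 1
    daily_free = max_per_day - used_today - hours_left_today
    return max(0, min(chunk_size, hourly_free, daily_free))


used = 0
cycles = 0
# Simulate the first hour after a restart with nothing persisted.
while (n := backfill_budget(used, used, max_per_hour=121, max_per_day=1000,
                            hours_left_today=24)) > 0:
    used += n
    cycles += 1

print(used, cycles, cycles * 30 / 60)  # 120 12 6.0
```

With these assumed limits the loop stops after 120 backfill requests because the hourly budget, minus the reserved live update, is exhausted.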

Another option could be to have 2 lookback windows configured, a backfill window and a lookback window. The backfill window could be set to 5 days in your example, and the lookback set to 30. In this case we would only backfill the 5 days on startup (as permitted by the limits), but we will keep up to 30 days of history as time passes and we naturally add more samples from live requests.

petergridge commented 1 year ago

You have been busy, I like what you have done and I am learning something new from your coding, I still think in COBOL :)

Maybe we should just add the default sensor if none are specified to shrink down the minimal config?

That makes sense. I believe that having a default resource will make it more user friendly: less yaml = fewer mistakes, and most users just run with default settings. We also need to consider the complexity that needs to be built into the config flow. If you are looking for an example config flow (albeit an overly complex one), the irrigation custom component in my repository has a config flow.

I also like this option, if a user requests more than 5 days your existing rules will kick in.

Another option could be to have 2 lookback windows configured, a backfill window and a lookback window. The backfill window could be set to 5 days in your example, and the lookback set to 30. In this case we would only backfill the 5 days on startup (as permitted by the limits), but we will keep up to 30 days of history as time passes and we naturally add more samples from live requests.

I would consider getting the numeric value from the key and using it as the denominator, just to future proof it.

I've added this as well as a warning log message if we see anything else unexpected in there. Hopefully, "1h" and "3h" is all we'll see.

What are your plans to use the 30 days of data?

I have a template sensor that updates every 24 hours at midnight to capture that day's details, so the information is captured in HA History and I can present a graph. I can see this as one of the sensor types: the max temp, min temp, total rain and snow, and average humidity value for a calendar day. This is a feature a lot of users have been looking for.
For more granularity, the rainfall for an hour is now possible given the hourly nature of V3. These types of sensors will let us use HA's history capability to capture the long-term stats.
We should be able to use SQL sensors to get and manipulate the long-term data if required.
We may even be able to get away from providing all the attributes and present a card from history data rather than the attributes; not that I have seen anyone do that, but I haven't looked very hard.
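
A rough sketch of the per-calendar-day summary described above, assuming hourly observations shaped like the API sample earlier in the thread (`dt`, `temp`, `humidity`) and a `rain` value that has already been normalised to a per-hour number. `daily_summary` is a hypothetical helper, and grouping by UTC date is a simplification; a real sensor would presumably use the local time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def daily_summary(observations):
    """Group hourly observations by calendar day (UTC) and summarise each day."""
    by_day = defaultdict(list)
    for obs in observations:
        day = datetime.fromtimestamp(obs["dt"], tz=timezone.utc).date()
        by_day[day].append(obs)
    return {
        day: {
            "max_temp": max(o["temp"] for o in rows),
            "min_temp": min(o["temp"] for o in rows),
            # "rain" is assumed to be a per-hour float here, not the raw dict.
            "total_rain": sum(o.get("rain", 0.0) for o in rows),
            "avg_humidity": mean(o["humidity"] for o in rows),
        }
        for day, rows in by_day.items()
    }


# Two hypothetical hourly samples on 2023-04-17 (UTC).
hours = [
    {"dt": 1681714800, "temp": 18.48, "humidity": 65, "rain": 1.0},
    {"dt": 1681718400, "temp": 20.0, "humidity": 75},
]
print(daily_summary(hours))
```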

tsbernar commented 1 year ago

Nice, we're both learning here! This is the first HA integration I've worked on, and it's been much easier to see how it all works starting with an integration that already works than starting from scratch. (Just bought my first house, and have been a bit too excited about all the home automation things)

That makes sense. I believe that having a default resource will make it more user friendly: less yaml = fewer mistakes, and most users just run with default settings. We also need to consider the complexity that needs to be built into the config flow. If you are looking for an example config flow (albeit an overly complex one), the irrigation custom component in my repository has a config flow.

Agreed on the default. I was just starting to struggle with the config flow today, so will take a look at the irrigation component. I've been meaning to take a look at that anyway as irrigation automation is next up for me after getting this rain data. Do you have any other tips for irrigation generally? Using moisture sensors or anything else like that?

I would consider getting the numeric value from the key and using it as the denominator, just to future-proof it.

Makes sense to me. I was just worried that the division by the numeric value might not always work. I'm used to dealing with software where if something unexpected happens, you probably want to know about it right away and stop.. probably not our ideal behavior in this case, and there are other users to think about.
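
A minimal sketch of the "use the number in the key as the denominator" idea, written so that unexpected keys are logged and ignored instead of raising. It follows the suggestion in this thread of spreading an 'Nh' total evenly over N hours; `hourly_rain` is a hypothetical helper, not the PR's `add_observation`.

```python
import logging
import re

_LOGGER = logging.getLogger(__name__)


def hourly_rain(json_data):
    """Return rainfall per hour from an observation's optional 'rain' entry."""
    rain = json_data.get("rain")
    if not rain:
        return 0.0
    for key, value in rain.items():
        # Keys seen so far are '1h' and '3h'; accept any '<N>h' with N > 0.
        match = re.fullmatch(r"(\d+)h", key)
        if match and int(match.group(1)) > 0:
            return float(value) / int(match.group(1))
        _LOGGER.warning("Unexpected rain key %r in %s", key, rain)
    # Nothing usable was found: fall back to no rain rather than crashing.
    return 0.0


print(hourly_rain({"rain": {"1h": 1.13}}))  # 1.13
print(hourly_rain({"rain": {"3h": 1}}))     # one third of the 3-hour total
print(hourly_rain({"temp": 18.48}))         # 0.0
```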

What are your plans to use the 30 days of data?

Mainly for UI, I have a vague idea of what I want a custom card to look like for displaying both irrigation time and rainfall over time, but I have yet to dig into the weeds of how hard that will be to make. I was thinking of using hourly data for recent days and a monthly view.

I have a template sensor that updates every 24 hours at midnight to capture that day's details, so the information is captured in HA History and I can present a graph. I can see this as one of the sensor types: the max temp, min temp, total rain and snow, and average humidity value for a calendar day. This is a feature a lot of users have been looking for.

I think this should be straightforward with a custom type sensor after we expose humidity and temp as inputs to the formula, but also a good idea for a new sensor type with easier config.

Agreed on the stats, the HA history is great.. I just don't have enough history yet

I think a custom card that uses the internal state rather than HA history would give a lot of flexibility and allow us to display backdated data.

petergridge commented 1 year ago

I was just worried that the division by the numeric value might not always work.

As long as the error is handled and the control does not crash, it should be fine; if it is not a valid value, ignore it. But on that note, I purposely exceeded the call limit to see how my version handles it, and I was thinking an error sensor type that provides the error details would be great; I can put it on the dashboard with a condition to show it only when it is active. The other sensors would also benefit from a default value when they are in error so the irrigation system still gets a value, though I could/should handle it at that end as well.

Do you have any other tips for irrigation generally? Using moisture sensors or anything else like that?

The irrigation control has had pretty good take-up since I put it on HACS; it only took me 5 years before I got around to publishing it. I always get a rush of requests as the northern hemisphere watering season kicks in, and every time I think I have all the bases covered someone has a good idea. I built it to be simple to configure and to provide a functional UI that is not technical; since then I have built a card as well, again functional rather than fancy. But this weather map history control has been downloaded over 600 times in the last couple of weeks since it was published.

I built 'rainfactor' because I got sick of fiddling with rain and moisture sensors. I built my own ESP based irrigation controller (the box it is in was more expensive than the components) with inputs to support sensors. I had issues with the sensor being in a rain shadow when the wind was blowing and it did not really help to determine how much rain there was. It could rain in the morning and my program runs in the afternoon... my list of grievances is endless :). Even with moisture sensors it depends on where you place them: one in the lawn, one in pot plants, the list goes on. It was just fiddly, so I went with a more global method that does not need hardware; that is what the internet is for after all, and it has worked well for me.

I found it more reliable to use weather data to reduce watering based on rainfall. If a zone does not water, it will check the next day and run if the conditions are met. If you have configured it to run every 3 days and it does not water because of the 'rainfactor', it will still check every day until it does need to water; it does not wait another 3 days.

With the additional information and your formulas I can also increase the watering if the temperature is high or stop/reduce it if the temperature is low.

The other usage for your model is to build a template that I can use to alter the frequency of watering or even enable a second program to run if there is an extended or forecast period of hot weather, this will only need a small tweak to the irrigation control.

The work you have done will make this a much better partner for the irrigation control.

I think a custom card that uses the internal state rather than HA history would give a lot of flexibility and allow us to display backdated data.

From what I have seen (not that it is definitive), you can't access the backend data directly from the card; you need to access it from the sensor and attributes. To go back 30 days you will need to expose a lot of attributes, and this is not the way HA is heading. I exposed them this way as I was too lazy to create many sensors for people to get the information for their own calculations, but you have exposed the formula capability and multiple sensors, so I think my attributes are no longer required.

Here is the graph I have now and the config I use; the mean option smooths out the graph to look more appealing. I only keep 30 days of history to keep the database snappy, as I don't want to stress out the Pi. Having said that, I have a small SSD attached via USB3 and it is very good. I have an automation that runs weekly to clean up and compress the history.

Waiting 30 days was a bit of a pain when I started tracking, but it was kinda nice to see the graph fill out over time. For me it was aesthetic rather than functional anyway; 5 days was plenty for my purpose.

tsbernar commented 1 year ago

I still need to clean up my config flow code some more before I publish here, but it took me a while to get going so I wanted to give an update on how I think the config flow could work. Here's a screen record demo:

https://user-images.githubusercontent.com/11330651/236645872-1e88bf34-f400-409d-8e73-466b79a76983.mov

petergridge commented 1 year ago

The config flow is looking good, couple of things to consider:

- The lat/lon data is not editable, which is good, but the API and other attributes should be.
- The sensor names should not be editable. The sensor name can be changed in HA, assuming a unique id is allocated.
- Removal of the data file when the integration is removed.

I would add validation:

- that the lat/lon is only used once across all instances of the integration
- that the sensor name is unique within the instance of the integration

Consider automatically allocating the SensorDeviceClass if the formula is simple, i.e. only one known element, so the unit of measure is set. See below...

I have taken your code (mostly) and put it into my latest checked in source. I have reworked:

- The way the entities are created and the API is called.
- Allowed the allocation of sensor class information so it allocates the unit of measure and handles the conversion of the information from metric to imperial automatically.
- The allocation of a unique id is best, as this allows you to modify the name, numeric precision, etc. from the UI.
- Modified how the formulas are handled to allow for more complex templates; I wanted to use the templates to alter watering days.
- Removed any reference to API v2.5 to simplify the code significantly.
- Added current observations and forecast data for 8 days.
- Allowed the allocation of attributes to support the current custom card.
- I am still using pickle to save data, but want to use your method; I just haven't tried yet.

tsbernar commented 1 year ago

Thanks. I've just pushed the code with the config flow.

- The same lat/lon can only be used once across all instances of the integration.
- Sensor names are unique across an instance.
- Validation added for each step. We validate that the API key is valid and that we can call the API, and that the inputs are valid for each sensor type.
- I did not add the removal of the data file yet. It's quite small even with a longer lookback window, and it seemed more valuable to keep the data persisted to save API calls if you remove and re-add the integration; at least for testing, this has been useful.

petergridge commented 1 year ago

Hi Trevor,

I just pushed out a version 2 of the component, I think it covers most of your requirements, after all I stole a whole lot of your work, thanks.

If you have time let me know what you think and we can add improvements from there.

Cheers Pete

petergridge / openweathermaphistory

V3 api, more flexible formulas, persist storage #14