michaeldavie / env_canada

Environment Canada Weather Data
MIT License
81 stars 20 forks source link

Timestamp meta data #92

Closed gwww closed 3 months ago

gwww commented 3 months ago

Opening this as an issue for a discussion thread.

The ECWeather meta data as best I can tell comes from the top level timestamp. For the daily_forecasts (and other ECWeather bits) it has it's own timestamp in the XML data which is different than the top level timestamp.

This may be important in trying "properly" solve https://github.com/home-assistant/core/pull/121850.

Right now, I'm using the heuristic of "if the first forecast entry is a daily low and the time is between midnight and 5am" then it must be previous day's data. It's an ok heuristic but not perfect since time has timezones and where the person is running their system may not be the same timezone as the weather they are looking at. And there is 4.5 hours of timezone differences across Canada.

So, back to the timestamps. Forecast data has a timestamp of when the forecast was issued. Could it be a better way to tell if the first entry is from yesterday? Here's the thinking (please poke holes in it, cause it's just my crazy thoughts):

forecast_timestamp = timestamp from ForecastGroup, with timezone (e.g. EDT)
start_date_for_building_HA_forecast = forecast_timestamp
if half_days[0] is "low" 
    single entry for the day
else
    two entries for the day
continue with rest of the entries as usual

If that makes some sense then I propose that a new field is added to ECWeather called timestamps. It would be a dict with the keys being conditions, daily_forecasts, hourly_forecasts, and alerts -- matching the names of the respective data. Each timestamp should be in the local timezone, not UTC. That way we tell the freshness of the data relative to where the data comes from, not where you (or your system) happens to be.

The existing metadata field should be left as is for backwards compatibility.

Thoughts? Would you be willing to add it?

gwww commented 3 months ago

And, I have another idea. The timestamps field is still useful, but not necessary for this fix. The new idea is to add timestamps right inside this lib. I will try that out today. It seems like a good fix. It assumes that the first forecast entry is on the day of the timestamp, which I think is a good assumption.

michaeldavie commented 3 months ago

That makes sense, using the forecast timestamp and plain language per-forecast period headings to generate the missing UTC timestamps. If it helps, we can add a dependency on dateparser: https://github.com/scrapinghub/dateparser

gwww commented 3 months ago

Testing it now! I don't think I need dateparser, but good to know its there.

michaeldavie commented 3 months ago

Released in v0.7.2

gwww commented 3 months ago

The HA PR is in draft. I'm about to push some new changes. If you could take a peek.