impactlab / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org

Weather data interpolation/regularization? #10

Closed: jbackusm closed this issue 7 years ago

jbackusm commented 8 years ago

Copied from Slack, since there was no response...

Question on the weather data: the ISD data is not actually hourly--the observations are somewhat irregular, sometimes with multiple observations in an hour. Do we have an agreed-upon method for normalizing/regularizing the temperature time series to line up with our hourly usage data? For example, linear interpolation between observations, then taking the integrated hourly mean of the interpolated time series?
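For illustration only, here is a minimal sketch of the interpolate-then-integrate idea, assuming observations arrive as a time-sorted list of `(datetime, temperature)` pairs that cover the whole hour. The helper names `interp` and `hourly_mean` are hypothetical, not anything in the CalTRACK spec. Because the interpolant is piecewise linear, the trapezoid rule over the observation times inside the hour gives the exact time-weighted mean.

```python
from datetime import datetime, timedelta

def interp(obs, t):
    """Linearly interpolate the temperature at time t from sorted (time, temp) pairs."""
    for (t0, y0), (t1, y1) in zip(obs, obs[1:]):
        if t0 <= t <= t1:
            if t1 == t0:  # duplicate timestamps: just take the first reading
                return y0
            frac = (t - t0) / (t1 - t0)  # timedelta / timedelta -> float
            return y0 + frac * (y1 - y0)
    raise ValueError("t is outside the observed range")

def hourly_mean(obs, start):
    """Time-averaged temperature of the linear interpolant over [start, start + 1h)."""
    end = start + timedelta(hours=1)
    # Breakpoints: hour boundaries plus any observation times strictly inside the hour.
    knots = [start] + [t for t, _ in obs if start < t < end] + [end]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        width = (b - a).total_seconds()
        area += width * (interp(obs, a) + interp(obs, b)) / 2  # trapezoid, exact for linear pieces
    return area / 3600.0
```

With readings of 10.0 at 10:00, 14.0 at 10:30 and 12.0 at 11:00, the 10:00 hour averages to 12.5, whereas a plain mean of the three readings would give 12.0.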

matthewgee commented 7 years ago

@jbackusm The current version of the technical requirements does not interpolate weather data: we treat missing values as missing and throw out days with more than 5 missing values. The intent was that interpolation would eventually be handled upstream by NOAA in their new release (which hasn't been published yet), which would help prevent errors from inconsistent downstream interpolation.
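A rough sketch of that "no interpolation" rule, assuming each day is represented as a list of hourly slots with `None` marking a missing value. The `daily_averages` name and the list-of-slots layout are illustrative assumptions; only the "more than 5 missing values" threshold comes from the requirements described above.

```python
from datetime import date
from typing import Dict, List, Optional

MAX_MISSING = 5  # from the requirements: days with more than 5 missing values are thrown out

def daily_averages(hourly: Dict[date, List[Optional[float]]]) -> Dict[date, float]:
    """Average the present hourly values per day; drop days with too many gaps."""
    result = {}
    for day, temps in hourly.items():
        present = [t for t in temps if t is not None]  # missing stays missing, no filling
        missing = len(temps) - len(present)
        if missing > MAX_MISSING or not present:
            continue  # day is excluded rather than interpolated
        result[day] = sum(present) / len(present)
    return result
```

A day with exactly 5 gaps is kept and averaged over its remaining readings; a day with 6 gaps is dropped entirely.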

That doesn't address multiple observations per hour. My suggestion there is that we always keep the first value for a given hour and drop any later values.

Thoughts?

jbackusm commented 7 years ago

Thanks @matthewgee. OK, I think taking the first temperature reading in a given hour and truncating the minutes in the corresponding timestamp could be a good option, though we might want to match that to the next hour in the usage time series, since I understand each usage value represents average usage over the previous hour?
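One way to sketch the "first reading per hour, truncate the minutes" idea, assuming `(datetime, temperature)` pairs as input. The `first_per_hour` function and its `label_end_of_hour` flag are hypothetical; the flag shows the optional shift to the next hour for usage intervals labeled by their end time, which is an unconfirmed assumption about the usage data.

```python
from datetime import datetime, timedelta

def first_per_hour(obs, label_end_of_hour=False):
    """Keep only the first reading in each clock hour, keyed by the truncated hour.

    If label_end_of_hour is True, the key is shifted forward one hour so it lines
    up with usage values that describe the preceding hour (an assumption).
    """
    hourly = {}
    for t, temp in sorted(obs):  # chronological order, so the first reading wins
        hour = t.replace(minute=0, second=0, microsecond=0)  # truncate the minutes
        if label_end_of_hour:
            hour += timedelta(hours=1)
        hourly.setdefault(hour, temp)  # later readings in the same hour are dropped
    return hourly
```

For readings at 10:05, 10:40 and 11:20, this keeps the 10:05 and 11:20 values, keyed 10:00 and 11:00 (or 11:00 and 12:00 with the shift).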

houghb commented 7 years ago

@matthewgee Did we settle on doing this? I don't see it in any specs.

matthewgee commented 7 years ago

@houghb right now we only need daily average data in the weather data cleaning guidance for monthly methods, so it shouldn't be in the current spec.

However, if we think that using hourly weather data to generate degree day values for daily methods is a good idea, we'll need to resolve this. I've gone ahead and moved this to be part of the v1.1 Daily milestone and created a separate issue for that milestone focused on deciding whether hourly weather data should be included in daily methods. That way, we only need to resolve this if we end up deciding to use it.

houghb commented 7 years ago

Re-opening this issue since most of the beta testers have not weighed in on it.

Note that if we decide to use hourly data, we will need to re-open Issue #12.

houghb commented 7 years ago

@matthewgee @jbackusm For daily methods we decided to use hourly ISD weather data, which can have multiple observations in a given hour, as John described at the start of this issue. The current approach is to average all the temperature readings in a given day to get the daily average temperature that we use. Under this approach, hours that report the temperature more than once are weighted more heavily when calculating the mean. Do we want to revise the spec to keep only the first entry in each hour, as suggested above?
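To make the weighting concern concrete, here is a small sketch comparing the two options for a single day's readings, assuming `(datetime, temperature)` pairs that all fall on the same day. Both function names are illustrative, not part of the spec.

```python
from datetime import datetime
from statistics import mean

def daily_mean_all(obs):
    """Current approach: plain mean of every reading in the day."""
    return mean(temp for _, temp in obs)

def daily_mean_first_per_hour(obs):
    """Proposed approach: keep the first reading in each hour, then average."""
    first = {}
    for t, temp in sorted(obs):
        first.setdefault(t.replace(minute=0, second=0, microsecond=0), temp)
    return mean(first.values())
```

If hour 00 reports 10.0 three times (00:00, 00:20, 00:40) and hour 01 reports 20.0 once, the plain mean is 12.5 because hour 00 counts three times, while the first-per-hour mean is 15.0, weighting each hour equally.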