impactlab / caltrack

Shared repository for documentation and testing of CalTRACK methods
http://docs.caltrack.org

Are monthly data sufficiency requirements underspecified? #49

Closed mcgeeyoung closed 7 years ago

mcgeeyoung commented 7 years ago

Blake: I still think that we're leaving a really big piece of this unspecified, and it will lead to every implementer coming up with their own way of doing things in a way that will likely lead to non-reproducible results among implementers... I've commented on this extensively in earlier drafts and in GitHub issues, but the monthly billing data is not organized by calendar months, or even by billing periods of approximately the length of months, so checking data sufficiency (not to mention partitioning usage into month-size chunks for modeling) will be implementer-specific and not documented...

houghb commented 7 years ago

Here is the document where I made the above comment so others have context: https://docs.google.com/document/d/177tyzgx3b0Hcws5it8V00w0rLN5Smql7rJGsmngDjh0/edit?usp=sharing

houghb commented 7 years ago

Also, I want to make clear, I'm not just referring to data sufficiency requirements.

I don't think it is a trivial thing to look at the monthly billing data and just plug it in to either our data cleaning & integration steps OR our analysis steps. The monthly billing data is quite different from the AMI data we've been using, for example:

...the list of small things that are not addressed goes on, and if someone actually tries to use the monthly billing data I am sure more things that need to be specified will be identified...

So, there will be a lot of additional assumptions and steps needed to check monthly billing data for data sufficiency, clean it, and transform it into monthly chunks that can plug into our existing analysis specifications. A competent implementer of the CalTRACK protocols will be able to come up with viable ways to do these things in the absence of our guidance, but my point is that they will be coming up with their own methods which are not documented, and therefore not reproducible (which Leif has made clear is a primary requirement for this specification). If we don't expect that the methods we publish will actually be used with monthly billing data, that's fine and we can just ignore this, but if we are actually trying to publish something that will lead to reproducible results using monthly billing data then this seems like an issue that we need to at least discuss seriously (instead of me continuing to beat a dead horse by myself).

matthewgee commented 7 years ago

Thanks @houghb. Sorry you feel like you're beating a dead horse by yourself. I can say personally that getting the documentation to sufficient specificity is a dead horse I'm definitely willing to beat some more (this is getting pretty morbid; time to switch clichés to keep it PG-13). From the very first drafts of the data prep and methods docs, my intent has been to seed as many issues as possible for the docs to address while leaving plenty of room for others to contribute, because there's no way I'm smart enough to think of everything. I totally agree that the issues you bring up above would be really helpful guidance to add to the monthly data prep and monthly methods. Let's figure out how to get you from beating a dead horse alone to directly suggesting changes to the spec.

I've tried to take a first pass at doing that below, but will need your help with additional specific suggestions in this issue or a pull request with edits to the spec.

Proposed additions to monthly docs to improve the specification and ensure greater replicability:

**Dealing with variable billing period lengths.** Two suggested changes here: 1) add detail to the monthly billing analysis data prep specifying that UPD (usage per day) is computed by dividing usage by the number of days in the bill period, regardless of its length (this was already added to the spec in pull request #42); 2) add WLS as a suggested variance-reducing estimation approach in the analysis methods in a new pull request.
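To make the intent concrete, here is a minimal sketch of the two changes above. The function name, the `(usage, days)` tuple layout, and the example values are all illustrative, not part of the spec; the weighting choice (weight each bill by its day count) is one common way to realize the WLS suggestion, not a confirmed CalTRACK rule.

```python
def use_per_day(bill_usage, n_days):
    """Normalize a bill-period usage value to average usage per day,
    regardless of the period's length."""
    if n_days <= 0:
        raise ValueError("bill period must span at least one day")
    return bill_usage / n_days

# Longer bill periods average out more daily noise, so one natural
# WLS weighting is the number of days in each period (an assumption
# here, not language from the spec).
bills = [(310.0, 31), (275.0, 28), (605.0, 62)]  # (usage, days) pairs
upd = [use_per_day(u, d) for u, d in bills]
weights = [d for _, d in bills]
```

The UPD values and weights would then feed a weighted least squares fit against degree days or temperature, per the analysis methods.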

**Dealing with identification of missing data vs. cumulative bill reads.** This problem is a bit more subtle because there aren't any hard and fast rules that catch all cases. However, I suggest the following additions to the data prep guidelines to deal with the majority of cases. A bill period with zero use followed by a bill period with extraordinarily high use is common where a meter read was skipped one month, the cumulative read was taken the next month, and the data extract was generated off of only actual meter reads, replacing estimated bills with zeros. The ideal way to deal with this would be a column in the monthly billing dataset indicating whether a use value is cumulative over multiple billing periods. If this exists or can be obtained, the rule should be to assign the cumulative value to the entire multi-bill period and divide by the total number of days across the combined bill periods. This guidance was added in pull request #42. If that information does not exist with the bills, then we need a heuristic for inferring cumulative billing. I propose the following: if a bill period with a zero or missing use value is followed by a value in the subsequent bill period that is more than 1.75x the average of the bill period after it and the bill period before the zero, assume it is a cumulative value and apply the prior rule; otherwise, treat it as a missing value and count it against the sufficiency criteria. I recognize that 1.75 is arbitrary, so if anyone has a more defensible arbitrary multiplier, feel free to chime in.
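A sketch of the proposed heuristic may help implementers check we mean the same thing. The 1.75 multiplier and the comparison rule are from the comment above; the function name, the `None`-for-missing convention, and the edge-case handling (what to do near the ends of the series) are my assumptions and would need to be pinned down in the spec.

```python
CUMULATIVE_MULTIPLIER = 1.75  # proposed (admittedly arbitrary) threshold

def is_cumulative_read(use, i):
    """Return True if use[i] is zero/missing and use[i+1] looks like a
    cumulative read: more than 1.75x the average of the bill before the
    zero (use[i-1]) and the bill after the suspect read (use[i+2])."""
    if use[i] not in (0, None):
        return False
    if i < 1 or i + 2 >= len(use):
        return False  # not enough neighbors to apply the rule (assumption)
    before, suspect, after = use[i - 1], use[i + 1], use[i + 2]
    if None in (before, suspect, after):
        return False  # also an assumption: don't infer from missing neighbors
    return suspect > CUMULATIVE_MULTIPLIER * (before + after) / 2

# Example: a skipped read (zero) followed by a roughly double-size bill.
# 610 > 1.75 * (300 + 310) / 2 = 533.75, so it is flagged as cumulative.
flagged = is_cumulative_read([300, 0, 610, 310], 1)
```

When the heuristic fires, the prior rule applies: spread the cumulative value over the combined bill periods and divide by their total day count.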

**Dealing with bill periods vs. calendar months.** I think this is resolved now with pull request #42, but let me know if you think we need something else.

**Blackout periods with long billing periods.** I also proposed a rule for this in pull request #42. It's straightforward: the blackout period should be inclusive of the entire work period. We could assume use is evenly distributed throughout the billing period and apportion it fractionally, so that we don't have to throw away a long billing period just because the work ended on its second day, but that assumption is not very defensible. An inclusive blackout period, while it may screen out more projects for not meeting data sufficiency requirements, is simple to apply and favors replicability.
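The inclusive rule reduces to a simple overlap test: any bill period that touches the work period at all is excluded, with no prorating. The function name, date fields, and inclusive-endpoint convention below are illustrative assumptions, not spec language.

```python
from datetime import date

def in_blackout(bill_start, bill_end, work_start, work_end):
    """A bill period falls in the blackout if it overlaps the work
    period at all (inclusive endpoints assumed here)."""
    return bill_start <= work_end and bill_end >= work_start

# A 62-day bill that contains only the final day of the work period
# is still dropped entirely under this rule, rather than prorated.
dropped = in_blackout(date(2017, 1, 2), date(2017, 3, 4),
                      date(2016, 12, 1), date(2017, 1, 2))
```

This is the trade-off named above: some long bills (and possibly whole projects) get screened out, but every implementer excludes exactly the same periods.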

From the above, the two main additions would be the WLS estimation approach and the 1.75x rule.

Anything else we want to make sure we get in this version of the spec, or any quibble with the two additional changes I'll add through a pull request @houghb?

Thanks!

houghb commented 7 years ago

@matthewgee, thanks for taking a stab at this. I don't think I'm doing a very good job at explaining why we aren't comfortable publishing this set of documents as-is. Let me try to succinctly re-frame my point:

For what it's worth, we aren't sure that anyone is ever actually going to use monthly billing data for any of the CalTRACK use cases, so we don't necessarily advise spending time implementing the monthly billing data scenario and getting the methods specs right. Our issue is more with the fact that this is being presented as a set of reproducible methods that work as written, when that is not the case.

Doing our best to add additional guidance that we anticipate will be necessary, as you've done above, is a good thing, but will be incomplete. We would be more comfortable instead with redefining the deliverable to accurately reflect what it is: an untested set of guidance recommendations for our best guess (based on our collective experience) at how to create premise-level savings estimates using monthly billing data. We should make it clear that the well-defined, tested, and reproducible deliverable from CalTRACK will be the forthcoming daily methods, but that we are providing suggested guidance on how someone might do something similar with monthly data.

matthewgee commented 7 years ago

@houghb yeah, I totally agree. The current v1 monthly guide buries the lede in distinguishing general methods guidance for monthly data from the empirically tested spec developed for daily data, which is one of the more useful contributions of the CalTRACK process. There used to be a section in the guides explaining some of the history of the monthly methods, why they weren't tested, and what that means for users, but it got overwritten in one of the editing and doc-merging commits. It needs to be back in there, at the top of v1Guide.md. Can you take a first whack at developing the language that you think appropriately caveats the monthly methods guidance? I can help out, but I think this commit is better served by starting with what's in your head and wordsmithing from there. Thanks!

houghb commented 7 years ago

@matthewgee I submitted a pull request with our suggestions for language that makes clear the points I have been making. Please take a look and provide feedback when you have a minute.