recurve-methods / comparison_groups

Repository for discussion of Comparison Group topics

Timing of Comparison Group Selection #3

Open mcgeeyoung opened 4 years ago

mcgeeyoung commented 4 years ago

One issue we continue to wrestle with, and upon which I’d like to spend some time in tomorrow’s call gathering more feedback, is the challenge of the timing of comparison group selection. At the heart of the matter is the need for a comparison group to be analyzed in real time, alongside treated customers, so that interim savings results reflect a realistic assessment of performance. In an opt-out program, where all customers are enrolled more or less at the same time, there is no issue. But in an opt-in program, where customers are not pre-selected and are enrolled over the course of an entire year (or several years), we are left with three unsatisfactory choices:

1. The comparison group cannot be selected until the last customer is enrolled, leaving programs in the dark during early critical reporting periods.
2. The comparison group is pre-selected, leaving the portfolio subject to bias if the treated customers end up looking different than the non-treated customers.
3. The comparison group members are selected alongside the treated customers as they are enrolled, resulting in a comparison group that is continuously evolving and difficult to assess for factors like historical volatility and overall representativeness.

Consider the image below, which is a distribution function of customers according to the ratio of their consumption between the middle of the afternoon and the peak of the duck curve (or evening ramp). The buildings on the far right-hand side of this distribution have the steepest evening ramp, while the buildings on the far left-hand side have the shallowest evening ramp. If a program is planning to target buildings with steep evening ramps, it makes sense that a comparison group would also be pulled from that subset of buildings. But we won’t know how successful the program is in actually pulling from that subset of customers until after enrollment has closed. What if the steepest evening ramp customers are, for whatever reason, completely uninterested in the program offering? If our comparison group pulled from that set of customers, they would be unrepresentative of our treatment group.
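For concreteness, the ramp ratio behind that distribution can be computed from interval meter data along these lines. This is a minimal sketch, not Recurve's implementation: the function name, the input shape (average kWh by hour of day), and the specific "mid-afternoon" and "evening ramp" window boundaries are all my assumptions.

```python
from statistics import mean

def evening_ramp_ratio(hourly_kwh, afternoon=(13, 16), evening=(17, 20)):
    """Ratio of average evening-ramp load to average mid-afternoon load.

    hourly_kwh: dict mapping hour-of-day (0-23) to average kWh in that hour.
    The window boundaries are illustrative assumptions, not a standard.
    """
    aft = mean(hourly_kwh[h] for h in range(afternoon[0], afternoon[1] + 1))
    eve = mean(hourly_kwh[h] for h in range(evening[0], evening[1] + 1))
    return eve / aft

# A flat load scores ~1.0; a steep evening ramp scores well above 1.0.
flat = {h: 1.0 for h in range(24)}
steep = {h: (3.0 if 17 <= h <= 20 else 1.0) for h in range(24)}
print(evening_ramp_ratio(flat))   # 1.0
print(evening_ramp_ratio(steep))  # 3.0
```

Ranking customers by this ratio is what puts a building on the left or right tail of the distribution shown below.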

If we waited until after the program closed, we could select a representative comparison group, but until that point, the program would be flying blind.

Or, as customers are enrolled, we could place them within the distribution and select comparison group members from within the same strata of the distribution. On the one hand, this seems like a nice middle ground, but I want to make sure that we fully understand the implications of this methodology. Each participant, in essence, would be randomly assigned to maybe 5-10 non-participants from within their strata. These non-participants would be assigned a pre-post intervention date that matched the participant’s, and their change in consumption would be considered broadly reflective of exogenous factors and applied against the savings of the participant. There will almost certainly be bias in this calculation, insofar as these non-participants are also influenced by other factors that are not present in the energy use changes of the participants. For example, one of these non-participants might severely reduce operating hours, or perhaps a residential customer might add an electric vehicle to their energy load. While these particular effects might net out across a portfolio, for monthly cohorts with small numbers of enrolled participants, the effects might be untenably distorting.
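The rolling assignment described above could be sketched as follows. Everything here is illustrative: the stratum boundaries, the pool structure, and drawing matches at enrollment time are my assumptions about one way to operationalize the idea, not a description of any existing tool.

```python
import bisect
import random

def stratum_index(ramp_ratio, boundaries):
    """Map a customer's ramp ratio to a stratum of the distribution."""
    return bisect.bisect_right(boundaries, ramp_ratio)

def assign_matches(participant_ratio, enrollment_date, pool_by_stratum,
                   boundaries, n_matches=5, rng=random):
    """At enrollment, draw comparison members from the participant's stratum
    and stamp each with the participant's intervention date."""
    stratum = stratum_index(participant_ratio, boundaries)
    pool = pool_by_stratum[stratum]
    matches = rng.sample(pool, min(n_matches, len(pool)))
    return [{"customer_id": m, "intervention_date": enrollment_date}
            for m in matches]

# Illustrative quartile-style boundaries: strata <=0.9, 0.9-1.1, 1.1-1.5, >1.5
boundaries = [0.9, 1.1, 1.5]
pool_by_stratum = {0: ["a1", "a2"], 1: ["b1", "b2", "b3"],
                   2: ["c1", "c2"], 3: ["d1", "d2", "d3", "d4"]}

# A steep-ramp participant (ratio 1.7) enrolled mid-year draws from stratum 3.
matches = assign_matches(1.7, "2020-06-01", pool_by_stratum, boundaries,
                         n_matches=3, rng=random.Random(0))
print(matches)
```

The concern raised above shows up directly in this sketch: with `n_matches` of only 3-10, one comparison customer adding an EV or cutting operating hours moves that participant's counterfactual substantially.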

mcgeeyoung commented 4 years ago

[Image: Distribution of Evening Ramp Ratio]

steevschmidt commented 4 years ago

Thanks McGee. My thoughts:

You wrote:

 Each participant, in essence, would be randomly assigned to maybe 5-10 non-participants from within their strata.

Only 5-10? This might work for Starbucks stores but is not going to work for residential programs -- you can't find 5-10 homes whose average energy will reliably track the future energy use of another home. As you noted, there's too much variation from dozens of other factors.

Instead, why not select one very large control group (1000's) with characteristics similar to the entire enrolled group, and then use their historic energy use for the full range of program participants with different baseline periods, intervention dates and reporting periods? Control group members have no actual intervention, so you can use different portions of their entire energy history, correct?
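The reuse of one large control group's history could look something like the sketch below: because control members have no intervention, any participant's intervention date can anchor a baseline/reporting split of the control member's full history. The 12-month windows, the monthly-usage data shape, and the function itself are my assumptions for illustration.

```python
from datetime import date

def control_window(monthly_use, intervention, baseline_months=12,
                   reporting_months=12):
    """Slice a control customer's usage history around a participant's
    intervention date.

    monthly_use: dict mapping (year, month) -> kWh for that month.
    Returns (baseline, reporting) lists of monthly kWh; the reporting
    window starts at the intervention month.
    """
    def shift(y, m, k):
        # Move k months forward (or back, if k < 0) from (y, m).
        q, r = divmod(y * 12 + (m - 1) + k, 12)
        return q, r + 1

    y, m = intervention.year, intervention.month
    baseline = [monthly_use[shift(y, m, -k)]
                for k in range(baseline_months, 0, -1)]
    reporting = [monthly_use[shift(y, m, k)]
                 for k in range(reporting_months)]
    return baseline, reporting

# Five years of flat 100 kWh/month history for one control customer; the
# same history can serve participants with any 2018-2020 intervention date.
monthly_use = {(y, m): 100.0 for y in range(2017, 2022) for m in range(1, 13)}
base, rep = control_window(monthly_use, date(2019, 6, 1))
print(len(base), len(rep), sum(rep) - sum(base))  # 12 12 0.0
```

With thousands of control members averaged per window, the idiosyncratic shocks (EV purchases, changed operating hours) that distort a 5-10 member match should largely wash out, which is the point being made above.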

Example based on a theoretical program:

Am I missing something?