recurve-methods / comparison_groups

Repository for discussion of Comparison Group topics

Regarding method 7.2: Clarify meaning of "rebaselined for each month" #14

Open steevschmidt opened 4 years ago

steevschmidt commented 4 years ago

From the draft findings document, page 6:

7.2 Where programs enroll customers over a period of time longer than 30 days, the comparison group must be rebaselined for each month of enrollment and a new vintage created that is assigned to a monthly cohort of enrolled participants.

I'm unclear what this means. In our residential P4P program we enroll new customers every month. Per existing CalTRACK methods, the baseline period for each project is determined by an install date (or "treatment date") specific to the project. This date is independent of the "month of enrollment".

Now, for the purpose of measuring exogenous factors, I assume a baseline period for the comparison group will be selected that matches the exact same baseline period for each project.

Example: Building B1 is enrolled in early July 2020. If the baseline period for building B1 is April 16 2019 to April 15 2020, the baseline period for the comparison group used to determine the size of exogenous factors will be identical (i.e. 4/16/19 to 4/15/20). As savings for building B1 are tracked over time, neither of these two identical baseline periods should change.
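To make the example concrete, here is a minimal sketch of the date arithmetic, assuming a one-year baseline that ends the day before the same calendar date a year later. The helper name `baseline_window` is made up for illustration and is not part of CalTRACK or any library.

```python
from datetime import date, timedelta

def baseline_window(start):
    """Return (start, end) of a one-year baseline beginning on `start`.

    Assumption: the window ends the day before the same calendar date
    one year later. A Feb 29 start would need its own rule.
    """
    end = start.replace(year=start.year + 1) - timedelta(days=1)
    return start, end

# Building B1 from the example: baseline starts April 16, 2019.
project_window = baseline_window(date(2019, 4, 16))

# The comparison group reuses the identical window, and neither
# window changes as savings are tracked over time.
comparison_window = project_window

print(project_window[0].isoformat(), "to", project_window[1].isoformat())
```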

Where/how does "rebaselining" and/or a "monthly cohort" apply to this example?

mcgeeyoung commented 4 years ago

Hi Steve, you are on the right track. The recommendation to rebaseline the comparison group is based on the principle that the comparison group and the treated customers should experience the same baseline conditions. Adding monthly "vintages" of the comparison group allows you to align treated and non-treated customers. Doing this based on the specific date of enrollment would be possible in principle, but probably impractical: you'd have to create 365 vintages of your comparison group rather than 12. Some testing would probably be needed to determine the incremental benefits, but initially the additional complexity did not seem worth the tradeoff.
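As a rough illustration of the monthly approach, the sketch below groups participants into cohorts by enrollment month, so that each cohort can be paired with one comparison-group vintage. Buildings B2 and B3, their dates, and the `YYYY-MM` label format are hypothetical; how each vintage's baseline dates are anchored within the month is not specified here.

```python
from datetime import date

def monthly_vintage(enrolled):
    """Label the monthly cohort (vintage) for an enrollment date."""
    return f"{enrolled.year:04d}-{enrolled.month:02d}"

# B1 is enrolled in early July 2020; B2 and B3 are made-up examples.
enrollments = {
    "B1": date(2020, 7, 3),
    "B2": date(2020, 7, 28),
    "B3": date(2020, 8, 11),
}

# Group buildings by vintage label: at most 12 labels per program year,
# versus up to 365 if each enrollment date had its own vintage.
cohorts = {}
for building, enrolled in enrollments.items():
    cohorts.setdefault(monthly_vintage(enrolled), []).append(building)

print(cohorts)
```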

steevschmidt commented 4 years ago

So with 12 "vintages" of comparison groups using a mid-month start date, I guess the worst case would be a 16-day difference between the treated customer (for example, one with a baseline period that starts March 31st) and her associated comparison group with a mid-month baseline start date of March 15th.

If this understanding is correct, it seems this worst case scenario could introduce a difference-in-differences savings error of 1/24th, or a bit over 4% per year. (Did I do that right?)
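The arithmetic behind that figure can be checked with a short sketch. It only measures how much of a one-year baseline window is misaligned; whether that share translates directly into a difference-in-differences savings error is an assumption of the question, not something the code establishes. The year 2019 is assumed, since only the month and day are given.

```python
from datetime import date

# Worst-case example: participant baseline starts March 31,
# the monthly vintage baseline starts March 15 (year assumed).
participant_start = date(2019, 3, 31)
vintage_start = date(2019, 3, 15)

offset_days = (participant_start - vintage_start).days

# Two ways to express the misaligned share of a one-year baseline.
fraction_of_year = offset_days / 365   # exact day count over a 365-day year
half_month = 1 / 24                    # the "1/24th" shorthand: half of one month

print(f"offset: {offset_days} days")
print(f"offset / 365: {fraction_of_year:.2%}")
print(f"1/24: {half_month:.2%}")
```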

Since a single comparison group can be used to create all 365 "vintages", it seems this would really just be a tradeoff between compute time and accuracy. With modern compute resources so cheap, I wouldn't expect that to be a problem. Is compute time being prioritized over accuracy in some way?
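A minimal sketch of that idea, assuming the comparison group's daily usage is already available in one place: each distinct participant baseline start date gets its own vintage, computed by re-slicing the same data. The data here is synthetic (a flat 10 kWh/day), and `baseline_window` and `build_vintages` are hypothetical helpers, not CalTRACK functions.

```python
from datetime import date, timedelta

def baseline_window(start):
    """One-year window ending the day before the same date next year."""
    end = start.replace(year=start.year + 1) - timedelta(days=1)
    return start, end

def build_vintages(comparison_usage, participant_starts):
    """Map each distinct baseline start date to the comparison group's
    total usage over the matching one-year window."""
    vintages = {}
    for start in sorted(set(participant_starts)):
        first, last = baseline_window(start)
        vintages[start] = sum(
            kwh for day, kwh in comparison_usage.items()
            if first <= day <= last
        )
    return vintages

# Synthetic comparison-group usage: 10 kWh/day for 2019 and 2020.
day0 = date(2019, 1, 1)
usage = {day0 + timedelta(days=i): 10.0 for i in range(731)}

# Three participants, two distinct baseline start dates.
starts = [date(2019, 3, 15), date(2019, 3, 31), date(2019, 3, 31)]
vintages = build_vintages(usage, starts)

print(len(vintages), "vintages from one comparison group")
```

The number of vintages is bounded by the number of distinct start dates among participants, so the cost grows with the data actually enrolled rather than always reaching 365.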

mcgeeyoung commented 4 years ago

I'm not sure whether that math holds. It certainly is a tradeoff, though.