washingtonpost / elex-live-model

a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race

Extrapolation rule #122

Closed · jjcherian closed this 2 days ago

jjcherian commented 4 days ago

Description

At a high level, the idea is as follows:

We have a set of reporting units whose results we have observed over time. Even though we have only seen each reporting unit at a particular set of percent_expected_vote values, we can use those observations to estimate the normalized margin at any percent_expected_vote value.

For example, if we saw a reporting county at both 50% and 70% expected vote, we can estimate the normalized margin at 60% expected vote via

margin_60 = [margin_50 * 50 + batch_margin_70 * (60 - 50)] / 60

where batch_margin_70 is the normalized margin of the batch of votes we saw when we recorded the 70% vote for the county.
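In code, the interpolation might look like the following sketch. This is only an illustration of the estimator described above, not the repo's actual implementation; the function name and data layout are hypothetical.

```python
# Illustrative sketch of the margin interpolation described above; the
# function name and data layout are hypothetical, not the repo's actual API.

def estimate_margin_at(observations, target_pct):
    """Estimate the normalized margin at target_pct from observed
    (percent_expected_vote, normalized_margin) pairs, sorted ascending."""
    for (lo_pct, lo_margin), (hi_pct, hi_margin) in zip(observations, observations[1:]):
        if lo_pct <= target_pct <= hi_pct:
            # normalized margin of the batch of votes that arrived between
            # lo_pct and hi_pct (the "batch_margin" in the formula above)
            batch_margin = (hi_margin * hi_pct - lo_margin * lo_pct) / (hi_pct - lo_pct)
            # vote-weighted average of the margin already seen and the batch margin
            return (lo_margin * lo_pct + batch_margin * (target_pct - lo_pct)) / target_pct
    raise ValueError("target_pct is outside the observed range")

# Example from the text: observed at 50% and 70% expected vote, estimate at 60%.
margin_60 = estimate_margin_at([(50, 0.10), (70, 0.12)], 60)
```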

Now that we have these estimates for the normalized margin at any percent_expected_vote value, we can use them to estimate how we ought to correct the current normalized margin to estimate the final value.

For example, let's imagine our non-reporting county is at 60% reporting. For the reporting county from the previous example, we can look at the difference margin_100 - margin_60; that difference is our best guess (from that one county) for how to correct the observed normalized margin in the non-reporting county.

We can repeat this for all the reporting counties and then take the mean of the corrections to get our best guess for how to correct the normalized margin for the non-reporting county.
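As a sketch (again with hypothetical names), the correction step could look like this, reusing estimate_margin_at from the sketch above:

```python
import numpy as np

def mean_correction(reporting_counties, current_pct):
    """reporting_counties: list of observation lists as above, each ending in
    a final (100, margin_100) observation. Returns the mean margin correction."""
    corrections = []
    for obs in reporting_counties:
        margin_100 = obs[-1][1]                          # final normalized margin
        margin_at_pct = estimate_margin_at(obs, current_pct)
        corrections.append(margin_100 - margin_at_pct)   # how the margin moved
    return np.mean(corrections)

# The extrapolated estimate for the non-reporting county is then
# observed_margin + mean_correction(reporting_counties, 60).
```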

The code also includes some additional logic to ensure that this extrapolation step is only used when we can be confident in its validity (see the sketch after this list):

1) We only estimate the correction using counties belonging to the same state.

2) We also only apply this method to a non-reporting county once it has passed a certain threshold of reporting.

3) We also do not use the correction estimate from a reporting county if its closest observed percent_expected_vote value is too far from the non-reporting county's current percent_expected_vote.

4) The correction estimates (obtained using VersionedResultsHandler) are also np.nan when there are irregularities in the reporting (e.g., there's a correction to the dem/gop vote totals that revises them downwards).

5) We only run this method in states with at least self.min_extrapolating_units counties available.
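Pulling the five checks together, the gating logic might look roughly like the sketch below. Every attribute and threshold name here is illustrative, assumed for the example (only min_extrapolating_units appears in the text above):

```python
import numpy as np

def extrapolation_correction(nonreporting_unit, reporting_units, settings):
    # (1) only use counties belonging to the same state
    same_state = [u for u in reporting_units
                  if u.postal_code == nonreporting_unit.postal_code]
    # (5) require at least min_extrapolating_units counties in the state
    if len(same_state) < settings.min_extrapolating_units:
        return None
    # (2) only extrapolate once the non-reporter passes a reporting threshold
    pct = nonreporting_unit.percent_expected_vote
    if pct < settings.min_reporting_threshold:
        return None
    corrections = []
    for u in same_state:
        # (3) skip a county whose closest observed percent_expected_vote
        # is too far from the non-reporting county's current value
        closest = min(abs(obs_pct - pct) for obs_pct, _ in u.observations)
        if closest > settings.max_observation_distance:
            continue
        corrections.append(u.margin_100 - estimate_margin_at(u.observations, pct))
    # (4) irregular reporting (e.g., downward revisions) yields np.nan
    # corrections via VersionedResultsHandler; drop those before averaging
    corrections = [c for c in corrections if not np.isnan(c)]
    return np.mean(corrections) if corrections else None
```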

Jira Ticket

Test Steps

jjcherian commented 1 day ago

Hi @jjcherian! Thanks for all your work on this! Your description and comments are very helpful 🎉 I left a few questions and comments, and I have a few that are more general:

  1. Can you help me zoom out and better understand the problem being solved here? Is the idea that if we're able to predict the margin for a unit that is only reporting with (say) 60% expected vote, does that mean we can safely enable the model earlier in the night? πŸ€”
  2. Do you have any evaluation showing how this compares to our current bootstrap model? πŸ€”
  3. Do you have any unit tests? πŸ€”
  4. There are a few things in here that I can't quite figure out how they relate to the overall extrapolation work, like the reduction in the number of bootstrap samples regardless of whether extrapolation is True and the changes relating to the postal_code fixed effect in Featurizer. Are those just things we should do anyway, or are they necessary for extrapolation to work properly?
  5. I'm trying to think about how we can better consolidate code that reads from s3 buckets into one class. I don't really have any suggestions right now, just something for us to think about πŸ€”

Sorry for taking so long to respond to this, Diane! I have too many balls to juggle right now!

  1. The problem being solved here is what to do when a unit (like a county) is very close to fully reporting (e.g., we're in a close election, it's Wednesday, and we're just waiting on an 85%-complete Philadelphia County to call the election). The prediction rule defined here uses the versioned_results to build a model for how voting margins in a state change from 85% to 100% reporting, and uses that estimate to extrapolate. This prediction is then combined into a weighted average with our typical prediction rule, where the weights are determined by how much variability there is in how voting margins in that state have changed historically over that reporting period (i.e., if the voting margin becomes more Republican over time in some counties in the state and more Democratic in others, the variance will be relatively high, and the weights will push us toward the fundamentals rule). A sketch of this weighting appears at the end of this comment.
  2. Sadly, my evaluations are mostly focused on 2020, but unit-level MAE can be substantially lower if we deploy this rule for units with >75% reporting (this threshold for when we can start using this method is configurable in model settings). In the table below, the left column shows MAE without this ensembling and the right column shows MAE with it, at roughly 10pm on election night in 2020. One note of caution: these results are slightly out of date relative to what is now merged. I have not yet retested with the newest rules we have for estimating and weighting the forecasts, which I will do.

| State | MAE without | MAE with |
|-------|-------------|----------|
| TN    | 0.018       | 0.013    |
| NH    | 0.025       | 0.004    |
| ME    | 0.033       | 0.005    |
| WV    | 0.012       | 0.023    |
| IN    | 0.014       | 0.005    |
| SC    | 0.018       | 0.012    |
| NC    | 0.016       | 0.012    |
| FL    | 0.023       | 0.014    |
| GA    | 0.027       | 0.023    |
| VT    | 0.018       | 0.003    |
| VA    | 0.027       | 0.016    |
| MO    | 0.114       | 0.130    |
| KY    | 0.016       | 0.005    |
| ND    | 0.049       | 0.022    |
| TX    | 0.070       | 0.020    |
| OH    | 0.032       | 0.015    |

  3. Sadly no, I don't have capacity for that right now, but if you want to read through it and propose some test cases, that would be great. One reason I am not so worried about this is that I view this prediction idea as essentially "optional": we don't need to use it, but if the election is close, it will force the forecasts to actually match what we can see, rather than stay constant (even as we get new information about the number of votes remaining, etc.).
  4. That was somehow junk code from a merge; I removed it. Thanks!
  5. I'll leave that one to you :) I agree that it all seems a little redundant. That said (this is my take, not Lenny's, so he may have different priorities), I think there are more low-hanging-fruit modeling improvements that would be more valuable than refactoring this code.
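As referenced in answer 1, here is a minimal sketch of the variance-based weighting, assuming we already have the per-county corrections for the state. The specific weighting function is illustrative only, not the merged implementation:

```python
import numpy as np

def combine_predictions(model_pred, extrapolation_pred, corrections):
    """Blend the usual model prediction with the extrapolation estimate.
    corrections: per-county (margin_100 - margin_at_pct) values for the state."""
    # high variance means counties in this state moved in different directions
    # over this reporting window, so we lean on the fundamentals-based prediction
    var = np.var(corrections)
    weight = 1.0 / (1.0 + var)  # illustrative: more variance, less extrapolation
    return weight * extrapolation_pred + (1.0 - weight) * model_pred
```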
dmnapolitano commented 1 day ago


Hi @jjcherian no worries at all, and thank you so much for sharing all of this!! 🎉 I'm genuinely excited about this, and amused that I thought it was for the exact opposite situation from the one it's actually for lol 😂

But yeah, no worries, I've made some notes here to come back to non-model-related, non-we-absolutely-need-this-before-Election-Day improvements after the election, so we're good πŸ˜„ πŸŽ‰