washingtonpost / elex-live-model

a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race

Elex 2771 new ols qr #77

Closed lennybronner closed 1 year ago

lennybronner commented 1 year ago

Description

This PR moves over parts of the changes we are making for the bootstrap election model PR in order to make reviewing that one easier. It makes the changes necessary for the old ConformalElectionModel to work with the updates made to elex-solver in this PR, and it makes small tweaks to the estimandizer to prepare it for generating multiple estimands at once. It also updates the unit tests accordingly.

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-2771

Test Steps

tox

also

elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=30 --aggregates=postal_code --historical
dmnapolitano commented 1 year ago

@lennybronner thank you so much for all of this! 🎉 🙌🏻

Can you say a bit about what the turnout factor is? 😅

dmnapolitano commented 1 year ago

Or actually, sorry, I think I see what you're doing. Even though the end result we're sharing with the world is the predicted margin, we still need to predict turnout in order to predict margin. But not all of our data sets include results_turnout, so you're checking to make sure it exists and if not, sum across the (applicable) results_ columns we do have. Do I have that right-ish? 😅
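
In code terms, I'm picturing roughly this fallback (a sketch with assumed column names, not necessarily the repo's exact schema):

```python
import pandas as pd

# Sketch of the fallback: use results_turnout if the data set provides it,
# otherwise sum across the per-candidate results_ columns that are present.
def ensure_turnout_column(df: pd.DataFrame) -> pd.DataFrame:
    if "results_turnout" not in df.columns:
        candidate_cols = [c for c in df.columns if c.startswith("results_")]
        df["results_turnout"] = df[candidate_cols].sum(axis=1)
    return df
```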

lennybronner commented 1 year ago

Yeah, we need to predict turnout in order to get the normalization constant for normalized margin, since we need to go back and forth between unnormalized and normalized margin to move from county predictions to state predictions.
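
As a toy example of that back-and-forth (made-up numbers, not the model's actual code):

```python
# (predicted_normalized_margin, predicted_turnout) per county -- made-up numbers.
county_preds = [(0.10, 50_000), (-0.04, 120_000), (0.25, 30_000)]

# Un-normalize to raw vote margins, aggregate to the state, then re-normalize.
# Without a turnout prediction per county there is no normalization constant.
state_margin_votes = sum(margin * turnout for margin, turnout in county_preds)
state_turnout = sum(turnout for _, turnout in county_preds)
state_normalized_margin = state_margin_votes / state_turnout
```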

Turnout factor is basically just the ratio of turnout in this election to turnout in the last election. In the margin model it's part of what we're estimating. But we also drop units whose turnout factor is above or below certain constants. We're basically assuming that if turnout in some county is only 20% of its last election's turnout (or greater than 200% of last election's turnout), our results provider either made a mistake or we accidentally mismatched precincts, so we drop that county from our model. We can adjust the constants (20%/200%) through parameters in the model so that in a super low/high turnout election we don't accidentally drop too many units.
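
A minimal sketch of what that filter looks like, assuming illustrative column names and the default 20%/200% bounds (the real bounds are configurable model parameters):

```python
import pandas as pd

# Illustrative bounds: 20% and 200% of last election's turnout.
lower_bound, upper_bound = 0.2, 2.0

df = pd.DataFrame(
    {
        "results_turnout": [9_000, 60_000, 250_000],
        "baseline_turnout": [50_000, 55_000, 100_000],
    }
)
df["turnout_factor"] = df["results_turnout"] / df["baseline_turnout"]

# Units outside the band are treated as likely reporting errors or mismatched
# precincts and dropped before fitting.
kept = df[df["turnout_factor"].between(lower_bound, upper_bound)]
```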

dmnapolitano commented 1 year ago

Got it!! That's awesome 🎉

What about dropping units whose turnout factors are outliers relative to the other units? That way, on the off chance the entire state doesn't vote (or does vote), there's no risk of dropping almost every unit in the state. If you've done some evaluation to come up with these constants, that's fine, and I know for now we're primarily interested in big, top-of-the-ticket races anyway, where this is less likely to occur. Just a thought 🤷🏻‍♀️ 😄
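
Something like this, maybe (purely illustrative, not anything in this PR):

```python
import numpy as np

# One possible shape of the outlier-based alternative: flag units whose turnout
# factor is far from the median, using a robust MAD-based z-score instead of
# fixed 20%/200% cutoffs.
def turnout_factor_outliers(turnout_factor: np.ndarray, z_cutoff: float = 3.5) -> np.ndarray:
    median = np.median(turnout_factor)
    mad = np.median(np.abs(turnout_factor - median))
    if mad == 0:
        # All units have the same turnout factor; nothing to flag.
        return np.zeros(turnout_factor.shape, dtype=bool)
    robust_z = 0.6745 * (turnout_factor - median) / mad
    return np.abs(robust_z) > z_cutoff
```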

lennybronner commented 1 year ago

That's a really good idea! Though I guess it would necessitate a bit more computation? Do you mind adding a future ticket to implement it?

dmnapolitano commented 1 year ago

Sure! Thanks! 😄 🎉 The ticket is here: https://arcpublishing.atlassian.net/browse/ELEX-3298