washingtonpost / elex-live-model

a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race

Elex-1453-baseline-modeling-DATA-SCIENCE-EXPERIMENTAL #50

Closed daragold closed 1 year ago

daragold commented 1 year ago

Description

This is the branch for experimenting with baseline modeling. We can use this PR space to show recent changes that have been made.

Note, we expect unit tests to fail here, as some rely on column names that we no longer use.

The regression in the model currently operates on ABSOLUTE numbers of votes (i.e. the number of votes received in reporting units is used to predict the number of votes in non-reporting units). To blend in a baseline, we need to change this to vote shares (i.e. the % Dem in reporting units is used to predict the % Dem in non-reporting units).
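As a rough illustration of the counts-to-shares change described above, here is a minimal sketch. It is not the code in this branch: the helper names `to_shares` and `blend`, the blending weight, and the toy numbers are all hypothetical, and a plain mean stands in for the model's regression.

```python
import numpy as np


def to_shares(party_votes, total_votes):
    """Convert absolute vote counts to shares; units with no votes stay NaN."""
    party_votes = np.asarray(party_votes, dtype=float)
    total_votes = np.asarray(total_votes, dtype=float)
    shares = np.full_like(party_votes, np.nan)
    # Only divide where there are votes, so zero-turnout units are not 0/0.
    np.divide(party_votes, total_votes, out=shares, where=total_votes > 0)
    return shares


def blend(model_share, baseline_share, weight):
    """Weighted average of a model estimate and a baseline (hypothetical weighting)."""
    return weight * model_share + (1.0 - weight) * baseline_share


# Toy reporting units (made-up numbers): Dem votes and total votes per unit.
dem_votes = [600, 450, 0]
total_votes = [1000, 900, 0]

dem_share = to_shares(dem_votes, total_votes)

# Stand-in for the regression: average share across units that have votes.
model_estimate = np.nanmean(dem_share)

# Blend with an assumed baseline share of 0.45 at equal weight.
blended = blend(model_estimate, baseline_share=0.45, weight=0.5)
```

Working in shares keeps the model output and the baseline on the same 0 to 1 scale, which is what makes a weighted blend meaningful; absolute counts from units of different sizes could not be averaged with a baseline this way.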

Jira Ticket

https://arcpublishing.atlassian.net/jira/software/c/projects/ELEX/boards/1026?modal=detail&selectedIssue=ELEX-2616

Test Steps

To test as a single-run model:

  1. Use the 'sub_zero_for_nan' elex-solver branch.
  2. Model params are:
       election_id = "2022-11-08_USA_G"
       office_id = "S_county"
       estimands = ["turnout","dem", "gop"]
       geographic_unit_type = "county"
       historical = False
       unexpected_units = 0
       prediction_intervals = [0.9]
       percent_reporting_threshold = 100
       percent_reporting = 20
       features = ['race_white']
       aggregates = ["unit", "postal_code"]
       fixed_effects = ["postal_code"]

To run with the testbed:

  1. Use the 'sub_zero_for_nan' elex-solver branch and the 'new_corr_data' testbed branch (or main, if new_corr_data has already been merged).
  2. Testbed parameters are the same as above, with agg-model turned OFF.