ELEX-3298 experiment with dropping units whose turnout factors are outliers

dmnapolitano commented 1 month ago

Description

Hi! The changes in this PR remove the turnout_factor_lower and turnout_factor_upper thresholds in favor of outlier detection. The mean and standard deviation of turnout_factor are computed per state, and a unit is treated as an outlier when its turnout_factor falls outside mean ± (4.75 * standard deviation). The default z threshold of 4.75 was selected to minimize changes to the unit test results, race call times, and predictions. See the notebook here for more details on performance evaluation against the 2022 US Senate GE 🎉 Thanks!!
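
For readers who want to see the rule concretely, here is a minimal pandas sketch of per-state z-threshold outlier detection. It is not the code in this PR: the function name `flag_turnout_outliers`, the assumption that the state lives in a `postal_code` column, and the data are all made up for illustration.

```python
import pandas as pd


def flag_turnout_outliers(units: pd.DataFrame, z: float = 4.75) -> pd.Series:
    """Return a boolean Series: True where turnout_factor is a per-state outlier."""
    # Per-state mean and sample standard deviation (pandas defaults to ddof=1).
    stats = units.groupby("postal_code")["turnout_factor"].agg(["mean", "std"])
    # Attach each state's statistics to its units.
    per_unit = units.join(stats, on="postal_code")
    lower = per_unit["mean"] - z * per_unit["std"]
    upper = per_unit["mean"] + z * per_unit["std"]
    return (per_unit["turnout_factor"] < lower) | (per_unit["turnout_factor"] > upper)


# Made-up data: state "AA" has 29 typical units and one extreme unit;
# state "BB" has only typical units.
units = pd.DataFrame(
    {
        "postal_code": ["AA"] * 30 + ["BB"] * 5,
        "turnout_factor": [1.0] * 29 + [5.0] + [0.9, 1.0, 1.1, 1.0, 1.0],
    }
)
flags = flag_turnout_outliers(units)
non_modeled = units[flags]  # only the 5.0 unit in "AA"
```

One property worth keeping in mind when picking z: with the sample standard deviation, a single value among n units can sit at most (n - 1) / sqrt(n) standard deviations from the mean, so at z = 4.75 a state needs at least 25 units before any of them can be flagged.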

Jira Ticket

ELEX-3298

Test Steps

Run tox, elexmodel commands with non-modeled units, the test bed, etc. For instance:

$ elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary
2024-10-07 10:46:25,665 INFO elexmodel.client Getting estimates: 2022-11-08_USA_G, S_county, ['margin']
2024-10-07 10:46:25,665 INFO elexmodel.client Getting config: 2022-11-08_USA_G
2024-10-07 10:46:25,807 INFO elexmodel.client Getting preprocessed data: 2022-11-08_USA_G
2024-10-07 10:46:26,060 INFO elexmodel.client Getting combined data for requested estimands
2024-10-07 10:46:26,075 INFO elexmodel.client Model parameters: 
 prediction intervals: [0.7, 0.9], percent reporting threshold: 100,                 pi_method: bootstrap, aggregates: ['postal_code', 'unit'], model settings: {'election_id': '2022-11-08_USA_G', 'office': 'S_county', 'geographic_unit_type': 'county', 'district_election': False, 'features': ['baseline_normalized_margin'], 'fixed_effects': {}, 'save_conformalization': False}
2024-10-07 10:46:26,076 INFO elexmodel.client Running model
            There are 2613 reporting and expected units.
            There are 0 unexpected units.
            There are 2 non-modeled units.
            There are 0 nonreporting units.
2024-10-07 10:46:26,076 INFO elexmodel.client non-modeled units:
{'NH': ['3300780740'], 'VT': ['5000724175']}
...

This run uses the default turnout factor z threshold of 4.75 and leaves 2 units non-modeled. Compare with a threshold of 3, which leaves 23:

$ elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary --turnout_factor_z 3
2024-10-07 10:47:56,454 INFO elexmodel.client Getting estimates: 2022-11-08_USA_G, S_county, ['margin']
2024-10-07 10:47:56,454 INFO elexmodel.client Getting config: 2022-11-08_USA_G
2024-10-07 10:47:56,592 INFO elexmodel.client Getting preprocessed data: 2022-11-08_USA_G
2024-10-07 10:47:56,881 INFO elexmodel.client Getting combined data for requested estimands
2024-10-07 10:47:56,896 INFO elexmodel.client Model parameters: 
 prediction intervals: [0.7, 0.9], percent reporting threshold: 100,                 pi_method: bootstrap, aggregates: ['postal_code', 'unit'], model settings: {'election_id': '2022-11-08_USA_G', 'office': 'S_county', 'geographic_unit_type': 'county', 'district_election': False, 'features': ['baseline_normalized_margin'], 'fixed_effects': {}, 'save_conformalization': False}
2024-10-07 10:47:56,896 INFO elexmodel.client Running model
            There are 2592 reporting and expected units.
            There are 0 unexpected units.
            There are 23 non-modeled units.
            There are 0 nonreporting units.
2024-10-07 10:47:56,897 INFO elexmodel.client non-modeled units:
{'CO': ['08111'], 'CT': ['0900108070', '0900337070'], 'FL': ['12119'], 'GA': ['13053', '13133'], 'IL': ['17159', '17165'], 'IN': ['18007'], 'KS': ['20175'], 'KY': ['21153', '21189'], 'MO': ['29155'], 'NH': ['3300332500', '3300718420', '3300780740', '3300979380'], 'NY': ['36005', '36081'], 'OH': ['39079'], 'PA': ['42089', '42101'], 'VT': ['5000724175']}
...

lennybronner commented 4 weeks ago

If we include this, one thing we will definitely want is to make the 4.75 flexible. Any chance you can make that a parameter?

dmnapolitano commented 3 weeks ago

Also curious what happens when a state has no units in or if there is only one unit in so no standard deviation can be computed? Also would be super interested to see how you evaluated this.

So, if there are no units in the state, that probably means those units haven't reported yet or they're all unexpected (hope not), so they should never reach the outlier detection (or even the current turnout factor threshold check). If there's only one unit, pandas returns NaN for the standard deviation:

In [1]: import pandas

In [2]: df = pandas.DataFrame([{"foo" : "a", "bar" : 2}, {"foo" : "b", "bar" : 2}, {"foo" : "a", "bar" : 3}, {"foo" : "a", "bar" : 1}])

In [3]: df
Out[3]: 
  foo  bar
0   a    2
1   b    2
2   a    3
3   a    1

In [4]: df.groupby(["foo"]).agg({"bar" : ["mean", "std"]})
Out[4]: 
     bar     
    mean  std
foo          
a    2.0  1.0
b    2.0  NaN

But, I think what we want to happen there is to make sure the unit doesn't get dropped; I don't think it will, the way the code is currently written, but I'll make sure of that 👍🏻 👍🏻
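
To illustrate why a NaN standard deviation does not have to drop the unit, here is a small sketch under the same caveats: it is not this PR's implementation, and the `postal_code` column and the values are invented. Comparisons against NaN evaluate to False, so a test written as "flag what falls outside the bounds" keeps a single-unit state, while a test written as "keep what falls inside the bounds" would silently drop it.

```python
import pandas as pd

# Made-up data: "DC" has a single unit, "VA" has three.
units = pd.DataFrame(
    {
        "postal_code": ["DC", "VA", "VA", "VA"],
        "turnout_factor": [1.3, 0.9, 1.0, 1.1],
    }
)
z = 4.75

grouped = units.groupby("postal_code")["turnout_factor"]
mean = grouped.transform("mean")
std = grouped.transform("std")  # NaN for the single-unit state
lower = mean - z * std
upper = mean + z * std

# "Outside the bounds" form: NaN comparisons are False, so DC is not flagged.
is_outlier = (units["turnout_factor"] < lower) | (units["turnout_factor"] > upper)
kept = units[~is_outlier]

# "Inside the bounds" form: NaN comparisons are still False, so DC would be dropped.
inside = units["turnout_factor"].between(lower, upper)
```

If the second form is ever needed, combining it with `std.isna()` (for example `inside | std.isna()`) would keep single-unit states.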

The evaluation is here; LMK what you think and if I should do anything else here 🤔 😄 (It might also be helpful for me to swap in some of the other sets of test bed output I generated to do this, but they were worse 😅 )

dmnapolitano commented 3 weeks ago

If we include this, one thing we will definitely want is to make the 4.75 flexible. Any chance you can make that a parameter?

Done!! See the examples in the "Test Steps" 😄 🎉

dmnapolitano commented 3 weeks ago

Alright I confirmed using 2020-11-03_USA_G and --office_code=P that DC, which only has one unit in this data set, doesn't get dropped during outlier detection 🎉

washingtonpost / elex-live-model