washingtonpost / elex-live-model

A model to generate estimates of the number of outstanding votes on an election night, based on the current results of the race.

Removing duplicate code in `Estimandizer` class #96

Closed: dmnapolitano closed this 2 months ago

dmnapolitano commented 2 months ago

Description

Hi! The changes in this PR remove the duplicate `add_weights` and `add_turnout_factor` methods from the `Estimandizer` class. I genuinely don't know how that happened or which tox warning I was ignoring this entire time 😓 🤔
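One way to catch this kind of duplication before review is a small check over the source using the standard library's `ast` module. The sketch below is not part of this PR, and the helper name `find_duplicate_methods` is hypothetical. It lists method names defined more than once directly in a class body, along with the line of each definition. Linters such as flake8 may also report this pattern as F811 ("redefinition of unused name").

```python
import ast
from collections import defaultdict


def find_duplicate_methods(source: str) -> dict[str, dict[str, list[int]]]:
    """Return {class_name: {method_name: [def line numbers]}} for repeated methods."""
    tree = ast.parse(source)
    duplicates: dict[str, dict[str, list[int]]] = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        # Record every line where each name is defined directly in this class body.
        seen: defaultdict[str, list[int]] = defaultdict(list)
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                seen[item.name].append(item.lineno)
        repeated = {name: lines for name, lines in seen.items() if len(lines) > 1}
        if repeated:
            duplicates[node.name] = repeated
    return duplicates


example = '''
class Estimandizer:
    def add_weights(self, data_df, col_prefix):
        return data_df

    def add_turnout_factor(self, data_df):
        return data_df

    def add_weights(self, data_df, col_prefix):
        return data_df
'''
print(find_duplicate_methods(example))  # → {'Estimandizer': {'add_weights': [3, 9]}}
```

Note that this check also flags intentional repeats, such as `@property` setters or `typing.overload` stubs, so hits are best treated as prompts to review rather than hard errors.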

Also, I forgot to update the GitHub Actions workflows to use Python 3.11 in PR #95. I hope it's OK to do that here; if not, I can easily split it out 😄

Test Steps

tox

lennybronner commented 2 months ago

Weird weird. Could you do a `git blame` or something and check how they ended up here?

dmnapolitano commented 2 months ago


Good idea! I've never done that before. Here's what I see:

$ git blame -L 84 src/elexmodel/handlers/data/Estimandizer.py 
cc6a5f0f (lbvienna         2023-09-21 17:29:24 -0400  84)     def add_weights(self, data_df, col_prefix):
cc6a5f0f (lbvienna         2023-09-21 17:29:24 -0400  85)         data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  86)         return data_df
795d0cde (lbvienna         2023-09-21 17:38:10 -0400  87) 
44e6b909 (lbvienna         2023-09-21 18:36:20 -0400  88)     def add_turnout_factor(self, data_df):
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  89)         # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  90)         # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  91)         # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  92)         # the turnout factor to zero is fine
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  93)         data_df["turnout_factor"] = np.nan_to_num(
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  94)             data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  95)         )
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  96)         return data_df
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  97) 
422c9974 (lbvienna         2023-09-25 15:40:44 -0400  98)     def add_weights(self, data_df, col_prefix):
422c9974 (lbvienna         2023-09-25 15:40:44 -0400  99)         data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
422c9974 (lbvienna         2023-09-25 15:40:44 -0400 100)         return data_df
422c9974 (lbvienna         2023-09-25 15:40:44 -0400 101) 
33e04f70 (lbvienna         2023-09-21 12:57:13 -0400 102)     def add_turnout_factor(self, data_df):
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 103)         # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 104)         # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 105)         # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 106)         # the turnout factor to zero is fine
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 107)         data_df["turnout_factor"] = np.nan_to_num(
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 108)             data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 109)         )
33e04f70 (lbvienna         2023-09-21 12:57:13 -0400 110)         return data_df
a9a2354a (Diane Napolitano 2023-09-07 14:24:50 -0400 111) 
795d0cde (lbvienna         2023-09-21 17:38:10 -0400 112) 
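Each pair of copies in the blame output is identical, so removing one changes no behavior. For readers following along, here is a minimal standalone sketch of what the two methods do. They are lifted out of the class as plain functions, with `self` dropped. The prefixes `"results_"` and `"baseline_"` are an assumption inferred from the `results_weights` and `baseline_weights` columns that `add_turnout_factor` reads, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd


def add_weights(data_df: pd.DataFrame, col_prefix: str) -> pd.DataFrame:
    # Copy the "<prefix>turnout" column into a new "<prefix>weights" column.
    data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
    return data_df


def add_turnout_factor(data_df: pd.DataFrame) -> pd.DataFrame:
    # Ratio of current weights to baseline weights. 0/0 yields nan and x/0 yields
    # +/-inf, so all three are mapped to 0 (treated as an effectively empty precinct).
    data_df["turnout_factor"] = np.nan_to_num(
        data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
    )
    return data_df


# Hypothetical sample data: the last two rows have a zero baseline.
df = pd.DataFrame({"results_turnout": [120, 0, 50], "baseline_turnout": [100, 0, 0]})
df = add_weights(df, "results_")
df = add_weights(df, "baseline_")
df = add_turnout_factor(df)
print(df["turnout_factor"].tolist())  # → [1.2, 0.0, 0.0]
```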

I honestly have no idea. My guess is that the code was re-added across multiple PRs after having been removed, and we somehow didn't notice 🤔
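A likely reason nothing broke: repeating a `def` in a class body just rebinds the name, so only the last definition survives and nothing fails at import or test time. Since the duplicated copies here were identical, the class behaved the same either way. A toy illustration with a hypothetical `Example` class:

```python
class Example:
    def describe(self):
        return "first definition"

    # Reusing the name in the same class body silently replaces the method above.
    def describe(self):
        return "second definition"


print(Example().describe())  # → second definition
# Only one "describe" entry exists in the class namespace.
print(sum(name == "describe" for name in vars(Example)))  # → 1
```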

lennybronner commented 2 months ago

weird weird