Closed dmnapolitano closed 2 months ago
Weird weird. could you do a git blame or something and check how they ended up here?
Good idea! I've never done that before. Here's what I see:
$ git blame -L 84 src/elexmodel/handlers/data/Estimandizer.py
cc6a5f0f (lbvienna 2023-09-21 17:29:24 -0400 84) def add_weights(self, data_df, col_prefix):
cc6a5f0f (lbvienna 2023-09-21 17:29:24 -0400 85) data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400 86) return data_df
795d0cde (lbvienna 2023-09-21 17:38:10 -0400 87)
44e6b909 (lbvienna 2023-09-21 18:36:20 -0400 88) def add_turnout_factor(self, data_df):
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 89) # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 90) # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 91) # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 92) # the turnout factor to zero is fine
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 93) data_df["turnout_factor"] = np.nan_to_num(
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 94) data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
440a7e06 (lbvienna 2023-09-22 10:49:31 -0400 95) )
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400 96) return data_df
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400 97)
422c9974 (lbvienna 2023-09-25 15:40:44 -0400 98) def add_weights(self, data_df, col_prefix):
422c9974 (lbvienna 2023-09-25 15:40:44 -0400 99) data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
422c9974 (lbvienna 2023-09-25 15:40:44 -0400 100) return data_df
422c9974 (lbvienna 2023-09-25 15:40:44 -0400 101)
33e04f70 (lbvienna 2023-09-21 12:57:13 -0400 102) def add_turnout_factor(self, data_df):
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 103) # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 104) # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 105) # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 106) # the turnout factor to zero is fine
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 107) data_df["turnout_factor"] = np.nan_to_num(
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 108) data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
275f4cd8 (lbvienna 2023-09-25 16:21:07 -0400 109) )
33e04f70 (lbvienna 2023-09-21 12:57:13 -0400 110) return data_df
a9a2354a (Diane Napolitano 2023-09-07 14:24:50 -0400 111)
795d0cde (lbvienna 2023-09-21 17:38:10 -0400 112)
I honestly have no idea. My guess is the code was added during multiple PRs after having been removed and we somehow didn't notice 🤔
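For anyone reading the blame output, the comment on lines 89-92 explains why the turnout factor is zeroed out when the baseline is zero. Here is a minimal sketch of that behavior, assuming plain NumPy float arrays in place of the data_df columns. The sample values are made up for illustration and are not from the project:

```python
import numpy as np

# Hypothetical sample values standing in for data_df.results_weights
# and data_df.baseline_weights; they are not taken from the project.
results_weights = np.array([10.0, 0.0, 5.0, -5.0])
baseline_weights = np.array([20.0, 0.0, 0.0, 0.0])

# Float division by zero does not raise in NumPy: it yields nan (0/0),
# inf (5/0) or -inf (-5/0). errstate only silences the RuntimeWarning.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = results_weights / baseline_weights

# Same call as in add_turnout_factor: every non-finite ratio becomes 0,
# i.e. a precinct with no baseline is treated as effectively empty.
turnout_factor = np.nan_to_num(raw, nan=0, posinf=0, neginf=0)
print(turnout_factor)
```

Only the first entry keeps its real ratio; the other three collapse to zero, which matches the "effectively an empty precinct" assumption in the comment.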
weird weird
Description
Hi! The changes in this PR remove duplicate methods from the Estimandizer class. I genuinely don't know how that happened or what tox warning I was ignoring this entire time 😓 🤔
Also, I forgot to update the GitHub Actions to use Python 3.11 in PR #95. Hope it's ok to do so here; if not I can easily separate that out 😄
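Python accepts a class body that defines the same method twice and silently keeps the later definition, which is probably why nothing broke at runtime. Below is a rough sketch of how duplicates like these could be spotted with the standard-library ast module. The helper name find_duplicate_methods and the EXAMPLE source are hypothetical, and this is not necessarily the check that tox runs in this repo:

```python
import ast
from collections import defaultdict


def find_duplicate_methods(source):
    """Map (class name, method name) to the line numbers of repeated definitions."""
    duplicates = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        seen = defaultdict(list)
        # Only look at functions defined directly in the class body.
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                seen[item.name].append(item.lineno)
        for name, lines in seen.items():
            if len(lines) > 1:
                duplicates[(node.name, name)] = lines
    return duplicates


# Hypothetical stand-in for Estimandizer.py, with method bodies trimmed.
EXAMPLE = '''
class Estimandizer:
    def add_weights(self, data_df, col_prefix):
        return data_df

    def add_turnout_factor(self, data_df):
        return data_df

    def add_weights(self, data_df, col_prefix):
        return data_df
'''

print(find_duplicate_methods(EXAMPLE))
```

One caveat: this naive version would also flag legitimate same-name definitions such as a property and its setter, so it is only a starting point.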
Test Steps
tox