Closed dmnapolitano closed 1 year ago
So I just checked again and:
elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification
does work as expected in develop
So I just checked again and:
elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification
does work as expected in develop
Alright, I got this working again 🎉 but what I had to do was add in logic to check for a results_dem
column and, if it's not present and we're doing a historical run and baseline_dem
is present, create results_dem = nan
. So in other words, we had removed this from MockLiveDataHandler
:
and so I needed to add this:
😕 Any thoughts on this? 🤔
Hi @lennybronner 👋🏻 I'm having trouble figuring out what (sub)units are unexpected in the output from this command:
elexmodel 2020-11-03_USA_G --office_id=P --estimands=party_vote_share_dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=30 --aggregates=postal_code --historical
I see "There are 0 unexpected units." every step of this run, same as develop
. Maybe we can look at this together sometime. Thanks 🤔
Alright, so here's what I think is happening:
The issue is that some of our data sets lack results_estimand
columns for some of our standard, known estimands such as total turnout
, dem
turnout, etc. An example of this is "2021-11-02_VA_G". So if you're on the develop
branch, and you run:
elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --aggregates=county_classification
This command fails with a elexmodel.handlers.data.LiveData.MockLiveDataHandlerException: This is not a test election, it is missing results data
exception. However, if you add the --historical
flag, this command succeeds, because we have these two lines in develop
's MockLiveDataHandler.load_data()
:
for estimand in estimands:
if historical:
data[f"results_{estimand}"] = np.nan
Now in the estimandizer
branch, since estimand
in that code fragment above is one of our known estimands (specifically dem
), for that command, we want to ensure the same behavior: without the --historical
flag, since results_dem
is missing, throw an exception and exit. With --historical
, add a results_dem = nan
column and continue. But now that we're allowing estimands to be anything, we need some way to know that the estimand is one of our known ones and not one we're estimandizing on the fly (like party_vote_share_dem
), so we check for the presence of a corresponding baseline_estimand
. Therefore, in Estimandizer.add_estimand_results()
:
for estimand in estimands:
results_col = f"{RESULTS_PREFIX}{estimand}"
if results_col not in data_df.columns:
# will raise a KeyError if a function with the same name as `estimand` doesn't exist
try:
data_df = globals()[estimand](data_df, RESULTS_PREFIX)
except KeyError as e:
if historical and f"{BASELINE_PREFIX}{estimand}" in data_df.columns:
data_df[results_col] = nan
else:
raise e
That having been said:
try/except
like this kinda hack-y, although it is considered typical and accepted Python programming practice; but more importantly, --historical
flag to succeed? Do we even want it with the --historical
flag to succeed? And other related considerations :smile:Alright, so here's what I think is happening:
The issue is that some of our data sets lack
results_estimand
columns for some of our standard, known estimands such as totalturnout
,dem
turnout, etc. An example of this is "2021-11-02_VA_G". So if you're on thedevelop
branch, and you run:elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --aggregates=county_classification
This command fails with a
elexmodel.handlers.data.LiveData.MockLiveDataHandlerException: This is not a test election, it is missing results data
exception. However, if you add the--historical
flag, this command succeeds, because we have these two lines indevelop
'sMockLiveDataHandler.load_data()
:for estimand in estimands: if historical: data[f"results_{estimand}"] = np.nan
Now in the
estimandizer
branch, sinceestimand
in that code fragment above is one of our known estimands (specificallydem
), for that command, we want to ensure the same behavior: without the--historical
flag, sinceresults_dem
is missing, throw an exception and exit. With--historical
, add aresults_dem = nan
column and continue. But now that we're allowing estimands to be anything, we need some way to know that the estimand is one of our known ones and not one we're estimandizing on the fly (likeparty_vote_share_dem
), so we check for the presence of a correspondingbaseline_estimand
. Therefore, inEstimandizer.add_estimand_results()
:for estimand in estimands: results_col = f"{RESULTS_PREFIX}{estimand}" if results_col not in data_df.columns: # will raise a KeyError if a function with the same name as `estimand` doesn't exist try: data_df = globals()[estimand](data_df, RESULTS_PREFIX) except KeyError as e: if historical and f"{BASELINE_PREFIX}{estimand}" in data_df.columns: data_df[results_col] = nan else: raise e
That having been said:
- While this works, in general I find programming to a
try/except
like this kinda hack-y, although it is considered typical and accepted Python programming practice; but more importantly,- We totally have an opportunity here to think through this logic. Do we want that command without the
--historical
flag to succeed? Do we even want it with the--historical
flag to succeed? And other related considerations 😄
This is amazing, thank you so much for figuring this out!
To answer your general question, I think the answer is yes. Here is why:
--historical
flag because we have not added the results data to this election (so in reality it cannot succeed, since it has no results from 2021 to pretend report with).--historical
flag since on election night, we will want to run it with that flag to check what these current reporting counties would have said in the 2017 governor election.What do you think?
Alright, so here's what I think is happening: The issue is that some of our data sets lack
results_estimand
columns for some of our standard, known estimands such as totalturnout
,dem
turnout, etc. An example of this is "2021-11-02_VA_G". So if you're on thedevelop
branch, and you run:elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --aggregates=county_classification
This command fails with a
elexmodel.handlers.data.LiveData.MockLiveDataHandlerException: This is not a test election, it is missing results data
exception. However, if you add the--historical
flag, this command succeeds, because we have these two lines indevelop
'sMockLiveDataHandler.load_data()
:for estimand in estimands: if historical: data[f"results_{estimand}"] = np.nan
Now in the
estimandizer
branch, sinceestimand
in that code fragment above is one of our known estimands (specificallydem
), for that command, we want to ensure the same behavior: without the--historical
flag, sinceresults_dem
is missing, throw an exception and exit. With--historical
, add aresults_dem = nan
column and continue. But now that we're allowing estimands to be anything, we need some way to know that the estimand is one of our known ones and not one we're estimandizing on the fly (likeparty_vote_share_dem
), so we check for the presence of a correspondingbaseline_estimand
. Therefore, inEstimandizer.add_estimand_results()
:for estimand in estimands: results_col = f"{RESULTS_PREFIX}{estimand}" if results_col not in data_df.columns: # will raise a KeyError if a function with the same name as `estimand` doesn't exist try: data_df = globals()[estimand](data_df, RESULTS_PREFIX) except KeyError as e: if historical and f"{BASELINE_PREFIX}{estimand}" in data_df.columns: data_df[results_col] = nan else: raise e
That having been said:
- While this works, in general I find programming to a
try/except
like this kinda hack-y, although it is considered typical and accepted Python programming practice; but more importantly,- We totally have an opportunity here to think through this logic. Do we want that command without the
--historical
flag to succeed? Do we even want it with the--historical
flag to succeed? And other related considerations 😄This is amazing, thank you so much for figuring this out!
To answer your general question, I think the answer is yes. Here is why:
- The command should not succeed without the
--historical
flag because we have not added the results data to this election (so in reality it cannot succeed, since it has no results from 2021 to pretend report with).- The command should succeed with the
--historical
flag since on election night, we will want to run it with that flag to check what these current reporting counties would have said in the 2017 governor election.- Since for the most part the CLI should replicate what happens on election night, I think we should keep the functionality as is.
What do you think?
Thanks so much! 🎉 Yeah, I agree with keeping it the way it is. That logic makes sense to me 😄 👍🏻
Ok, sorry I found another weirdness 😬 . This works as expected:
elexmodel 2021-11-02_VA_G --office_id=G --estimands=dem --geographic_unit_type=precinct --pi_method=gaussian --percent_reporting=5 --historical
Now this does not work:
elexmodel 2021-11-02_VA_G --office_id=G --estimands=party_vote_share_dem --geographic_unit_type=precinct --pi_method=gaussian --percent_reporting=5 --historical
This is because in initial live data component (ie. when we are still handling 2021 data) the second part of this if statement fails: if historical and f"{BASELINE_PREFIX}{estimand}" in data_df.columns:
since baseline_party_vote_share_dem
is not in the data. I wonder what happens if we remove the second part of the if statement? Would that solve our problem or just cause more problems? 😬
Ok, sorry I found another weirdness 😬 . This works as expected:
elexmodel 2021-11-02_VA_G --office_id=G --estimands=dem --geographic_unit_type=precinct --pi_method=gaussian --percent_reporting=5 --historical
Now this does not work:
elexmodel 2021-11-02_VA_G --office_id=G --estimands=party_vote_share_dem --geographic_unit_type=precinct --pi_method=gaussian --percent_reporting=5 --historical
This is because in initial live data component (ie. when we are still handling 2021 data) the second part of this if statement fails:
if historical and f"{BASELINE_PREFIX}{estimand}" in data_df.columns:
sincebaseline_party_vote_share_dem
is not in the data. I wonder what happens if we remove the second part of the if statement? Would that solve our problem or just cause more problems? 😬
Interesting! 🤔 Yes, I think removing the and f"{BASELINE_PREFIX}{estimand}" in data_df.columns
from the if
statement should do the trick, but only in the case of a historical run because we'll be creating the baseline_party_vote_share_dem
later. If the data set is missing the other baseline_
columns, that's another problem altogether so that should fail elsewhere.
Sorry about that! I think I was trying to safeguard against a lack of baseline_estimand
column in a place where that safeguard ultimately wasn't necessary 😬 🤔
Description
Hi! The changes in this pull request should resolve ELEX-2763 and supersede PRs #59 and #73 🎉
Outstanding Issues
1. This command:elexmodel 2020-11-03_USA_G --office_id=P --estimands=party_vote_share_dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=30 --aggregates=postal_code --historicalproduces unexpected units 🤔Fixed! 🎉3. This integration test is failing:elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classificationThe issue is that "baseline_dem" is present but not "results_dem", so theResolved! 🎉Estimandizer
is trying (reasonably) to estimandize a "results_dem" and failing due to a lack ofdem
estimandizer. I'm not exactly sure what to do in this situation 🤔Jira Ticket
ELEX-2763
Test Steps
tox
and/or your favoriteelexmodel
or test bed command.