washingtonpost / elex-live-model

a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race

Elex 1235 create default aggregates #53

Closed lennybronner closed 1 year ago

lennybronner commented 1 year ago

Description

We now set the default aggregates dynamically based on the type of election. For statewide elections the default aggregate is postal_code; for districtwide elections (e.g. House elections) it is (postal_code, district).
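The selection rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `get_default_aggregates` and the constant `DISTRICTWIDE_OFFICE_IDS` are hypothetical, and the set of districtwide office IDs (H, Y, Z) is taken from the note further down in this description.

```python
from typing import List

# Hypothetical constant: office IDs treated as districtwide (per the note in this PR).
DISTRICTWIDE_OFFICE_IDS = {"H", "Y", "Z"}


def get_default_aggregates(office_id: str) -> List[str]:
    """Hypothetical helper returning the default aggregate columns for a race."""
    if office_id in DISTRICTWIDE_OFFICE_IDS:
        # Districtwide race: every aggregate is nested inside (postal_code, district).
        return ["postal_code", "district"]
    # Statewide race: postal_code alone is the default aggregate.
    return ["postal_code"]
```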

The second change is that we no longer hard-code postal_code as the main aggregate when generating the aggregate predictions. Previously we always passed only postal_code to the model as the largest aggregate, even when we were generating county_classification predictions for House races. This meant that we could only create county_classification predictions for each state, not for each (state, district) pair. This is now resolved, since we use the default aggregate there as well.
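To illustrate why the largest aggregate matters, here is a rough sketch that sums unit-level predictions over the default aggregates plus the requested aggregate. It is an assumption-laden toy, not the model's implementation: the function `aggregate_predictions`, the column `pred_dem`, and the three units with their values are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_predictions(
    units: List[dict], default_aggregates: List[str], aggregate: str
) -> Dict[Tuple, float]:
    """Hypothetical helper: sum predictions grouped by default_aggregates + [aggregate]."""
    group_by = list(default_aggregates) + [aggregate]
    totals: Dict[Tuple, float] = defaultdict(float)
    for unit in units:
        # The grouping key is the tuple of values in the grouping columns.
        key = tuple(unit[col] for col in group_by)
        totals[key] += unit["pred_dem"]
    return dict(totals)


# Toy units (made-up values) spanning two districts in one state.
units = [
    {"postal_code": "VA", "district": "01", "county_classification": "urban", "pred_dem": 100.0},
    {"postal_code": "VA", "district": "02", "county_classification": "urban", "pred_dem": 50.0},
    {"postal_code": "VA", "district": "02", "county_classification": "rural", "pred_dem": 20.0},
]

# Old behavior: postal_code hard-coded, so districts are merged together.
print(aggregate_predictions(units, ["postal_code"], "county_classification"))
# New behavior: the districtwide default aggregate keeps districts separate.
print(aggregate_predictions(units, ["postal_code", "district"], "county_classification"))
```

With postal_code alone, the two urban units from different districts collapse into a single ('VA', 'urban') total; with (postal_code, district) they stay separate.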

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-1235

Test Steps

Added unit tests; run tox to execute them. To see the new functionality, first run this on the develop branch:

elexmodel 2017-11-07_VA_G --estimands=dem --office_id=Y --geographic_unit_type=precinct-district --percent_reporting 10 --unexpected_units=10 --aggregates=postal_code --aggregates=district --aggregates=county_fips

On develop, the county_fips predictions do not take the district into account, since we aggregate over (postal_code, county_fips) instead of (postal_code, district, county_fips). Running the same invocation on this branch resolves this.

Note

The model now requires the user to pass district as an aggregate for districtwide races (i.e. when the office ID is H, Y or Z), since otherwise the model may break when dealing with unexpected units. This is because district was not among the aggregates passed in, so we do not create a district column for the unexpected units, but it is expected as part of the default aggregates when building the list used to generate the aggregate predictions. Here is an example:

elexmodel 2017-11-07_VA_G --estimands=dem --office_id=Y --geographic_unit_type=precinct-district --percent_reporting 10 --unexpected_units=10 --aggregates=postal_code --aggregates=county_fips

This can likely be fixed if we think it is necessary.
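The requirement described in the note above could be enforced with an up-front check along these lines. This is only a sketch under assumptions: `validate_aggregates` and `DISTRICTWIDE_OFFICE_IDS` are hypothetical names, and the actual check and error message in elexmodel may differ.

```python
from typing import List

# Hypothetical constant: districtwide office IDs named in the note above.
DISTRICTWIDE_OFFICE_IDS = {"H", "Y", "Z"}


def validate_aggregates(office_id: str, aggregates: List[str]) -> None:
    """Hypothetical check: districtwide races must include 'district' in the aggregates."""
    if office_id in DISTRICTWIDE_OFFICE_IDS and "district" not in aggregates:
        raise ValueError(
            f"office_id {office_id!r} is districtwide; 'district' must be passed as an aggregate"
        )
```

Failing early with a clear message avoids the later breakage when unexpected units lack a district column.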