washingtonpost / elex-live-model

a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race

Elex 4549 add model flexibility #109

Open lennybronner opened 1 month ago

Description

Instead of being locked into OLS as the base model for the bootstrap model, this PR lets us use different models. We were tied to OLS because it gives us a quick way to compute the leave-one-out (LOO) residual, which the bootstrap needs because the training residual is biased towards zero. We now use k-fold cross-validation to estimate the LOO residual. Each k-fold model is fit on fewer units than the corresponding LOO model, so the k-fold residuals should generally be at least as large in magnitude as the LOO residuals, which makes this, if anything, the more conservative choice. This PR also adds the ability to switch between OLS and quantile regression (QR) models. In the future, we may expand the set of models available here.
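To illustrate the idea for reviewers, here is a minimal NumPy sketch of the two residual calculations. It is not the elex-solver implementation: the function names (`ols_fit`, `loo_residuals_ols`, `k_fold_residuals`), the `fit(X, y) -> coefficients` interface, and the synthetic data are all assumptions made for this example. The OLS shortcut uses the standard identity that the LOO residual equals the training residual divided by one minus the leverage; the k-fold version only needs a fitting function, which is what makes it model-agnostic.

```python
import numpy as np


def ols_fit(X, y):
    """Hypothetical helper: OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def loo_residuals_ols(X, y):
    """Closed-form leave-one-out residuals for OLS: e_i / (1 - h_ii)."""
    training_residuals = y - X @ ols_fit(X, y)
    # Diagonal of the hat matrix H = X @ pinv(X), i.e. the leverage of each unit.
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    return training_residuals / (1.0 - leverage)


def k_fold_residuals(X, y, fit, k=5, seed=0):
    """Out-of-fold residuals for any model exposed as fit(X, y) -> coefficients."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    residuals = np.empty(len(y))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        beta = fit(X[train], y[train])
        # Each unit's residual comes from a model that never saw that unit.
        residuals[held_out] = y[held_out] - X[held_out] @ beta
    return residuals


# Synthetic data, purely for illustration (intercept plus one feature).
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.2, size=50)

loo = loo_residuals_ols(X, y)
kfold = k_fold_residuals(X, y, ols_fit, k=5)
```

With `k` equal to the number of units, `k_fold_residuals` reduces to the LOO residuals, so the closed-form shortcut can serve as a check on the k-fold code path for OLS. Swapping `ols_fit` for another fitting function (for example a quantile regression solver) requires no other change in this sketch.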

This PR depends on this branch of elex-solver, which allows us to calculate the k-fold residual. Tests will fail until that branch is merged/released.

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-4549

Test Steps

elexmodel 2017-11-07_VA_G --estimands=margin --office_id=G --geographic_unit_type=county --pi_method bootstrap  --features baseline_normalized_margin --percent_reporting 30 --aggregates postal_code --aggregates unit --model_parameters '{"model_type": "QR"}'

vs

elexmodel 2017-11-07_VA_G --estimands=margin --office_id=G --geographic_unit_type=county --pi_method bootstrap  --features baseline_normalized_margin --percent_reporting 30 --aggregates postal_code --aggregates unit --model_parameters '{"model_type": "OLS"}'