rishasurana opened 1 year ago
OK, so close. But I also just realized there is something weird happening with the CLI. The parameter names don't match, I think (and you will also need to update the default in `cli.py`).

Changed it so that it can be called like this: `--lambda='[0.01, 0.05, 0.90, 0.56]'`, but the internal variable name is still `lambda_`. The other alternative is using the `click.option()` `multiple=True` parameter (as used with some of the other params), which would require us to pass values like this: `--lambda 0.01 --lambda 0.05 --lambda 0.90 --lambda 0.56`.
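For reference, a minimal sketch of the two click approaches (the callback name and defaults here are illustrative, not necessarily what `cli.py` uses):

```python
import ast

import click


def parse_lambda(ctx, param, value):
    """Parse a list literal like '[0.01, 0.05]' into a list of floats."""
    if value is None:
        return None
    return [float(x) for x in ast.literal_eval(value)]


@click.command()
# Approach 1: a single list-literal argument, parsed in a callback.
# "lambda_" is the explicit internal name, since "lambda" is a Python keyword.
@click.option("--lambda", "lambda_", callback=parse_lambda, default=None)
# Approach 2 (alternative): repeatable flag, --lambda 0.01 --lambda 0.05 ...
# @click.option("--lambda", "lambda_", multiple=True, type=float)
def cli(lambda_):
    click.echo(lambda_)
```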
If I run the code with postal code fixed effects, I get a NaN error:

```
elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=30 --aggregates=postal_code --aggregates=unit --fixed_effects=postal_code --lambda='[1, 10, 100, 1000]'
```

When getting unit predictions for each fold, at `nonreporting_units_features = self.featurizer.featurize_heldout_data(nonreporting_units)`, there are a number of NaN values (1-2% of the total df) in `nonreporting_units_features` after calling `featurize_heldout_data` on the test set of each fold. I can't seem to replicate this issue for other fixed-effect/aggregate combinations, and it works fine when not splitting the data into test/train sets. Each time I run this command, certain states have NaN values and cause errors more often than others (HI, DE, NJ). Which states cause errors varies between runs because the fold data is now randomly selected.
Example runs (screenshots): columns that have NaN values, and the total NaN count in the dataframe.
Using `fillna(0)` here in the Featurizer class, at `new_heldout_data = new_heldout_data.join(missing_expanded_fixed_effects_df)`, resolves this issue: with the smaller test sets, certain states can be missing from certain folds, and `.join` fills the corresponding columns with NaN. Switching to `.merge` doesn't work here because there are no common columns to merge on.
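A small pandas reproduction of the failure mode (the column names and index values are made up for illustration):

```python
import pandas as pd

# Held-out fold: three units; suppose the third unit's state ("HI", say)
# never appeared when the fixed-effect dummies were expanded.
heldout = pd.DataFrame({"unit": ["a", "b", "c"]}, index=[0, 1, 2])

# Expanded fixed-effect dummy columns, built only for the rows the
# featurizer saw; the third row is absent entirely.
dummies = pd.DataFrame(
    {"postal_code_NJ": [1, 0], "postal_code_DE": [0, 1]}, index=[0, 1]
)

# .join is a left join on the index, so the unmatched row gets NaN ...
print(heldout.join(dummies))
# ... and fillna(0) turns those into the all-zero dummy rows we actually want.
print(heldout.join(dummies).fillna(0))
```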
It appears that lambda has little to no effect on our model predictions, at least in most cases.
Testing 2020 data:
command:

```
elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=gaussian --percent_reporting=50 --aggregates=postal_code --aggregates=unit --lambda='[18347589365782638975893745893653.0, 2.0, 3345683987523849082439027492748237490278420384928394829089032742355.0, 5.0, 9.0, 15.0, 25.0, 37.0, 483475289735820748302742374823742342.0, 1345325206.0, 99.0]'
```

output:

```
More than one lambda has the lowest average MAPE of: 0.067 [1.834758936578264e+31, 2.0, 3.3456839875238492e+66, 5.0, 9.0, 15.0, 25.0, 37.0, 4.834752897358207e+35, 1345325206.0, 99.0]
```
I've tested both large and small lambda values, and it still makes no difference in the predictions. For example, consider the unit predictions for fold 1:
```
Lambda 1.834758936578264e+31: [421, 2420.0]
Lambda 2.0: [421, 2420.0]
Lambda 3.3456839875238492e+66: [421, 2420.0]
Lambda 5.0: [421, 2420.0]
Lambda 9.0: [421, 2420.0]
Lambda 15.0: [421, 2420.0]
Lambda 25.0: [421, 2420.0]
Lambda 37.0: [421, 2420.0]
Lambda 4.834752897358207e+35: [421, 2420.0]
Lambda 1345325206.0: [421, 2420.0]
Lambda 99.0: [421, 2420.0]
```
The same pattern holds for the unit predictions of folds 2 and 3 (fold 2 never equals fold 3, but fold 2 with lambda 1 equals fold 2 with lambda 2). A given fold's train/test dataset consistently produces the same unit predictions across lambda values, so the fold MAPE values calculated are the same for each lambda.
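To spell out the implication: MAPE depends only on the predictions, so identical fold predictions force identical fold MAPEs, and the ties in avg_MAPE follow mechanically. A toy illustration (this is the standard MAPE formula; the repo's implementation may differ in details, and the actuals below are made-up values):

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error over one fold's held-out units."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual))


# The fold's predictions are identical for every lambda, so every lambda
# receives exactly the same fold MAPE.
fold_predictions = [421.0, 2420.0]
fold_actuals = [450.0, 2300.0]
print(mape(fold_actuals, fold_predictions))  # same value regardless of lambda
```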
I've removed the default values and verified that the lambda values are correctly passed through to the QR when called in `find_optimal_lambda()` and are added to the loss function.

Loss function without lambda:
```
Sum([6.23963173e-04 1.53951848e-05 8.20251779e-04 ... 7.51431638e-06
1.22245098e-04 1.22794926e-05] @ (Promote(0.5, (1517,)) @ abs([0.37310912 0.125 0.15607195 ... 0.53658537 0.13643178 0.35820896] + -[[1.]
[1.]
[1.]
...
[1.]
[1.]
[1.]] @ var370) + Promote(0.0, (1517,)) @ ([0.37310912 0.125 0.15607195 ... 0.53658537 0.13643178 0.35820896] + -[[1.]
[1.]
[1.]
...
[1.]
[1.]
[1.]] @ var370)), None, False)
```

Loss function with lambda:

```
Sum([6.23963173e-04 1.53951848e-05 8.20251779e-04 ... 7.51431638e-06
1.22245098e-04 1.22794926e-05] @ (Promote(0.5, (1517,)) @ abs([0.37310912 0.125 0.15607195 ... 0.53658537 0.13643178 0.35820896] + -[[1.]
[1.]
[1.]
...
[1.]
[1.]
[1.]] @ var370) + Promote(0.0, (1517,)) @ ([0.37310912 0.125 0.15607195 ... 0.53658537 0.13643178 0.35820896] + -[[1.]
[1.]
[1.]
...
[1.]
[1.]
[1.]] @ var370)), None, False) + 1.6786870797807898e+26 @ power(Pnorm(var370[1], 2), 2.0)
```
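For what it's worth, those dumped expressions look like a weighted pinball (quantile) loss at tau = 0.5, where the linear term's coefficient is tau - 0.5 = 0 (the `Promote(0.0, ...)` factor), plus a squared-L2 penalty on the coefficients. A minimal cvxpy sketch with the same structure, on synthetic data (assuming, from `var370[1]`, that the intercept is excluded from the penalty):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p, tau, lam = 200, 3, 0.5, 10.0

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + features
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(scale=0.1, size=n)
w = np.full(n, 1.0 / n)  # observation weights, like the leading vector in the dump

beta = cp.Variable(p)
residuals = y - X @ beta
# Pinball loss: 0.5 * |r| + (tau - 0.5) * r; at tau = 0.5 the linear term is 0.
pinball = cp.sum(cp.multiply(w, 0.5 * cp.abs(residuals) + (tau - 0.5) * residuals))
# Ridge-style penalty on the non-intercept coefficients.
objective = pinball + lam * cp.sum_squares(beta[1:])
cp.Problem(cp.Minimize(objective)).solve()
print(beta.value)
```

If the penalty really is in the objective, a weight on the order of 1.68e+26 should force the penalized coefficients to essentially zero, so identical unit predictions across lambdas that different is surprising and may be worth tracing downstream of the solve.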
From what I've read, some reasons this could happen are that the dataset is already well-behaved, the features are not highly correlated, or the model is not very complex.
command:

```
elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=gaussian --percent_reporting=50 --aggregates=postal_code --aggregates=unit --lambda='[12.0, 2.0, 33.0, 5.0, 9.0, 15.0, 25.0, 37.0, 234, 1000, 99.0]' --features=ethnicity_european --features=ethnicity_east_and_south_asian --features=ethnicity_hispanic_and_portuguese --features=ethnicity_likely_african_american
```

output:

```
More than one lambda has the lowest average MAPE of: 0.067 [12.0, 2.0, 33.0, 5.0, 9.0, 15.0, 25.0, 37.0, 234, 1000, 99.0]
```
command:

```
elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=gaussian --percent_reporting=50 --aggregates=postal_code --aggregates=unit --lambda='[12.0, 2.0, 33.0, 5.0, 9.0, 15.0, 25.0, 37.0, 234, 1000, 0, 0.000006]' --features=ethnicity_european --features=ethnicity_east_and_south_asian --features=ethnicity_hispanic_and_portuguese --features=ethnicity_likely_african_american
```

output:

```
More than one lambda has the lowest average MAPE of: 0.073 [12.0, 2.0, 33.0, 5.0, 9.0, 15.0, 25.0, 37.0, 234, 1000]
```
So far the only test that seems to work as intended is this integration test:
command:

```
elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --aggregates=county_classification --aggregates=postal_code --fixed_effects=county_classification --percent_reporting 10 --features=ethnicity_european --features=ethnicity_hispanic_and_portuguese --lambda='[50, 0.5, 10, 100, 437837, 2, 0]'
```

output:

```
avg_MAPE values for each lambda: [0.143 0.137 0.143 0.143 0.143 0.143 0.097] best: 0
```
Here, for a large dataset, we are able to produce more variation in average MAPE (3 distinct values); 5 of the 7 given lambdas have the same MAPE. The fact that this test works as intended suggests this is less likely a logic issue in the code and more likely something else. The rest of the integration tests produce only 1 distinct MAPE (all the same).
command:

```
elexmodel 2020-11-03_USA_G --estimands=dem --office_id=P --geographic_unit_type=county --aggregates=county_classification --aggregates=postal_code --fixed_effects=county_classification --percent_reporting 10 --features=ethnicity_european --features=ethnicity_hispanic_and_portuguese --lambda='[50, 0.5, 10, 100, 437837, 2, 0.02, 0]'
```

output:

```
avg_MAPE values for each lambda: [0.083 0.077 0.083 0.083 0.073 0.087 0.07 ] best: 0
```
command (same command, run a second time):

```
elexmodel 2020-11-03_USA_G --estimands=dem --office_id=P --geographic_unit_type=county --aggregates=county_classification --aggregates=postal_code --fixed_effects=county_classification --percent_reporting 10 --features=ethnicity_european --features=ethnicity_hispanic_and_portuguese --lambda='[50, 0.5, 10, 100, 437837, 2, 0.02, 0]'
```

output:

```
More than one lambda has the lowest average MAPE of: 0.07 [0.5, 0.02, 0]
avg_MAPE values for each lambda: [0.077 0.07 0.077 0.077 0.073 0.08 0.07 0.07 ] best: 0.5
```
The unit tests: the test datasets have a similar issue as before, generally producing at most 2 unique MAPEs by the end of each run.
```python
def test_find_optimal_lambda_tens():
    """
    Test/view computing lambda
    """
    lambda_ = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
    model_settings = {"features": ["b"]}
    model = BaseElectionModel.BaseElectionModel(model_settings)
    df_X = pd.DataFrame(
        {
            "residuals_a": [1, 2, 3, 4],
            "total_voters_a": [4, 2, 9, 5],
            "last_election_results_a": [5, 1, 4, 2],
            "results_a": [5, 2, 8, 4],
            "results_b": [3, 6, 2, 1],
            "baseline_a": [9, 2, 4, 5],
            "baseline_b": [9, 2, 4, 5],
            "a": [2, 2, 3, 7],
            "b": [2, 3, 4, 5],
        }
    )
    new_lambda, avg_MAPE = model.find_optimal_lambda(df_X, lambda_, "a")
    assert new_lambda in [0.01, 10.0]  # more than one lambda value can have the same average MAPE
```
Here, the results oscillate between 0.01 and 10.0, so the best chosen lambda for this test is either 0.01 or 10.0 (since it chooses the first given lambda with the lowest average MAPE). This happens for the other 2 unit tests as well. The oscillation is due to the fact that we have now randomized the reporting units; removing that gives us only the first result consistently.
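The "first lambda wins ties" behavior is what an argmin-style selection gives you, assuming the selection works roughly like this:

```python
import numpy as np

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
avg_mape = np.array([0.07, 0.08, 0.08, 0.07, 0.09, 0.09])  # toy values with a tie

# np.argmin returns the *first* index attaining the minimum, so 0.01 wins
# here; if the random folds flip which lambdas tie at the minimum, the
# "best" lambda oscillates between the tied candidates across runs.
best = lambdas[int(np.argmin(avg_mape))]
print(best)  # 0.01
```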
For such a small dataset, there is slightly more variation (2 distinct MAPEs on average for a set of 5 lambdas). For a large dataset such as 2020-11-03_USA_G or 2017-11-07_VA_G, depending on the inputs, there are only 1-3 distinct MAPEs given 5-10 lambdas.
My conclusion is that regularizing our model does not seem to impact the predictions in most cases, but I'm not sure whether that conclusion is valid.
Lenny should definitely weigh in, but a couple of thoughts: it's hard to know what a difference of 0.01 in MAPE means in real terms. I think I'd want to see the actual predictions for a couple of close states with the two lambdas to understand that, though maybe Lenny has another idea of how to look at differences.
Description

We can pass a regularization constant, `lambda_`, to our model, but on election night we don't know what the optimal `lambda_` is. Add a feature to the client that takes options for `lambda_`, the features, and the fixed effects. It should read in the data and use k-fold cross-validation to find the optimal lambda as the one with the lowest average MAPE on the k hold-out sets. The function should probably return the `lambda_` and the average MAPE.

AC:
- Take in the `lambda_` values, the features and fixed effects we are using, and any other inputs that are necessary, and run k-fold cross-validation to pick the best `lambda_` value to use.

Jira Ticket
https://arcpublishing.atlassian.net/browse/ELEX-2418
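A minimal sketch of the requested cross-validation loop, assuming numpy arrays for `X` and `y`; `fit_and_predict` is a hypothetical stand-in for fitting the quantile regression at a given lambda and predicting the held-out units (the real client would wire in the featurizer, features, and fixed effects):

```python
import numpy as np
from sklearn.model_selection import KFold


def find_optimal_lambda(X, y, lambdas, fit_and_predict, k=3):
    """Return the lambda with the lowest average held-out MAPE, and that MAPE."""
    # Materialize the folds once so every lambda is scored on the *same*
    # train/test splits; resplitting per lambda would make MAPEs incomparable.
    folds = list(KFold(n_splits=k, shuffle=True).split(X))
    avg_mape = []
    for lam in lambdas:
        fold_mapes = []
        for train_idx, test_idx in folds:
            preds = fit_and_predict(X[train_idx], y[train_idx], X[test_idx], lam)
            fold_mapes.append(np.mean(np.abs((y[test_idx] - preds) / y[test_idx])))
        avg_mape.append(np.mean(fold_mapes))
    best_idx = int(np.argmin(avg_mape))  # ties go to the first lambda listed
    return lambdas[best_idx], avg_mape[best_idx]
```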