washingtonpost / elex-live-model

a model to generate estimates of the number of outstanding votes on an election night based on the current results of the race

ELEX-2857-winsorize-error #62

Closed rishasurana closed 1 year ago

rishasurana commented 1 year ago

Description

In test_client.py::test_winsorize_intervals, the check that the interval bounds are <=/>= when winsorized compared to when not winsorized would fail occasionally (about once every 10 runs), with a difference of roughly 0.04% of the predicted value (e.g. 2441554.0 vs 2442544.0). This would sometimes cause the GitHub Actions tests for other PRs to fail.
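
As a rough illustration of the property the test checks (this is a standalone sketch, not the model code; the scipy.stats.mstats.winsorize call and the 90% interval construction here are stand-ins for what the model does internally):

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Stand-in for the unit-level predictions the real test pulls from the model.
rng = np.random.default_rng(0)
predictions = rng.normal(loc=2_400_000, scale=50_000, size=200)
predictions[:3] = [3_500_000, 3_400_000, 1_200_000]  # inject a few outliers

# Clip the top/bottom 5% of values, mirroring the winsorize step.
winsorized = np.asarray(winsorize(predictions, limits=(0.05, 0.05)))

# 90% interval bounds before and after winsorizing.
lower, upper = np.percentile(predictions, [5, 95])
w_lower, w_upper = np.percentile(winsorized, [5, 95])

# Winsorizing should never widen the interval.
assert w_lower >= lower
assert w_upper <= upper
```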

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-2857

Test Steps

run test_client.py::test_winsorize_intervals

lennybronner commented 1 year ago

This should actually never happen, right? We are always cutting off the largest outliers. Since you are passing in the same reporting units, the only other place there is randomness is when we are creating the conformalization set. Can you check whether these sets are the same in both runs?

rishasurana commented 1 year ago

This should actually never happen, right? We are always cutting off the largest outliers. Since you are passing in the same reporting units, the only other place there is randomness is when we are creating the conformalization set. Can you check whether these sets are the same in both runs?

Just checked - the conformalization set is the same for the winsorized and non_winsorized runs.

I remember getting this error more often (every other run) when I had the winsorize limits at 1%. I just pushed a commit that reverts the previous changes and raises the winsorize limits from 5% to 6%. After this change, there is no error regardless of how many times I run the test, since we have cut off all the possible outliers for that particular test. 5.9% also works, but 5.5% eventually resulted in an error. Could this mean that there's an optimal winsorize limit depending on the dataset?
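
For intuition on why the limit matters, here is a toy example (made-up numbers, not our data) showing that whether an extreme value gets clipped depends on both the limit and the sample size:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# 30 well-behaved points plus one large outlier (illustrative only).
data = np.append(np.linspace(0.0, 1.0, 30), 50.0)

for limit in (0.01, 0.05, 0.06):
    clipped = np.asarray(winsorize(data, limits=(limit, limit)))
    # A limit below 1/n clips nothing on that side, so the outlier survives.
    print(f"limit={limit:.2f} -> max after winsorizing: {clipped.max():.2f}")
```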

rishasurana commented 1 year ago

The error was that the test was comparing state_data bounds rather than unit_data bounds. Only unit_data is modified by winsorizing in the results handler.

lennybronner commented 1 year ago

I'm not sure I understand what the final conclusion/decision here is?

rishasurana commented 1 year ago

I'm not sure I understand what the final conclusion/decision here is?

lower_0.9_turnout and upper_0.9_turnout are only modified for unit_data, not state_data, by the results handler when we call add_unit_intervals(). The test was comparing these values for state_data, which is why it failed. Changing the test to look at unit_data, which is actually affected by the winsorizing, fixes the error.
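
For reference, the corrected comparison looks roughly like this (the DataFrames here are made-up stand-ins for the results-handler output, and the direction of the inequalities is illustrative):

```python
import pandas as pd

# Stand-ins for the unit-level output of the results handler, with and
# without winsorizing; the real test gets these from two model runs.
unit_data = pd.DataFrame(
    {"lower_0.9_turnout": [900.0, 1200.0], "upper_0.9_turnout": [1500.0, 2100.0]}
)
unit_data_winsorized = pd.DataFrame(
    {"lower_0.9_turnout": [920.0, 1200.0], "upper_0.9_turnout": [1480.0, 2050.0]}
)

# Compare the unit-level bounds, which add_unit_intervals() actually modifies,
# instead of the state-level bounds the old test looked at.
assert (unit_data_winsorized["lower_0.9_turnout"] >= unit_data["lower_0.9_turnout"]).all()
assert (unit_data_winsorized["upper_0.9_turnout"] <= unit_data["upper_0.9_turnout"]).all()
```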

lennybronner commented 1 year ago

I'm not sure I understand what the final conclusion/decision here is?

lower_0.9_turnout and upper_0.9_turnout are only modified for unit_data, not state_data, by the results handler when we call add_unit_intervals(). The test was comparing these values for state_data, which is why it failed. Changing the test to look at unit_data, which is actually affected by the winsorizing, fixes the error.

Ok cool! This might be a stupid question, but why would the state data change if the unit data is the same?

rishasurana commented 1 year ago

The bootstrap function is causing the randomness. We are repeating the calculation 10,000 times with random sampling, so we occasionally end up with samples that give smaller/larger results for both state and unit data. Setting the random_state parameter to a prime integer solves this issue without being too invasive. The calculation is still done several times, but if random_state is an int, a new RandomState instance seeded with that value is used when generating resamples, which keeps them consistent across runs.
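
To show what I mean, here is a minimal sketch of that seeding behavior with a hypothetical bootstrap helper (this is not the actual boot_sigma signature, just an illustration of how an integer random_state makes the resamples reproducible):

```python
import numpy as np

def bootstrap_statistic(data, statistic, n_resamples=10_000, random_state=None):
    """Hypothetical bootstrap helper: if random_state is an int, a fresh
    RandomState seeded with that value is used, so the resamples (and the
    resulting bounds) are identical across runs."""
    if isinstance(random_state, int):
        rng = np.random.RandomState(random_state)
    elif random_state is None:
        rng = np.random.RandomState()
    else:
        rng = random_state  # assume an existing RandomState-like object

    n = len(data)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(data, size=n, replace=True)
        estimates[i] = statistic(resample)
    return estimates

# With an integer seed the resampled estimates are reproducible run to run.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a = bootstrap_statistic(data, np.std, n_resamples=1000, random_state=17)
b = bootstrap_statistic(data, np.std, n_resamples=1000, random_state=17)
assert np.allclose(a, b)
```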

There are two other alternatives:

  1. Modifying the test itself to manually bootstrap with a random_state. We would have to move the test outside of test_client.py and into test_gaussian_model.py, and modify it.
  2. Modifying boot_sigma to take in a value for a random_state (very invasive; we would need to update model settings/parameters almost everywhere)

Also had to update pre-commit here for the F811 error (redefinition of unused name from line n) in a file that wasn't modified by this PR; it would pass locally but was causing an error with GitHub Actions.

jchaskell commented 1 year ago

I verified this and have some example data I can share if you want to see it Lenny.

I'm not sure if having a set seed will cause any issues? I would think not, but I could be missing something.

lennybronner commented 1 year ago

This makes sense, thank you for investigating @rishasurana! We want to set a seed in the test only.

rishasurana commented 1 year ago

@lennybronner Thanks! Requesting a re-review because I had to make some updates to test_client.py after merging the generalize parameter PR.