Closed carbonphyber closed 2 years ago
Remove the mail-in and early voting counts, which were mistakenly just copies of the total vote.
Errors fixed looked like:
======================================================================
FAIL: test_vote_method_totals (data_tests.test_data.VoteBreakdownTotalsTest) [2020/20201103__ca__general__precinct.csv] (group='2020')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/openelections-data-ca/openelections-data-ca/data_tests/data_tests/test_data.py", line 158, in test_vote_method_totals
    self._assertTrue(data_test.passed, f"{self} [{short_path}]", short_message, full_message)
  File "/home/runner/work/openelections-data-ca/openelections-data-ca/data_tests/data_tests/test_data.py", line 59, in _assertTrue
    self.assertTrue(result, short_message)
AssertionError: False is not true : There are 615 rows where the sum of ['early_voting', 'election_day', 'provisional'] is greater than 'votes':

Headers: ['county', 'precinct', 'office', 'district', 'candidate', 'party', 'votes', 'early_voting', 'election_day', 'provisional']:
Row 197801: ['Kern', '11155', 'Registered Voters', '', '', '', '1261', '1261', '1261', '']
Row 197802: ['Kern', '11190', 'Registered Voters', '', '', '', '1610', '1610', '1610', '']
Row 197803: ['Kern', '11195', 'Registered Voters', '', '', '', '1366', '1366', '1366', '']
Row 197804: ['Kern', '11530', 'Registered Voters', '', '', '', '929', '929', '929', '']
Row 197805: ['Kern', '11540', 'Registered Voters', '', '', '', '1054', '1054', '1054', '']
Row 197806: ['Kern', '11546', 'Registered Voters', '', '', '', '1123', '1123', '1123', '']
Row 197807: ['Kern', '11548', 'Registered Voters', '', '', '', '1615', '1615', '1615', '']
Row 197808: ['Kern', '11550', 'Registered Voters', '', '', '', '1408', '1408', '1408', '']
Row 197809: ['Kern', '11552', 'Registered Voters', '', '', '', '1346', '1346', '1346', '']
Row 197810: ['Kern', '11554', 'Registered Voters', '', '', '', '1660', '1660', '1660', '']
[Truncated to 10 examples]
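For context, the condition that failure message describes can be sketched roughly as below. This is not the repository's actual test code; the function name `rows_exceeding_votes` is hypothetical, and the sketch assumes blank or non-numeric cells count as zero.

```python
import csv
import io

# Column names taken from the "Headers" line of the failure output.
METHOD_COLUMNS = ["early_voting", "election_day", "provisional"]


def rows_exceeding_votes(csv_text):
    """Return rows whose vote-method columns sum to more than 'votes'.

    Hypothetical helper: blank or non-numeric cells are treated as 0,
    and rows without a numeric 'votes' value are skipped.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        votes = row.get("votes") or ""
        if not votes.isdigit():
            continue
        method_total = sum(
            int(row[col])
            for col in METHOD_COLUMNS
            if (row.get(col) or "").isdigit()
        )
        if method_total > int(votes):
            flagged.append(row)
    return flagged
```

A row such as `Kern,11155,Registered Voters,,,,1261,1261,1261,` is flagged under this check because 1261 + 1261 exceeds 1261, which is why copying the total into both columns trips the test.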
In this example, I used a VSCode Find-Replace RegEx to search for lines matching:
Kern,([0-9]+),Registered Voters,,,,([0-9]+),\2,\2,
And replaced them with:
Kern,$1,Registered Voters,,,,$2,,,
The logic of the RegEx pattern was to look for Kern County records that had identical values in the votes, early_voting, and election_day columns, then empty the early_voting and election_day columns while leaving votes untouched.
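For anyone who would rather script the same change than run it in an editor, here is a minimal Python sketch of the equivalent substitution. It is an illustration only, not what was run for this PR; the function name `clear_copied_counts` is hypothetical. Note that Python's `re` module writes replacement groups as `\1` and `\2` rather than VSCode's `$1` and `$2`.

```python
import re

# Same search pattern as the VSCode Find-Replace above: \2 is a
# backreference requiring early_voting and election_day to repeat
# the captured votes value exactly.
PATTERN = re.compile(r"Kern,([0-9]+),Registered Voters,,,,([0-9]+),\2,\2,")

# Keep the precinct (\1) and votes (\2); blank the two copied columns.
REPLACEMENT = r"Kern,\1,Registered Voters,,,,\2,,,"


def clear_copied_counts(csv_text):
    """Blank early_voting and election_day where both equal votes."""
    return PATTERN.sub(REPLACEMENT, csv_text)
```

Rows whose early_voting or election_day value differs from votes do not match the backreferences and are left unchanged.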