openelections / openelections-data-dc

Pre-processed election results for District of Columbia elections

Duplicate and inconsistent entries in 2004 general precinct file #21

Open warwickmm opened 3 years ago

warwickmm commented 3 years ago

2004/20041102__dc__general__precinct.csv contains not only duplicate rows, but rows with inconsistent vote values as well. Many of these were introduced in revision 9baad1ec214ae4ebc0df069a0fd4a3cf3481b7f8.

        Row 1888: ['4', '65', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '1782']
        Row 1924: ['4', '65', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '2074']
        Row 1889: ['4', '65', 'PRESIDENT', '', 'SGN', 'COBB/LaMARCHE', '1']
        Row 1925: ['4', '65', 'PRESIDENT', '', 'SGN', 'COBB/LaMARCHE', '7']
        Row 1890: ['4', '65', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '1']
        Row 1926: ['4', '65', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '1']
        Row 1891: ['4', '65', 'PRESIDENT', '', 'SWP', 'HARRIS/TROWE', '0']
        Row 1927: ['4', '65', 'PRESIDENT', '', 'SWP', 'HARRIS/TROWE', '0']
        Row 1892: ['4', '65', 'PRESIDENT', '', 'REP', 'BUSH/CHENEY', '66']
        Row 1928: ['4', '65', 'PRESIDENT', '', 'REP', 'BUSH/CHENEY', '120']
        Row 1893: ['4', '65', 'PRESIDENT', '', 'IND', 'NADER/CAMEJO', '3']
        Row 1929: ['4', '65', 'PRESIDENT', '', 'IND', 'NADER/CAMEJO', '9']
        Row 1894: ['4', '65', 'PRESIDENT', '', '', 'Write in', '4']
        Row 1930: ['4', '65', 'PRESIDENT', '', '', 'Write in', '4']
        Row 1935: ['4', '65', 'PRESIDENT', '', '', 'Write in', '9']
        Row 1895: ['4', '65', 'PRESIDENT', '', '', 'Total', '1,857']
        Row 1931: ['4', '65', 'PRESIDENT', '', '', 'Total', '2,215']
        Row 1936: ['4', '65', 'PRESIDENT', '', '', 'Total', '2,187']
        Row 1921: ['4', '65', 'DISTRICT II MEMBER OF THE BOARD OF EDU', '', '', 'Total', '1,754']
        Row 1923: ['4', '65', 'DISTRICT II MEMBER OF THE BOARD OF EDU', '', '', 'Total', '2,241']
        Row 1941: ['4', '65', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '14']
        Row 1946: ['4', '65', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '9']
        Row 1942: ['4', '65', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '3,190']
        Row 1947: ['4', '65', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '2,044']
        Row 2086: ['5', '73', 'PRESIDENT', '', '', 'Write in', '4']
        Row 2091: ['5', '73', 'PRESIDENT', '', '', 'Write in', '6']
        Row 2087: ['5', '73', 'PRESIDENT', '', '', 'Total', '1,330']
        Row 2092: ['5', '73', 'PRESIDENT', '', '', 'Total', '1,296']
        Row 2097: ['5', '73', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '14']
        Row 2102: ['5', '73', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '10']
        Row 2098: ['5', '73', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '1,874']
        Row 2103: ['5', '73', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '1,203']
        Row 2126: ['5', '75', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '1459']
        Row 2150: ['5', '75', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '410']
        Row 2127: ['5', '75', 'PRESIDENT', '', 'SGN', 'COBB/LaMARCHE', '9']
        Row 2151: ['5', '75', 'PRESIDENT', '', 'SGN', 'COBB/LaMARCHE', '0']
        Row 2128: ['5', '75', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '4']
        Row 2152: ['5', '75', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '0']
        Row 2129: ['5', '75', 'PRESIDENT', '', 'SWP', 'HARRIS/TROWE', '1']
        Row 2153: ['5', '75', 'PRESIDENT', '', 'SWP', 'HARRIS/TROWE', '1']
        Row 2130: ['5', '75', 'PRESIDENT', '', 'REP', 'BUSH/CHENEY', '46']
        Row 2154: ['5', '75', 'PRESIDENT', '', 'REP', 'BUSH/CHENEY', '19']
        Row 2131: ['5', '75', 'PRESIDENT', '', 'IND', 'NADER/CAMEJO', '12']
        Row 2155: ['5', '75', 'PRESIDENT', '', 'IND', 'NADER/CAMEJO', '1']
        Row 2132: ['5', '75', 'PRESIDENT', '', '', 'Write in', '3']
        Row 2156: ['5', '75', 'PRESIDENT', '', '', 'Write in', '1']
        Row 2161: ['5', '75', 'PRESIDENT', '', '', 'Write in', '0']
        Row 2133: ['5', '75', 'PRESIDENT', '', '', 'Total', '1,534']
        Row 2157: ['5', '75', 'PRESIDENT', '', '', 'Total', '432']
        Row 2162: ['5', '75', 'PRESIDENT', '', '', 'Total', '421']
        Row 2147: ['5', '75', 'SHADOW SENATOR', '', '', 'Total', '1,402']
        Row 2149: ['5', '75', 'SHADOW SENATOR', '', '', 'Total', '438']
        Row 2167: ['5', '75', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '7']
        Row 2172: ['5', '75', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '1']
        Row 2168: ['5', '75', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '624']
        Row 2173: ['5', '75', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '405']
        Row 2202: ['5', '78', 'PRESIDENT', '', '', 'Write in', '0']
        Row 2207: ['5', '78', 'PRESIDENT', '', '', 'Write in', '2']
        Row 2203: ['5', '78', 'PRESIDENT', '', '', 'Total', '1,415']
        Row 2208: ['5', '78', 'PRESIDENT', '', '', 'Total', '1,397']
        Row 2213: ['5', '78', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '14']
        Row 2218: ['5', '78', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '4']
        Row 2214: ['5', '78', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '1,772']
        Row 2219: ['5', '78', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '1,302']
        Row 2473: ['6', '90', 'PRESIDENT', '', '', 'Write in', '5']
        Row 2478: ['6', '90', 'PRESIDENT', '', '', 'Write in', '9']
        Row 2474: ['6', '90', 'PRESIDENT', '', '', 'Total', '1,298']
        Row 2479: ['6', '90', 'PRESIDENT', '', '', 'Total', '1,272']
        Row 2484: ['6', '90', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '16']
        Row 2489: ['6', '90', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '22']
        Row 2485: ['6', '90', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '1,856']
        Row 2490: ['6', '90', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '1,066']
        Row 2600: ['7', '95', 'PRESIDENT', '', '', 'Write in', '2']
        Row 2605: ['7', '95', 'PRESIDENT', '', '', 'Write in', '2']
        Row 2601: ['7', '95', 'PRESIDENT', '', '', 'Total', '775']
        Row 2606: ['7', '95', 'PRESIDENT', '', '', 'Total', '754']
        Row 2616: ['7', '95', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '11']
        Row 2621: ['7', '95', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '0']
        Row 2617: ['7', '95', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '756']
        Row 2622: ['7', '95', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '725']
        Row 2656: ['7', '97', 'PRESIDENT', '', '', 'Write in', '1']
        Row 2661: ['7', '97', 'PRESIDENT', '', '', 'Write in', '0']
        Row 2657: ['7', '97', 'PRESIDENT', '', '', 'Total', '640']
        Row 2662: ['7', '97', 'PRESIDENT', '', '', 'Total', '626']
        Row 2672: ['7', '97', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '23']
        Row 2677: ['7', '97', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '1']
        Row 2673: ['7', '97', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '620']
        Row 2678: ['7', '97', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '596']
        Row 2712: ['7', '100', 'PRESIDENT', '', '', 'Write in', '0']
        Row 2717: ['7', '100', 'PRESIDENT', '', '', 'Write in', '2']
        Row 2713: ['7', '100', 'PRESIDENT', '', '', 'Total', '980']
        Row 2718: ['7', '100', 'PRESIDENT', '', '', 'Total', '948']
        Row 2728: ['7', '100', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '17']
        Row 2733: ['7', '100', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '6']
        Row 2729: ['7', '100', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '952']
        Row 2734: ['7', '100', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '895']
        Row 2984: ['7', '109', 'PRESIDENT', '', '', 'Write in', '3']
        Row 2989: ['7', '109', 'PRESIDENT', '', '', 'Write in', '1']
        Row 2985: ['7', '109', 'PRESIDENT', '', '', 'Total', '724']
        Row 2990: ['7', '109', 'PRESIDENT', '', '', 'Total', '722']
        Row 3000: ['7', '109', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '13']
        Row 3005: ['7', '109', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Write in', '5']
        Row 3001: ['7', '109', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '711']
        Row 3006: ['7', '109', 'ARD SEVEN MEMBER OF THE COUNCIL', '', '', 'Total', '680']
        Row 3503: ['2', '129', 'PRESIDENT', '', '', 'Write in', '3']
        Row 3508: ['2', '129', 'PRESIDENT', '', '', 'Write in', '3']
        Row 3504: ['2', '129', 'PRESIDENT', '', '', 'Total', '819']
        Row 3509: ['2', '129', 'PRESIDENT', '', '', 'Total', '788']
        Row 3519: ['2', '129', 'ARD TWO MEMBER OF THE COUNCIL', '', '', 'Write in', '4']
        Row 3524: ['2', '129', 'ARD TWO MEMBER OF THE COUNCIL', '', '', 'Write in', '6']
        Row 3520: ['2', '129', 'ARD TWO MEMBER OF THE COUNCIL', '', '', 'Total', '771']
        Row 3525: ['2', '129', 'ARD TWO MEMBER OF THE COUNCIL', '', '', 'Total', '705']
        Row 3711: ['1', '137', 'PRESIDENT', '', '', 'Write in', '0']
        Row 3716: ['1', '137', 'PRESIDENT', '', '', 'Write in', '3']
        Row 3712: ['1', '137', 'PRESIDENT', '', '', 'Total', '531']
        Row 3717: ['1', '137', 'PRESIDENT', '', '', 'Total', '522']
        Row 3722: ['1', '137', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '9']
        Row 3727: ['1', '137', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Write in', '8']
        Row 3723: ['1', '137', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '697']
        Row 3728: ['1', '137', 'T-LARGE MEMBER OF THE COUNCIL', '', '', 'Total', '465']
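
A listing like the one above can be produced by grouping rows on every column except the vote count and reporting any key that occurs more than once. The following is only a sketch of that idea, not the script that generated the listing: the helper names `find_conflicts` and `read_rows` are hypothetical, the column order (ward, precinct, office, district, party, candidate, votes) is assumed from the rows shown, and the row numbering (1-based, counting the header) is a guess.

```python
import csv
from collections import defaultdict


def find_conflicts(numbered_rows):
    """Split repeated keys into exact duplicates and inconsistent entries.

    numbered_rows is an iterable of (row_number, row) pairs. The last
    column of each row is assumed to be the vote count; all preceding
    columns together form the key.
    """
    groups = defaultdict(list)
    for row_number, row in numbered_rows:
        *key, votes = row
        groups[tuple(key)].append((row_number, votes))

    duplicates = {}
    inconsistent = {}
    for key, entries in groups.items():
        if len(entries) < 2:
            continue  # key appears once, nothing to report
        distinct_votes = {votes for _, votes in entries}
        if len(distinct_votes) == 1:
            duplicates[key] = entries      # same key, same votes
        else:
            inconsistent[key] = entries    # same key, different votes
    return duplicates, inconsistent


def read_rows(path):
    """Read a CSV file into (row_number, row) pairs, skipping blank lines."""
    with open(path, newline="") as handle:
        return [(n, row) for n, row in enumerate(csv.reader(handle), start=1) if row]


# Four rows copied from the listing above.
sample = [
    (1888, ['4', '65', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '1782']),
    (1924, ['4', '65', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '2074']),
    (1890, ['4', '65', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '1']),
    (1926, ['4', '65', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '1']),
]
duplicates, inconsistent = find_conflicts(sample)
for key, entries in inconsistent.items():
    print("inconsistent:", key, entries)
for key, entries in duplicates.items():
    print("duplicate:", key, entries)
```

To run it over the whole file, pass `read_rows("2004/20041102__dc__general__precinct.csv")` to `find_conflicts`.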

warwickmm commented 3 years ago

Looking at this a bit more closely, several other issues have become apparent. Consider the following example:

https://github.com/openelections/openelections-data-dc/blob/2dbf228868140279bf8106680d0a6fd92dd2a7e4/2004/20041102__dc__general__precinct.csv#L1922-L1936

This pattern, in which the precinct and office values are incorrect, repeats throughout the file.
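
One quick sanity check is to test whether each repeated block adds up to its own Total row. The sketch below does this for the two precinct 65 PRESIDENT blocks from the first comment (rows 1888-1895 and 1924-1931). Both blocks are internally consistent, which fits the idea that the second block is a complete set of results carrying the wrong precinct label, although the check alone does not prove that or say which precinct the numbers belong to. The helper names `parse_votes` and `block_matches_total` are hypothetical.

```python
def parse_votes(text):
    """Convert a vote string such as '1,857' to an integer."""
    return int(text.replace(",", ""))


def block_matches_total(rows):
    """Return True if the candidate rows sum to the final 'Total' row.

    rows is a list of CSV rows for one contest in one precinct, with the
    'Total' row last. Candidate name is column 5, votes are column 6.
    """
    *candidates, total = rows
    if total[5] != "Total":
        raise ValueError("last row of the block must be the Total row")
    return sum(parse_votes(row[6]) for row in candidates) == parse_votes(total[6])


# Rows 1888-1895 from the listing in the first comment.
first_block = [
    ['4', '65', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '1782'],
    ['4', '65', 'PRESIDENT', '', 'SGN', 'COBB/LaMARCHE', '1'],
    ['4', '65', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '1'],
    ['4', '65', 'PRESIDENT', '', 'SWP', 'HARRIS/TROWE', '0'],
    ['4', '65', 'PRESIDENT', '', 'REP', 'BUSH/CHENEY', '66'],
    ['4', '65', 'PRESIDENT', '', 'IND', 'NADER/CAMEJO', '3'],
    ['4', '65', 'PRESIDENT', '', '', 'Write in', '4'],
    ['4', '65', 'PRESIDENT', '', '', 'Total', '1,857'],
]

# Rows 1924-1931 from the same listing.
second_block = [
    ['4', '65', 'PRESIDENT', '', 'DEM', 'KERRY/EDWARDS', '2074'],
    ['4', '65', 'PRESIDENT', '', 'SGN', 'COBB/LaMARCHE', '7'],
    ['4', '65', 'PRESIDENT', '', 'LIB', 'BADNARIK/CAMPAGNA', '1'],
    ['4', '65', 'PRESIDENT', '', 'SWP', 'HARRIS/TROWE', '0'],
    ['4', '65', 'PRESIDENT', '', 'REP', 'BUSH/CHENEY', '120'],
    ['4', '65', 'PRESIDENT', '', 'IND', 'NADER/CAMEJO', '9'],
    ['4', '65', 'PRESIDENT', '', '', 'Write in', '4'],
    ['4', '65', 'PRESIDENT', '', '', 'Total', '2,215'],
]

print(block_matches_total(first_block))
print(block_matches_total(second_block))
```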

See the following source for comparison.