Great, so it seems like Elko, NV is the only county where our CVR data may be incomplete. In other cases, it's the precinct returns that are "incomplete" (or at least different from the state's website). I wonder why it's so systematic?
Resolved Carson City, NV. Otherwise looks good from the CVR perspective
I've taken a look at all of these counties, and the problems are straightforward and usually consistent — the precinct data contains some precincts where candidates earn '-1' votes. Precincts numbered 88 frequently display this problem. I'm working on updating the data, but while it was quick to solve the problems with US PRESIDENT data, it's taking more time than expected for other contests and minor party candidates. I'll include a list here of the precincts that cause the discrepancies in the major-party presidential data.
Replacing the '-1' vote counts with the correct vote counts for Joe Biden and Donald Trump resolves the discrepancies between the Harvard and Baltz et al. datasets, leading to exact matches.
I also note that NV is one of the states that masks vote totals for jurisdictions with few votes.
DOUGLAS: Precincts 30, 88
ELKO: Precincts 17, unclear where else
EUREKA: Precinct 88
NYE: Precinct 88, Mercury 08
STOREY: Precincts 11, 12
WASHOE:
"GER-WADS 7589"
"INCLINE VILLAGE 8111"
"INCLINE VILLAGE 8125"
"RENO-VERDI 1051"
"RENO-VERDI 1058"
"RENO-VERDI 2043"
"RENO-VERDI 2073"
"RENO-VERDI 3002"
"RENO-VERDI 3003"
"RENO-VERDI 3013 (MP)"
"RENO-VERDI 3025"
"RENO-VERDI 3027"
"RENO-VERDI 3039"
"RENO-VERDI 4015 (MP)"
"RENO-VERDI 4040"
"RENO-VERDI 5054"
"RENO-VERDI 7100"
"RENO-VERDI 7302"
"RENO-VERDI 7321"
"RENO-VERDI 7528"
"RENO-VERDI 7539 (MP)"
"RENO-VERDI 7549 (MP)"
"RENO-VERDI 7556"
"RENO-VERDI 7558"
"RENO-VERDI 7567"
"RENO-VERDI 8115"
"RENO-VERDI 8127"
"RENO-VERDI 8128"
"RENO-VERDI 8212"
"RENO-VERDI 8218"
"RENO-VERDI 8226"
"RENO-VERDI 8265"
"SPARKS 6503"
"SPARKS 7315"
"SPARKS 7317"
"SPARKS 7400 (MP)"
"SPARKS 7403"
"SPARKS 7406"
"SPARKS 7413"
"SPARKS 7428 (MP)"
"SPARKS 7509"
"SPARKS 7518 (MP)"
"SPARKS 7583"
"SPARKS 7584"
"SPARKS 7585"
"SPARKS 7594 (MP)"
WHITE PINE: Precinct 88
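A list like the one above could be produced by scanning the precinct returns for negative vote counts. This is a minimal sketch, assuming the returns can be read as (county, precinct, candidate, votes) rows. The row layout, candidate labels, and vote values below are hypothetical and not taken from the actual file.

```python
# Hypothetical precinct rows: (county, precinct, candidate, votes).
# The layout and the numbers are illustrative, not from the real returns.
rows = [
    ("DOUGLAS", "30", "JOSEPH R BIDEN", -1),
    ("DOUGLAS", "30", "DONALD J TRUMP", 412),
    ("DOUGLAS", "31", "JOSEPH R BIDEN", 250),
    ("EUREKA", "88", "DONALD J TRUMP", -1),
]


def masked_precincts(rows):
    """Return sorted (county, precinct) pairs that contain any negative vote count."""
    return sorted(
        {(county, precinct) for county, precinct, _, votes in rows if votes < 0}
    )


for county, precinct in masked_precincts(rows):
    print(f"{county}: Precinct {precinct}")
```

Flagging any negative count, rather than only '-1', would also catch other sentinel values if the file uses them.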
Thanks Zachary. So I think we'll solve this if we update the totals in CVR_parquet/returns/by-county with numbers that are not contaminated by the "-1s". @sbaltzmit do you have a Nevada file that has county-candidate-level totals like this? Same format as https://github.com/kuriwaki/cvr_harvard-mit_scripts/issues/160#issuecomment-2150279052 would be great.
What is the format that you're looking for -- the total number of votes that each candidate received in each county in Nevada, with the * in the raw data replaced by 0 instead of by -1?
@sbaltzmit "total number of votes that each candidate received in each county in Nevada" is exactly right. E.g. data with one row for Joe Biden in Elko County, NV --- hopefully, if the CVR is accurate, that would show 4566 votes?
"What is the format that you're looking for -- the total number of votes that each candidate received in each county in Nevada, with the * in the raw data replaced by 0 instead of by -1?"
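The aggregation described in this exchange could look roughly like the sketch below. It assumes the raw returns mark masked cells with `*` and arrive as (county, candidate, raw_value) rows. The row layout, the `county_totals` helper, and the numbers are hypothetical.

```python
from collections import defaultdict

MASK = "*"  # assumed masking symbol in the raw returns


def county_totals(rows, mask_value=0):
    """Sum votes per (county, candidate), substituting mask_value for masked cells.

    Also returns how many masked cells fed into each total, since a total
    built with mask_value=0 is only a lower bound when that count is nonzero.
    """
    totals = defaultdict(int)
    masked_cells = defaultdict(int)
    for county, candidate, raw in rows:
        key = (county, candidate)
        if raw == MASK:
            totals[key] += mask_value
            masked_cells[key] += 1
        else:
            totals[key] += int(raw)
    return dict(totals), dict(masked_cells)


# Hypothetical rows: (county, candidate, raw_value).
rows = [
    ("ELKO", "JOSEPH R BIDEN", "120"),
    ("ELKO", "JOSEPH R BIDEN", "*"),
    ("ELKO", "DONALD J TRUMP", "300"),
]
totals, masked = county_totals(rows)
print(totals)
print(masked)
```

Replacing `*` with 0 avoids the undercount of one vote per masked cell that '-1' introduces, but it still cannot recover the true count, which is why the masked-cell tally is kept alongside the totals.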
I've been manually replacing the -1s where I can in a spreadsheet of NV precinct returns that I'll be sending to you. All of our NV counties should be fixed after that except for Elko. I'm working on Washoe this afternoon; afterwards, I'll be ready to send it to you. (Or I can send the everything-but-Washoe-and-Elko spreadsheet sooner, if that's better.)
Ok, I didn't know that was in the works, great. Yes, Zachary if you already have that almost done, please post it here next week once you think you have all the Nevada counties. Then I can merge it in like Mason did for New Mexico.
Alright, the spreadsheet is here. Hopefully that resolves most counties' issues (besides Elko's).
nv_updated_county-candidate_totals.csv
nv_updated_county-candidate_totals.csv should be attached (let me know if it isn't), and contains the corrected precinct data.
nv_precinct_data_UPDATED_0627_2.csv should also be attached (I'm wary of this one not loading), which contains the pre-aggregation precinct-level returns for these NV counties, with the '-1' masked entries corrected. nv_precinct_data_UPDATED_0627_2.csv
Great. I compared your first file's numbers in Douglas with the CVR counts, and they seem to match exactly (unlike the prior Baltz et al. data).
@zdj-garai can you just explain here where/how you got those numbers in nv_updated_county-candidate_totals.csv? From each official county website?
I used the CVR data we have to go county by county, adding up the number of votes for each candidate in each precinct. The counties did their due diligence and properly masked the data, but the CVR data was not masked, making this possible.
I see, so your uploaded counts are constructed from the CVR itself. I think that means that your counts are not a falsifiable test of whether the CVR counts are complete and correct? It's instead taking it for granted that the CVR data is complete and correct.
Did you find evidence indicating that "the CVR data was not masked"? I also think we need to check whether the official county websites reported an unmasked county-level count.
Eek! That was actually the concern I had—it's a little circular, but the data all matches, and it's unlikely that the raccoon army constructed thousands of rows of data that aggregate up to the county level in the proper format; of course, it's still not as falsifiable as I'd like. Unfortunately we're slammed with election-related work at MEDSL and I've been instructed to dedicate my focus in the upcoming weeks to writing and polishing our reports so we can publish them promptly, so I won't be able to work more on CVR validation/cleaning for the next few weeks.
No worries. @sbaltzmit do you have a file described in https://github.com/kuriwaki/cvr_harvard-mit_scripts/issues/39#issuecomment-2168430500 already floating around? Basically a county-candidate file that is taken from official reports and not contaminated by redaction.
If not, someone could go spend an hour manually entering the county x candidate counts in each of the following districts from https://www.nvsos.gov/silverstate2020gen/USPresidential/ into our standard format. I just checked Douglas county on that page and it looks like their counts exactly match the CVR counts (nice). In other words, it does appear that the CVR is unredacted.
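The independent check described here, comparing officially reported county totals against totals summed from the CVR, might be sketched as follows. It assumes both sources have been reduced to dictionaries keyed by (county, candidate). The `compare_totals` helper and all numbers are hypothetical.

```python
def compare_totals(official, cvr):
    """Return (key, official_votes, cvr_votes) for every key where the two disagree.

    A key missing from one source is reported with None on that side.
    """
    keys = sorted(set(official) | set(cvr))
    return [
        (key, official.get(key), cvr.get(key))
        for key in keys
        if official.get(key) != cvr.get(key)
    ]


# Hypothetical totals keyed by (county, candidate).
official = {("DOUGLAS", "CANDIDATE A"): 100, ("DOUGLAS", "CANDIDATE B"): 50}
cvr = {
    ("DOUGLAS", "CANDIDATE A"): 100,
    ("DOUGLAS", "CANDIDATE B"): 49,
    ("DOUGLAS", "CANDIDATE C"): 3,
}

for key, off, got in compare_totals(official, cvr):
    print(key, "official:", off, "cvr:", got)
```

Because the official side is entered from the state page rather than derived from the CVR, an empty result here is evidence about the CVR instead of a restatement of it.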
These are the districts that are contained in our non-green counties.
    office        district
 1  US PRESIDENT  FEDERAL
 2  US HOUSE      002
 3  US HOUSE      004
 4  STATE SENATE  019
 5  STATE SENATE  015
 6  STATE HOUSE   038
 7  STATE HOUSE   039
 8  STATE HOUSE   033
 9  STATE HOUSE   032
10  STATE HOUSE   036
11  STATE HOUSE   024
12  STATE HOUSE   025
13  STATE HOUSE   026
14  STATE HOUSE   027
15  STATE HOUSE   030
16  STATE HOUSE   031
17  STATE HOUSE   040
Here is the current distribution of Nevada counties:
1  0 difference       3
2  any < 1% mismatch  4
3  any < 5% mismatch  6
4  red                2
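One way to arrive at categories like these is to bucket each county by its largest relative mismatch across candidates. The thread does not show how the project scripts define the buckets, so the thresholds and the `classify_county` helper below are assumptions for illustration only.

```python
def classify_county(pairs):
    """Bucket a county by its worst percent mismatch.

    pairs: list of (official_votes, cvr_votes), one per candidate.
    Thresholds are assumed, not taken from the project's scripts.
    """
    worst = 0.0
    for official, cvr in pairs:
        if official == cvr:
            continue
        if official == 0:
            # A nonzero CVR count against an official zero has no finite percent.
            return "red"
        worst = max(worst, abs(cvr - official) / official * 100)
    if worst == 0:
        return "0 difference"
    if worst < 1:
        return "any < 1% mismatch"
    if worst < 5:
        return "any < 5% mismatch"
    return "red"


print(classify_county([(1000, 995), (500, 500)]))
```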
I'm afraid I only have it for US PRESIDENT. Our county-level numbers would have come from the state shortly after the election.
ok, let's expand that, with a focus on the 1%/5% counties. Thanks @taransamarth for taking this on.
@kuriwaki, here are the results for all USH races + the State Senate + House races you flagged: nv_res_complete.csv
Thanks @taransamarth and everyone -- this looks good and Washoe + White Pine will be moving to <1% because of this. Taran, the file looked good but it had a duplicate entry for Sena Loyd at the end. I edited one of her counties to Carson City, and put it in the Dropbox returns/raw/nv_res_complete.csv
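A duplicate row like the one mentioned here can be caught before merging by counting repeated keys. This is a minimal sketch, assuming each row of the results file reduces to (county, office, candidate, votes). The layout, names, and values are hypothetical and not read from nv_res_complete.csv.

```python
from collections import Counter


def duplicate_keys(rows):
    """Return sorted (county, office, candidate) keys that appear more than once."""
    counts = Counter(
        (county, office, candidate) for county, office, candidate, _ in rows
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows: (county, office, candidate, votes).
rows = [
    ("STOREY", "STATE HOUSE", "CANDIDATE A", 10),
    ("STOREY", "STATE HOUSE", "CANDIDATE A", 25),
    ("CARSON CITY", "STATE HOUSE", "CANDIDATE B", 40),
]
print(duplicate_keys(rows))
```

A repeated key with different vote counts, as in the example, often means one of the rows carries the wrong county label rather than being a true duplicate.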
Issues
Carson City: Libertarian Presidential Candidate Jo Jorgensen’s name and vote count are missing from MEDSL data. Harvard and precinct data vote totals confirmed by the state website.
Douglas: Vote counts in the precinct dataset do not match the others. Harvard and MEDSL vote totals were confirmed by the state website.
Elko: Vote counts in the precinct dataset do not match the others. The state website reports vote totals that differ from all three of the Harvard, MEDSL, and precinct datasets.
Eureka: Vote counts for President in the precinct dataset do not match the others. Harvard and MEDSL vote totals were confirmed by the state website.
Lincoln: Republican State House Candidate Gregory T Hafen II missing from the precinct data. Harvard and MEDSL data vote totals confirmed by the state website.
Nye: Republican State House Candidate Gregory T Hafen II missing from the precinct data. Harvard and MEDSL data vote totals confirmed by the state website.
Nye: Vote counts for President, House, and State Senate in the precinct dataset do not match the others. Harvard and MEDSL data vote totals confirmed by the state website.
Storey: Vote counts for all offices in the precinct dataset do not match the others. Harvard and MEDSL data vote totals confirmed by the state website.
Washoe: Vote counts for all offices in the precinct dataset do not match the others. Harvard and MEDSL data vote totals confirmed by the state website.
White Pine: Vote counts for President in the precinct dataset do not match the others. Harvard and MEDSL data vote totals confirmed by the state website.
Takeaway
Most of the discrepancies are a result of the precinct data being slightly off from the MEDSL and Harvard datasets. In most cases, the MEDSL and Harvard vote totals mirror the official county totals, and therefore can be presumed correct. Additionally, undervotes are not reported in the precinct dataset, which accounts for some discrepancies.