kuriwaki / cvr_harvard-mit_scripts

[NV] -1s for small precincts #39

Closed kmmurray108 closed 2 months ago

kmmurray108 commented 4 months ago

Issues

Carson City: Libertarian presidential candidate Jo Jorgensen’s name and vote count are missing from the MEDSL data. The Harvard and precinct vote totals were confirmed by the state website.

Douglas: Vote counts in the precinct dataset do not match the others. The Harvard and MEDSL vote totals were confirmed by the state website.

Elko: Vote counts in the precinct dataset do not match the others. The state website reports vote totals that differ from all three of the Harvard, MEDSL, and precinct datasets.

Eureka: Vote counts for President in the precinct dataset do not match the others. The Harvard and MEDSL vote totals were confirmed by the state website.

Lincoln: Republican State House candidate Gregory T Hafen II is missing from the precinct data. The Harvard and MEDSL vote totals were confirmed by the state website.

Nye: Republican State House candidate Gregory T Hafen II is missing from the precinct data. Vote counts for President, House, and State Senate in the precinct dataset also do not match the others. The Harvard and MEDSL vote totals were confirmed by the state website.

Storey: Vote counts for all offices in the precinct dataset do not match the others. The Harvard and MEDSL vote totals were confirmed by the state website.

Washoe: Vote counts for all offices in the precinct dataset do not match the others. The Harvard and MEDSL vote totals were confirmed by the state website.

White Pine: Vote counts for President in the precinct dataset do not match the others. The Harvard and MEDSL vote totals were confirmed by the state website.

Takeaway

Most of the discrepancies are a result of the precinct data being slightly off from the MEDSL and Harvard datasets. In most cases, the MEDSL and Harvard vote totals mirror the official county totals, and therefore can be presumed correct. Additionally, undervotes are not reported in the precinct dataset, which accounts for some discrepancies.

kuriwaki commented 4 months ago

Great, so it seems like Elko, NV is the only county where our CVR data may be incomplete. In other cases, it's the precinct returns that are "incomplete" (or at least different from the state's website). I wonder why it's so systematic?

mreece13 commented 4 months ago

Resolved Carson City, NV. Otherwise looks good from the CVR perspective

zdj-garai commented 3 months ago

I've taken a look at all of these counties, and the problems are straightforward and usually consistent — the precinct data contains some precincts where candidates earn '-1' votes. Precincts numbered 88 frequently display this problem. I'm working on updating the data, but while it was quick to solve the problems with US PRESIDENT data, it's taking more time than expected for other contests and minor party candidates. I'll include a list here of the precincts that cause the discrepancies in the major-party presidential data.

Replacing the '-1' vote counts with the correct vote counts for Joe Biden and Donald Trump resolves the discrepancies between the Harvard and Baltz et al. datasets, leading to exact matches.

I also note that NV is one of the states that masks vote totals for jurisdictions with few votes.

DOUGLAS: Precincts 30, 88

ELKO: Precinct 17; unclear where else

EUREKA: Precinct 88

NYE: Precinct 88, Mercury 08

STOREY: Precincts 11, 12

WASHOE: "GER-WADS 7589"
"INCLINE VILLAGE 8111"
"INCLINE VILLAGE 8125"
"RENO-VERDI 1051"
"RENO-VERDI 1058"
"RENO-VERDI 2043"
"RENO-VERDI 2073"
"RENO-VERDI 3002"
"RENO-VERDI 3003"
"RENO-VERDI 3013 (MP)"
"RENO-VERDI 3025"
"RENO-VERDI 3027"
"RENO-VERDI 3039"
"RENO-VERDI 4015 (MP)"
"RENO-VERDI 4040"
"RENO-VERDI 5054"
"RENO-VERDI 7100"
"RENO-VERDI 7302"
"RENO-VERDI 7321"
"RENO-VERDI 7528"
"RENO-VERDI 7539 (MP)"
"RENO-VERDI 7549 (MP)"
"RENO-VERDI 7556"
"RENO-VERDI 7558"
"RENO-VERDI 7567"
"RENO-VERDI 8115"
"RENO-VERDI 8127"
"RENO-VERDI 8128"
"RENO-VERDI 8212"
"RENO-VERDI 8218"
"RENO-VERDI 8226"
"RENO-VERDI 8265"
"SPARKS 6503"
"SPARKS 7315"
"SPARKS 7317"
"SPARKS 7400 (MP)"
"SPARKS 7403"
"SPARKS 7406"
"SPARKS 7413"
"SPARKS 7428 (MP)"
"SPARKS 7509"
"SPARKS 7518 (MP)"
"SPARKS 7583"
"SPARKS 7584"
"SPARKS 7585"
"SPARKS 7594 (MP)"

WHITE PINE: Precinct 88
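A list like the one above could be produced mechanically. The following is a minimal sketch, assuming the precinct file can be read as rows of (county, precinct, candidate, votes) with -1 standing in for a masked count; the row layout, the sample values, and the name `masked_precincts` are illustrative assumptions, not the actual file schema.

```python
from collections import defaultdict

MASK = -1  # placeholder the precinct data uses for a masked count (per this thread)

# Hypothetical rows: (county, precinct, candidate, votes). Values are made up.
rows = [
    ("DOUGLAS", "30", "CANDIDATE A", -1),
    ("DOUGLAS", "30", "CANDIDATE B", 12),
    ("DOUGLAS", "01", "CANDIDATE A", 250),
    ("EUREKA", "88", "CANDIDATE B", -1),
]


def masked_precincts(rows):
    """Return {county: sorted precincts that contain at least one masked count}."""
    found = defaultdict(set)
    for county, precinct, _candidate, votes in rows:
        if votes == MASK:
            found[county].add(precinct)
    return {county: sorted(precincts) for county, precincts in found.items()}


print(masked_precincts(rows))
```

Running this on the full precinct file would also help with the "unclear where else" cases, since it does not depend on knowing which contest to inspect.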

kuriwaki commented 3 months ago

Thanks Zachary. So I think we'll solve this if we update the totals in CVR_parquet/returns/by-county with numbers that are not contaminated by the "-1s". @sbaltzmit do you have a Nevada file that has county-candidate-level totals like this? Same format as https://github.com/kuriwaki/cvr_harvard-mit_scripts/issues/160#issuecomment-2150279052 would be great.

sbaltzmit commented 3 months ago

What is the format that you're looking for -- the total number of votes that each candidate received in each county in Nevada, with the * in the raw data replaced by 0 instead of by -1?
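To make the difference between the two treatments concrete, here is a small sketch of county-candidate aggregation with the mask filled by either -1 or 0. The row layout, the sample numbers, and the function name `county_totals` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical precinct rows: (county, candidate, votes as text); "*" is a masked count.
raw = [
    ("ELKO", "CANDIDATE A", "120"),
    ("ELKO", "CANDIDATE A", "*"),
    ("ELKO", "CANDIDATE A", "*"),
    ("ELKO", "CANDIDATE B", "400"),
]


def county_totals(rows, mask_value):
    """Sum votes per (county, candidate), substituting mask_value for '*'."""
    totals = defaultdict(int)
    for county, candidate, votes in rows:
        totals[(county, candidate)] += mask_value if votes == "*" else int(votes)
    return dict(totals)


print(county_totals(raw, mask_value=-1))  # each masked precinct subtracts a vote
print(county_totals(raw, mask_value=0))   # masked precincts contribute nothing
```

Filling with 0 gives a lower bound on the true total, because the hidden counts cannot be negative. Filling with -1 pushes the sum below that bound by one vote per masked precinct, which fits the small, systematic gaps reported earlier in the thread.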

kuriwaki commented 3 months ago

@sbaltzmit "total number of votes that each candidate received in each county in Nevada" is exactly right. E.g., data with one row for Joe Biden in Elko County, NV. Hopefully, if the CVR is accurate, that row would show 4566 votes?

zdj-garai commented 3 months ago

I've been manually replacing the -1s where I can in a spreadsheet of NV precinct returns that I'll be sending to you. All of our NV counties should be fixed after that except for Elko. I'm working on Washoe this afternoon; afterwards, I'll be ready to send it to you. (Or I can send the everything-but-Washoe-and-Elko spreadsheet sooner, if that's better.)

kuriwaki commented 3 months ago

Ok, I didn't know that was in the works, great. Yes, Zachary if you already have that almost done, please post it here next week once you think you have all the Nevada counties. Then I can merge it in like Mason did for New Mexico.

zdj-garai commented 3 months ago

Alright, the spreadsheet is here. Hopefully that resolves most counties' issues (besides Elko's).

zdj-garai commented 3 months ago

nv_updated_county-candidate_totals.csv

nv_updated_county-candidate_totals.csv should be attached (let me know if it isn't), and contains the corrected precinct data.

nv_precinct_data_UPDATED_0627_2.csv should also be attached (I'm wary of this one not loading), which contains the pre-aggregation precinct-level returns for these NV counties, with the '-1' masked entries corrected. nv_precinct_data_UPDATED_0627_2.csv

kuriwaki commented 3 months ago

Great. I compared your first file's numbers in Douglas with the CVR counts, and they seem to match exactly (unlike the prior Baltz et al. data).

@zdj-garai can you just explain here where/how you got those numbers in nv_updated_county-candidate_totals.csv? From each official county website?

zdj-garai commented 3 months ago

I used the CVR data we have to go county by county, adding up the number of votes for each candidate in each precinct. The counties did their due diligence and properly masked the data, but the CVR data was not masked, making this possible.
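Roughly, that procedure amounts to counting CVR rows per precinct and candidate and using those counts wherever the precinct returns show -1. The sketch below assumes one CVR row per ballot choice and a dictionary of precinct returns; both layouts, the sample values, and the name `fill_masked` are hypothetical.

```python
from collections import Counter

# Hypothetical CVR rows: one row per ballot choice, (county, precinct, candidate).
cvr = [
    ("STOREY", "11", "CANDIDATE A"),
    ("STOREY", "11", "CANDIDATE B"),
    ("STOREY", "11", "CANDIDATE B"),
    ("STOREY", "12", "CANDIDATE A"),
]

# Hypothetical precinct returns keyed the same way; -1 marks a masked count.
returns = {
    ("STOREY", "11", "CANDIDATE A"): -1,
    ("STOREY", "11", "CANDIDATE B"): -1,
    ("STOREY", "12", "CANDIDATE A"): 1,
}


def fill_masked(returns, cvr):
    """Replace -1 entries with the number of matching CVR rows; keep other entries."""
    cvr_counts = Counter(cvr)
    return {
        key: (cvr_counts[key] if votes == -1 else votes)
        for key, votes in returns.items()
    }


print(fill_masked(returns, cvr))
```

Any value filled this way inherits whatever errors or omissions the CVR itself has.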

kuriwaki commented 3 months ago

I see, so your uploaded counts are constructed from the CVR itself. I think that means that your counts are not a falsifiable test of whether the CVR counts are complete and correct? Instead, they take it for granted that the CVR data is complete and correct.

I wonder if there is evidence you found that indicates "the CVR data was not masked"? And I think we do need to check the county official websites if they reported an unmasked county-level count?

On Thu, Jun 27, 2024 at 2:43 PM zdj-garai @.***> wrote:

I used the CVR data we have to go county by county, adding up the number of votes for each candidate in each precinct. The counties did their due diligence and properly masked the data, but the CVR data was not masked, making this possible.

zdj-garai commented 3 months ago

Eek! That was actually the concern I had—it's a little circular, but the data all matches, and it's unlikely that the raccoon army constructed thousands of rows of data that aggregate up to the county level in the proper format; of course, it's still not as falsifiable as I'd like. Unfortunately we're slammed with election-related work at MEDSL and I've been instructed to dedicate my focus in the upcoming weeks to writing and polishing our reports so we can publish them promptly, so I won't be able to work more on CVR validation/cleaning for the next few weeks.

kuriwaki commented 3 months ago

No worries. @sbaltzmit do you have a file described in https://github.com/kuriwaki/cvr_harvard-mit_scripts/issues/39#issuecomment-2168430500 already floating around? Basically a county-candidate file that is taken from official reports and not contaminated by redaction.

If not, someone could go spend an hour manually entering the county x candidate counts in each of the following districts from https://www.nvsos.gov/silverstate2020gen/USPresidential/ into our standard format. I just checked Douglas county on that page and it looks like their counts exactly match the CVR counts (nice). In other words, it does appear that the CVR is unredacted.
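The Douglas check described here generalizes to a per-county comparison of official counts against CVR counts. Below is a minimal sketch; the bucket labels are borrowed from the county distribution in this comment, but the rule used (largest relative gap for any candidate, measured against the official count) is an assumption about how those buckets are defined, and all numbers are made up.

```python
def classify_county(official, cvr):
    """Bucket one county by its worst candidate-level mismatch.

    official, cvr: {candidate: votes}. A candidate with no official votes
    but some CVR votes is treated as a 100% mismatch (an assumption).
    """
    worst = 0.0
    for candidate in set(official) | set(cvr):
        off = official.get(candidate, 0)
        got = cvr.get(candidate, 0)
        if off == got:
            continue
        gap = abs(got - off) / off if off else 1.0
        worst = max(worst, gap)
    if worst == 0:
        return "0 difference"
    if worst < 0.01:
        return "any < 1% mismatch"
    if worst < 0.05:
        return "any < 5% mismatch"
    return "red"


official = {"CANDIDATE A": 1000, "CANDIDATE B": 2000}
print(classify_county(official, {"CANDIDATE A": 1000, "CANDIDATE B": 2000}))
print(classify_county(official, {"CANDIDATE A": 995, "CANDIDATE B": 2000}))
```

Because the official counts come from a source other than the CVR, a "0 difference" result here is an independent check rather than a circular one.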

These are the districts that are contained in our non-green counties.

   office       district   
 1 US PRESIDENT FEDERAL 
 2 US HOUSE     002     
 3 US HOUSE     004     
 4 STATE SENATE 019     
 5 STATE SENATE 015     
 6 STATE HOUSE  038     
 7 STATE HOUSE  039     
 8 STATE HOUSE  033     
 9 STATE HOUSE  032     
10 STATE HOUSE  036     
11 STATE HOUSE  024     
12 STATE HOUSE  025     
13 STATE HOUSE  026     
14 STATE HOUSE  027     
15 STATE HOUSE  030     
16 STATE HOUSE  031     
17 STATE HOUSE  040  

Here is the current distribution of Nevada counties:

1 0 difference          3
2 any < 1% mismatch     4
3 any < 5% mismatch     6
4 red                   2

sbaltzmit commented 3 months ago

I'm afraid I only have it for US PRESIDENT. Our county-level numbers would have come from the state shortly after the election.

nv_res.csv

kuriwaki commented 3 months ago

ok, let's expand that, with a focus on the 1%/5% counties. Thanks @taransamarth for taking this on.

taransamarth commented 2 months ago

@kuriwaki, here are the results for all US House races plus the State Senate and State House races you flagged: nv_res_complete.csv

kuriwaki commented 2 months ago

Thanks @taransamarth and everyone -- this looks good and Washoe + White Pine will be moving to <1% because of this. Taran, the file looked good but it had a duplicate entry for Sena Loyd at the end. I edited one of her counties to Carson City, and put it in the Dropbox returns/raw/nv_res_complete.csv
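A duplicate like that one could be caught before merging with a quick key check. This is only a sketch: the column names, the sample rows, and the function `duplicate_keys` are assumptions for illustration, not the actual layout of nv_res_complete.csv.

```python
from collections import Counter

# Hypothetical county-candidate rows; column names and values are illustrative.
rows = [
    {"county": "WASHOE", "office": "STATE HOUSE", "district": "040",
     "candidate": "CANDIDATE A", "votes": 100},
    {"county": "WASHOE", "office": "STATE HOUSE", "district": "040",
     "candidate": "CANDIDATE A", "votes": 250},
    {"county": "WASHOE", "office": "STATE HOUSE", "district": "040",
     "candidate": "CANDIDATE B", "votes": 90},
]

KEY = ("county", "office", "district", "candidate")


def duplicate_keys(rows):
    """Return the sorted keys that appear on more than one row."""
    counts = Counter(tuple(row[k] for k in KEY) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


print(duplicate_keys(rows))
```

A repeated key with different vote counts, as in the sample, often means one of the rows belongs to a different county, which is the kind of fix described above.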