kuriwaki / cvr_harvard-mit_scripts


[CA] San Diego County #305

Closed kuriwaki closed 2 months ago

kuriwaki commented 2 months ago

I see this is listed as "delim" and not found yet. Maybe @aconevska was getting to this, but I also took a quick look.

Harvard's candidate-level counts match exactly to the digit. MEDSL's version overcounts by about 36 votes, which is about 0.002% in this big state.

It does not look good to show that we are over-counting, so I would recommend using Harvard's version unless this gets resolved in MEDSL's next run.

kuriwaki commented 2 months ago

@aconevska has more tidbits to share here because she looked at this county closely for her other paper.

mreece13 commented 2 months ago

Hmmm, I've identified the issue on my end. It looks like San Diego inexplicably repeats the header at random points in the middle of the file, and this is adding 12 votes to everything. Let me write a little Unix script to remove those rows, which I can hopefully reuse for some other counties where I've seen this issue as well (I didn't think it was causing a problem, but it seems that it is).
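For illustration, the header-removal step described above could look roughly like the following. This is only a sketch in Python rather than the Unix script mentioned, and it assumes that every repeated header line is textually identical to the first line of the file; the function name `drop_repeated_headers` and the sample rows are hypothetical.

```python
def drop_repeated_headers(lines):
    """Yield lines, skipping any later line identical to the first (header) line.

    Assumption: repeated headers match the first line exactly, apart from
    the trailing line ending.
    """
    header = None
    for line in lines:
        stripped = line.rstrip("\r\n")
        if header is None:
            # The first line is taken to be the real header and is kept.
            header = stripped
            yield line
        elif stripped == header:
            # A later copy of the header: drop it.
            continue
        else:
            yield line


# Made-up sample in the 1/0 layout, with the header repeated mid-file.
sample = ["id,SMITH,JONES\n", "1,1,0\n", "id,SMITH,JONES\n", "2,0,1\n"]
cleaned = list(drop_repeated_headers(sample))
```

Because the function works on any iterable of lines, it can be applied to an open file handle without loading the whole file into memory.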

aconevska commented 2 months ago

@mreece13 Ahh, ok, that sounds like it makes sense. I'm going through the raw file ("cvr.csv") manually to see if I can identify that on my end as well.

If for some reason there are still excess votes after you remove those rows, let me know and I can do more checks.

mreece13 commented 2 months ago

I think it is fine in your data; the raw totals add up to your numbers? It is probably related to how we identify 'votes' in these files that use the 1/0 standard. I have only been identifying the converse, i.e., when the cell is 0 (or something related to privacy redaction), and then treating everything else as a vote. It's been a bit tricky to write a parser that works both for files using the 1/0 standard and for ones where the cell is just the candidate's name.

Also, the issue begins at row 294481.
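As a rough illustration of why the converse rule described above can overcount, here is a minimal Python sketch. It is not the actual MEDSL parser: the redaction marker `"*"`, the treatment of empty cells as non-votes, the function names, and the sample rows are all assumptions made for the example. Under the converse rule, a repeated header row puts a candidate name in a 1/0 column, and that cell is counted as a vote.

```python
# Hypothetical set of cell values that mean "no vote"; "*" stands in for
# whatever privacy-redaction marker a real file might use.
NON_VOTE = {"", "0", "*"}


def is_vote_converse(cell):
    """Converse rule: anything that is not an explicit non-vote counts as a vote."""
    return cell.strip() not in NON_VOTE


def is_vote_strict(cell):
    """Stricter rule for 1/0 files only: a vote is exactly the string '1'."""
    return cell.strip() == "1"


def count_votes(rows, column, is_vote):
    """Count the rows whose cell in `column` is classified as a vote."""
    return sum(1 for row in rows if is_vote(row[column]))


# Made-up 1/0 data in which the third row is a repeated header.
rows = [
    {"SMITH": "1"},
    {"SMITH": "0"},
    {"SMITH": "SMITH"},  # repeated header row
    {"SMITH": "1"},
]

converse_total = count_votes(rows, "SMITH", is_vote_converse)
strict_total = count_votes(rows, "SMITH", is_vote_strict)
```

Here the converse rule counts one more vote than the strict rule, one for each repeated header row. The strict rule does not work for files where the cell holds the candidate's name, which is why removing the repeated header rows before parsing is the more general fix.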

mreece13 commented 2 months ago

Okay, I've fixed this for future runs; it should be ready for the next build.