openelections / openelections-core

Core repo for election results data acquisition, transformation and output.
MIT License

Fix bad Washington files #212

Closed. ghing closed this issue 10 years ago

ghing commented 10 years ago

@EricLagerg clearly documented the files that can't be loaded, either because they seem to contain HTML error page output or because they are unwieldy Excel spreadsheets. Try to fix the URLs (likely in the datasource) so that real data can be downloaded, and explicitly document the nasty Excel files.

ghing commented 10 years ago

These CSV files all have the same problem:

20090818__wa__primary__pierce__county.csv
20090818__wa__primary__ferry__county.csv
20090818__wa__primary__wahkiakum__county.csv
20090818__wa__primary__whatcom__county.csv
20090818__wa__primary__pend_oreille__county.csv
20090818__wa__primary__kitsap__county.csv
20090818__wa__primary__kittitas__county.csv

Output stored in the file says:

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="%2fresults%2f20090818%2fexport%2f20090818_Kittitas.csv">here</a>.</h2>
</body></html>
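
A quick way to catch files like this before they reach a loader is to sniff the first bytes of the download. This is only a minimal sketch: `looks_like_html` is a hypothetical helper, not something in the repo, and the CSV header row is made up for illustration.

```python
def looks_like_html(payload: bytes, sniff_bytes: int = 512) -> bool:
    """Return True if the start of a downloaded payload looks like HTML, not CSV."""
    head = payload[:sniff_bytes].lstrip().lower()
    return head.startswith((b"<html", b"<!doctype html", b"<head", b"<body"))


# The error page stored in the bad files, versus a made-up CSV header row.
error_page = (
    b"<html><head><title>Object moved</title></head><body>\n"
    b'<h2>Object moved to <a href="%2fresults%2f20090818%2fexport%2f20090818_Kittitas.csv">here</a>.</h2>\n'
    b"</body></html>\n"
)
sample_csv = b"Race,Candidate,Votes\n"  # hypothetical columns, for illustration only

print(looks_like_html(error_page))  # True
print(looks_like_html(sample_csv))  # False
```

A check like this only flags the symptom. The files still need a working URL or an explicit exclusion.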

When I try to grab the URL with wget, it seems like there's a redirect loop:

wget http://vote.wa.gov/results/20090818/export/20090818_Pierce.csv
--2014-10-08 18:56:17--  http://vote.wa.gov/results/20090818/export/20090818_Pierce.csv
Resolving vote.wa.gov (vote.wa.gov)... 64.146.248.150
Connecting to vote.wa.gov (vote.wa.gov)|64.146.248.150|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /results/20090818/export/20090818_Pierce.csv [following]
--2014-10-08 18:56:18--  http://vote.wa.gov/results/20090818/export/20090818_Pierce.csv
Reusing existing connection to vote.wa.gov:80.
HTTP request sent, awaiting response... 302 Found
Location: /results/20090818/export/20090818_Pierce.csv [following]
--2014-10-08 18:56:18--  http://vote.wa.gov/results/20090818/export/20090818_Pierce.csv
Reusing existing connection to vote.wa.gov:80.
HTTP request sent, awaiting response... 302 Found
...
20 redirections exceeded.
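
The loop is visible in the headers: each `Location` resolves back to the URL that was just requested. A rough sketch of that check follows, using only the standard library. `is_self_redirect` is a hypothetical name, and it compares against the headers shown above without making any requests.

```python
from urllib.parse import urljoin


def is_self_redirect(request_url: str, location: str) -> bool:
    """Return True if a redirect's Location header resolves back to the request URL."""
    return urljoin(request_url, location) == request_url


url = "http://vote.wa.gov/results/20090818/export/20090818_Pierce.csv"
print(is_self_redirect(url, "/results/20090818/export/20090818_Pierce.csv"))  # True
print(is_self_redirect(url, "/results/20090818/Export.html"))  # False
```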

I'm going to check that there aren't different URLs accessible from the website. Otherwise, it seems like the path forward on these is to contact the Washington Secretary of State and let them know their web server is misconfigured.

ghing commented 10 years ago

I took a look at http://vote.wa.gov/results/20090818/Export.html and found that there aren't entries for any of the counties corresponding to the files listed above.

ghing commented 10 years ago

Emailed Nick about the missing counties.

ericlagergren commented 10 years ago

Perhaps I should talk to my boss about sponsoring a bill that reforms election data in Washington state?

Anyway, should we try to find a way to transform the unwieldy Excel data before we load it into the db? It'd definitely have to be on a per-file basis, as some of the county auditors took a little too much creative license with the file formatting.

ghing commented 10 years ago

I got a response from Nick about those missing CSVs from 2009:

Those counties did not hold a Primary in 2009. (Kitsap County did not participate in our elections database at that time, so there may have been local contests there that were not entered into the database. The same is true for King and Yakima Counties, but they do appear in the results because they had state-level contests that year.)

The only state-level contests on the Primary ballot in 2009 were 3 unexpired legislative terms and one unexpired Court of Appeals term. Local contests do not have a Primary unless at least two candidates file.

I'll update the datasource to not generate mappings for these counties for this election.
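
Roughly, the idea is an exclusion table keyed by election date that gets consulted before any mappings are built. The names below (`COUNTIES_WITHOUT_RESULTS`, `counties_to_map`) are hypothetical and only illustrate the approach. The actual datasource change may look different.

```python
# Counties with no results for a given election, keyed by election date (YYYYMMDD).
# The 2009 primary entries come from the file list earlier in this issue.
COUNTIES_WITHOUT_RESULTS = {
    "20090818": {
        "Pierce", "Ferry", "Wahkiakum", "Whatcom",
        "Pend Oreille", "Kitsap", "Kittitas",
    },
}


def counties_to_map(election_date, counties):
    """Return the counties that should get a mapping for this election."""
    skipped = COUNTIES_WITHOUT_RESULTS.get(election_date, set())
    return [county for county in counties if county not in skipped]


print(counties_to_map("20090818", ["King", "Kitsap", "Pierce", "Yakima"]))
# ['King', 'Yakima']
```

Elections with no entry in the table fall through untouched, so every county is still mapped by default.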

ghing commented 10 years ago

Closing. The datasource has been updated to not create mappings for counties that don't have results for a particular election. I've created a separate issue for the results that need one-off loader classes, as that's going to be a long, slow row to hoe. The issue is https://github.com/openelections/core/issues/225.