Convert Arkansas data from PDF to CSV

openelections / openelections-core

Core repo for election results data acquisition, transformation and output.

MIT License


Convert Arkansas data from PDF to CSV #138

Closed: ghing closed this issue 10 years ago

ghing commented 10 years ago

When done, add CSVs to mappings and adjust/implement Datasource.filename_url_pairs/Datasource.unprocessed_filename_url_pairs accordingly.

Files are:

us/ar/cache/20020521arprimary.pdf
us/ar/cache/20020611arprimary_runoff.pdf
us/ar/cache/20021105argeneral.pdf

ghing commented 10 years ago

@dwillis, @zstumgoren, unlike West Virginia, the Arkansas PDFs aren't nicely formatted into a single table. Here's an example for the 2002 primary.

There would be a lot of copying and pasting because there's at least one table per contest/per county.

pdftotext -layout gives us something we can work with, but the parsing code would be non-trivial. Should this be done in a preprocessing script with the output in a separate git repo, or just done as part of the loader?
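To give a rough idea of what that parsing might involve, here is a minimal sketch in Python. It assumes the layout-preserving text has one contest heading, followed by county headings and rows where a candidate name is separated from a vote count by two or more spaces. The sample text, the names and the column layout are all invented for illustration; the real Arkansas PDFs may well be laid out differently.

```python
import csv
import io
import re

# Hypothetical excerpt in the style of layout-preserving text output.
# The contest, counties, candidates and vote counts are invented.
SAMPLE = """\
U.S. Senate - Democratic
Example County
Candidate A          1,234
Candidate B            987

Other County
Candidate A            456
Candidate B            321
"""

# A result row: a name, a run of two or more spaces, then a vote count.
ROW = re.compile(r"^(?P<name>\S.*?)\s{2,}(?P<votes>[\d,]+)$")


def parse_layout_text(text):
    """Yield (contest, county, candidate, votes) tuples.

    Assumption: the first non-row line names the contest and every
    later non-row line names a county.
    """
    contest = None
    county = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        match = ROW.match(line)
        if match:
            votes = int(match.group("votes").replace(",", ""))
            yield contest, county, match.group("name"), votes
        elif contest is None:
            contest = line
        else:
            county = line


def to_csv(rows):
    """Serialize parsed rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["contest", "county", "candidate", "votes"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_layout_text(SAMPLE)))
```

Splitting on runs of two or more spaces keeps multi-word candidate names intact, but it would break on rows with several numeric columns or wrapped names, which is part of why the real parser is non-trivial.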

dwillis commented 10 years ago

My preference would be for a preprocessing script with the output in a separate git repo, because it definitely isn't trivial. Putting it in the loader seems to me to expand the loader's duties pretty far.

ghing commented 10 years ago

@dwillis That's fine by me. However, I don't think it's terribly outside of the loader's duties because some vintages of Maryland CSVs had similarly annoying layouts that had to be parsed. I'll do this as a separate script. Moving the code into the loader shouldn't be too hard if we decide to go that route later.
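As a sketch only, the first step of such a separate script could be building the pdftotext invocations for the three files listed above. The output directory name and the helper functions below are hypothetical; only the PDF paths and the -layout flag come from this thread.

```python
import os

# PDF paths as listed in the issue description.
PDFS = [
    "us/ar/cache/20020521arprimary.pdf",
    "us/ar/cache/20020611arprimary_runoff.pdf",
    "us/ar/cache/20021105argeneral.pdf",
]


def text_path(pdf_path, out_dir="text"):
    """Return the path of the text file for a PDF (out_dir is an assumed name)."""
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(out_dir, base + ".txt")


def pdftotext_command(pdf_path, out_dir="text"):
    """Build the argument list for a layout-preserving pdftotext run."""
    return ["pdftotext", "-layout", pdf_path, text_path(pdf_path, out_dir)]


if __name__ == "__main__":
    for pdf in PDFS:
        # Print rather than execute, so the sketch runs without pdftotext installed.
        print(" ".join(pdftotext_command(pdf)))
```

Each argument list could be handed to subprocess.run(cmd, check=True) on a machine where pdftotext is installed. Keeping the command building separate from the parsing would also make it easier to move the parsing into the loader later.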

zstumgoren commented 10 years ago

+1 to preprocessing script approach. Can we put these scripts inside the new repo (e.g. , rather than in core? This way the code lives side-by-side with the data it generates.

ghing commented 10 years ago

@zstumgoren, I was definitely planning on putting the script inside the data repo.

ghing commented 10 years ago

First pass of this is done as of https://github.com/openelections/openelections-data-ar/commit/6d54310fd5638723ce2f78f81922eda36dfa8733