Closed: ghing closed this issue 10 years ago
@dwillis, @zstumgoren, unlike West Virginia, the Arkansas PDFs aren't nicely formatted into a single table. Here's an example for the 2002 primary.
There would be a lot of copying and pasting because there's at least one table per contest/per county.
pdftotext -layout
gives us something we can work with, but the parsing code would be non-trivial. Should this be done in a preprocessing script with the output in a separate git repo, or just done as part of the loader?
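As a rough illustration of the parsing involved, here is a minimal sketch that pulls candidate/vote rows out of layout-preserved text by splitting on runs of two or more spaces. The sample text, candidate names, and function names (split_columns, parse_table) are made up for illustration; the real Arkansas output has at least one table per contest per county and would need more handling than this.

```python
import re

# Hypothetical sample of `pdftotext -layout` output. The contest title,
# names, and numbers are invented; the real files may be laid out differently.
SAMPLE = """\
U.S. Senate - Democratic Primary

Candidate                Votes
Jane Doe                 1,234
John Q. Public             987
"""


def split_columns(line):
    """Split a layout-preserved line on runs of two or more spaces."""
    return re.split(r"\s{2,}", line.strip())


def parse_table(text):
    """Return (candidate, votes) tuples for two-cell rows ending in a number."""
    rows = []
    for line in text.splitlines():
        cells = split_columns(line)
        # Skip titles, blank lines, and header rows: keep only rows whose
        # second cell is made up of digits and thousands separators.
        if len(cells) == 2 and re.fullmatch(r"[\d,]+", cells[1]):
            rows.append((cells[0], int(cells[1].replace(",", ""))))
    return rows


if __name__ == "__main__":
    for candidate, votes in parse_table(SAMPLE):
        print(candidate, votes)
```

Splitting on two or more spaces keeps multi-word names like "John Q. Public" intact, but it assumes columns are always separated by at least two spaces, which would need checking against the actual files.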
My preference would be for a preprocessing script with the output in a separate git repo, because it definitely isn't trivial. Putting it in the loader seems to me to expand the loader's duties pretty far.
@dwillis That's fine by me. However, I don't think it's terribly outside of the loader's duties because some vintages of Maryland CSVs had similarly annoying layouts that had to be parsed. I'll do this as a separate script. Moving the code into the loader shouldn't be too hard if we decide to go that route later.
+1 to preprocessing script approach. Can we put these scripts inside the new repo (e.g. , rather than in core? This way the code lives side-by-side with the data it generates.
@zstumgoren, I was definitely planning on putting the script inside the data repo.
First pass of this is done as of https://github.com/openelections/openelections-data-ar/commit/6d54310fd5638723ce2f78f81922eda36dfa8733
When done, add CSVs to mappings and adjust/implement Datasource.filename_url_pairs/Datasource.unprocessed_filename_url_pairs accordingly.
Files are: