openva / crump

A parser for the Virginia State Corporation Commission's business registration records.
https://vabusinesses.org/
MIT License

Modify parser to ingest new CSV format #114

Open waldoj opened 8 years ago

waldoj commented 8 years ago

As of August 1, the SCC is publishing CSV instead of their gnarly fixed-width format. As of October 31, they will no longer publish the old format. So that provides a 3-month window in which to upgrade Crump to use CSV. That ought to mostly involve simplifying it a great deal.

waldoj commented 8 years ago

Next up: modify the file iterator so that, instead of reading the files naively, it ingests them as CSV. Then modify:

line[name] = current_line[1][start:end].strip()

to instead strip and rename the field, but otherwise leave it alone.
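
A rough sketch of what that could look like, using `csv.DictReader` from the standard library. This is not Crump's actual code: the column headers, the `FIELD_NAMES` mapping, the function names, and the sample row are all made up for illustration.

```python
import csv
import io

# Hypothetical mapping from CSV column headers to output field names.
# The real SCC headers may differ; these are placeholders.
FIELD_NAMES = {"EntityID": "id", "Name": "name"}


def clean_row(row, field_names=FIELD_NAMES):
    """Strip whitespace from each value and rename known columns."""
    cleaned = {}
    for column, value in row.items():
        if column is None:
            # DictReader stores surplus fields under the key None; skip them.
            continue
        key = field_names.get(column.strip(), column.strip())
        # Short rows yield None for missing fields; treat those as empty.
        cleaned[key] = value.strip() if value is not None else ""
    return cleaned


def read_records(handle):
    """Yield one cleaned dict per CSV row instead of slicing fixed-width lines."""
    for row in csv.DictReader(handle):
        yield clean_row(row)


# Made-up sample data standing in for an SCC file.
sample = io.StringIO('EntityID,Name\n"X0000001 ","  EXAMPLE CORP "\n')
records = list(read_records(sample))
```

With CSV there are no start and end offsets to track, so the per-field work reduces to a strip and a dictionary lookup.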

waldoj commented 8 years ago

Now the problem is that CSV and JSON are both being streamed into a single file.
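
One possible way to keep the two apart is to give each format its own file handle. This is only a sketch under assumptions: `write_outputs`, the file names, and the sample record are hypothetical, and the cause of the mixing in Crump may lie elsewhere.

```python
import csv
import json
import os
import tempfile


def write_outputs(records, fieldnames, csv_path, json_path):
    """Write records to a CSV file and a line-delimited JSON file.

    Each format gets its own handle, so the two streams never interleave.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file, \
            open(json_path, "w", encoding="utf-8") as json_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow(record)
            json_file.write(json.dumps(record) + "\n")


# Made-up record and temporary paths, for illustration only.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "example.csv")
json_path = os.path.join(out_dir, "example.json")
write_outputs([{"id": "X0000001", "name": "EXAMPLE CORP"}],
              ["id", "name"], csv_path, json_path)
```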