openva / crump

A parser for the Virginia State Corporation Commission's business registration records.
https://vabusinesses.org/
MIT License

Fix 9999-99-99 expiration dates #88

Closed (waldoj closed this issue 9 years ago)

waldoj commented 9 years ago

We're getting a bunch of 9999-99-99 dates, yielding this reasonable Elasticsearch error:

status":400,"error":"MapperParsingException[failed to parse [expiration_date]]; nested: MapperParsingException[failed to parse date field [9999-99-99], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalFieldValueException[Cannot parse \"9999-99-99\": Value 99 for monthOfYear must be in the range [1,12]]

I'm pretty sure it's a result of dfe720b617b3cb18cc91f812a1a7089036301e1b. Casting everything as strings is going poorly. Perhaps the solution is to cast floats as strings. Or, better, maybe we just need to do this:

line['coordinates'] = [str(coordinates[1]), str(coordinates[0])]

and skip this business of casting everything as a string.

waldoj commented 9 years ago

Nope, that didn't fix it. We're still getting thousands (38,070 in all) of entries like this:

T059659,"PEL ENGINEERS & CONSULTANTS, PLLC",ACTIVE,2015-01-06,9999-99-99,2015-01-06,NORTH CAROLINA,70,225 ELMCREST DR,,HOLLY SPRINGS,NC,27540,,,INCORP SERVICES INC,7288 HANOVER GREEN DR,,MECHANICSVILLE,VA,23111,2015-01-06,5,HANOVER COUNTY

The fifth field, 9999-99-99, is expiration_date.
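
For illustration, here is a minimal sketch of one way to keep such placeholder values out of the index: validate each date field and replace anything unparseable with None before sending the record to Elasticsearch. This is an assumption about a possible approach, not the change that was actually made in the repository. The helper name clean_date and the field positions are hypothetical, taken only from the sample row above (truncated here to its first six fields).

```python
import csv
import io
from datetime import datetime


def clean_date(value):
    """Return value unchanged if it is a valid YYYY-MM-DD date, else None.

    Hypothetical helper: placeholders such as 9999-99-99 fail to parse
    (month 99 is out of range) and are mapped to None.
    """
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except (TypeError, ValueError):
        return None
    return value


# First six fields of the sample record quoted in this issue.
row_text = (
    'T059659,"PEL ENGINEERS & CONSULTANTS, PLLC",ACTIVE,'
    "2015-01-06,9999-99-99,2015-01-06"
)
row = next(csv.reader(io.StringIO(row_text)))

# Assumed positions: index 4 is expiration_date; indexes 3 and 5 are
# other date columns in the same record.
DATE_COLUMNS = (3, 4, 5)
cleaned = [
    clean_date(value) if index in DATE_COLUMNS else value
    for index, value in enumerate(row)
]

print(cleaned)
```

With this approach the valid dates pass through untouched and only the unparseable expiration_date becomes None, so the rest of the record does not need to be cast to strings.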

waldoj commented 9 years ago

Yes, e48aaa2 fixed things.