seanpianka / Zipcodes

A simple library for querying U.S. zipcodes.
MIT License

Update zipcodes db #14

Closed: TJBANEY closed this 3 years ago

TJBANEY commented 4 years ago

This updates zips.json to a version retrieved in August 2020.

seanpianka commented 4 years ago

Hi @TJBANEY, and thanks for the PR. I'm hesitant to merge this update because users have experienced reliability issues in the past with data from [1] (see #3 and #4). I patched this by combining data from [1] and [2], specifically by merging the longitude/latitude data from [2] into [1].

The scripts to perform this update are located under /ci/data, and if your PR could include these modifications, I could approve and merge this.

I filed #7 as a way to track the work of automating these multi-step data reliability changes, but I haven't had time to figure the rest of it out.

[1] https://www.unitedstateszipcodes.org
[2] https://worldpostalcode.com/united-states/

wang-yinan commented 3 years ago

Commenting here as I'm interested in updating the dataset (as well as for my own use). How did you extract the geocodes for source [2]? @seanpianka

seanpianka commented 3 years ago

I invoke this file [0] directly and it loads the data from ci/data and combines them both into the final dataset that is used by the library.

[0] defines a dict that holds the "schema" of the JSON returned by library (query) calls. The dict's keys are the field names from the transformed dataset, and each value is a dict that contains the "public" field name in this library's API, along with an optional pre-processing function for the transformation.
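For readers following along, here is a minimal sketch of what such a schema dict could look like. It is not the actual contents of [0]: the raw field names are borrowed from the "Base Keys" list printed later in this thread, while the public names, the `transform` key, and the helpers `split_cities` and `apply_schema` are hypothetical and chosen only for illustration.

```python
def split_cities(value):
    """Hypothetical pre-processing step: comma-separated string -> list of names."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical schema: raw dataset field -> public API name + optional transform.
SCHEMA = {
    "zip": {"public": "zip_code"},
    "primary_city": {"public": "city"},
    "acceptable_cities": {"public": "acceptable_cities", "transform": split_cities},
    "latitude": {"public": "lat", "transform": float},
}


def apply_schema(row, schema=SCHEMA):
    """Rename the fields of one raw row and apply any pre-processing function."""
    out = {}
    for raw_name, spec in schema.items():
        value = row.get(raw_name)
        transform = spec.get("transform")
        # Only transform values that are actually present in the row.
        if transform is not None and value is not None:
            value = transform(value)
        out[spec["public"]] = value
    return out
```

Keeping the renaming and the pre-processing in one table means a new output field would only need a new dict entry, assuming the real script follows a similar shape.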

As mentioned above, the data comes from two different data sources. We need to download recent versions of both, place them in ci/data under the same final file names, and run the script. Once that's done, I can release a new version.

wang-yinan commented 3 years ago

@seanpianka got it. Was looking for instructions on getting the geocode csv from that second link (https://worldpostalcode.com/united-states/), as it seems to just link to a website with lookup capabilities but no links for a csv download and/or export.

seanpianka commented 3 years ago

I merged the original, more accurate geolocation dataset from 2019 into a download of the main zipcode dataset [1] from 3 Oct. 2021. The merge was done by running the dataset generation script:

$ python scripts/build_zipcode_dataset.py
("GPS Keys: ['ZipCode', 'City', 'State', 'Latitude', 'Longitude', "
 "'Classification', 'Population']")
("Base Keys: ['zip', 'type', 'decommissioned', 'primary_city', "
 "'acceptable_cities', 'unacceptable_cities', 'state', 'county', 'timezone', "
 "'area_codes', 'world_region', 'country', 'latitude', 'longitude', "
 "'irs_estimated_population']")
Updated GPS from GPS CSV in 0.026109933853149414 seconds.
Writing zipcode information for 42724 places
To zip for production, run:
$ bzip2 zips.json

This has been released as 1.2.0.
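For anyone wanting to reproduce the merge step, the key names printed in the log above suggest roughly the following approach. This is a sketch under assumptions, not the code of scripts/build_zipcode_dataset.py: the functions `load_gps_index` and `merge_gps` are hypothetical, the join on `ZipCode` / `zip` is inferred from the key lists, and the sample rows are made-up placeholder values.

```python
import csv
import io


def load_gps_index(gps_csv_file):
    """Map each ZipCode in the GPS CSV to its (Latitude, Longitude) strings."""
    index = {}
    for row in csv.DictReader(gps_csv_file):
        index[row["ZipCode"]] = (row["Latitude"], row["Longitude"])
    return index


def merge_gps(base_rows, gps_index):
    """Return copies of the base rows with coordinates replaced where GPS data exists."""
    merged = []
    updated = 0
    for row in base_rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        coords = gps_index.get(row["zip"])
        if coords is not None:
            row["latitude"], row["longitude"] = coords
            updated += 1
        merged.append(row)
    return merged, updated


# Made-up placeholder data, only to show the shape of the inputs.
gps_csv = io.StringIO(
    "ZipCode,City,State,Latitude,Longitude\n"
    "00001,Exampleville,XX,10.5,-20.25\n"
)
base_rows = [
    {"zip": "00001", "latitude": "10.0", "longitude": "-20.0"},
    {"zip": "00002", "latitude": "30.0", "longitude": "-40.0"},
]
merged_rows, updated_count = merge_gps(base_rows, load_gps_index(gps_csv))
```

Zip codes are kept as strings throughout so that leading zeros survive the join; rows without a match in the GPS file keep their original coordinates.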