seanpianka / Zipcodes

A simple library for querying U.S. zipcodes.
MIT License

Update the zipcode database monthly #7

Open seanpianka opened 4 years ago

seanpianka commented 4 years ago

Currently, the zipcode database can be out-of-sync because no one has made manual updates to the zipcodes.json data-file (which contains the zipcode data available in this package).

Goal: When https://www.unitedstateszipcodes.org releases an updated zipcode dataset, create a new release of this package with the updated dataset.
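One way a scheduled job could decide whether a new release is warranted is to compare a hash of the freshly downloaded dataset against the hash recorded on the previous run. The sketch below is only an illustration under that assumption: the function names and the idea of a recorded-hash file are hypothetical and are not part of this repository.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_changed(new_file: Path, recorded_hash_file: Path) -> bool:
    """Return True if new_file differs from the previously recorded hash.

    The recorded hash file (a hypothetical helper file) is updated
    whenever a change is detected, so a second call with the same
    data returns False.
    """
    new_hash = file_sha256(new_file)
    if recorded_hash_file.exists():
        if recorded_hash_file.read_text().strip() == new_hash:
            return False
    recorded_hash_file.write_text(new_hash + "\n")
    return True
```

A monthly job could then skip the release steps entirely whenever `dataset_changed(...)` returns `False`.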

Solution: Create a cronjob to perform the following steps monthly.

$ git clone https://github.com/seanpianka/Zipcodes
$ cd Zipcodes
$ python ci/__init__.py
$ bzip2 zips.json
$ mv zips.json.bz2 zipcodes/
$ bash scripts/get-next-patch-version "${current_version}"
$ bash scripts/create-new-python-wheel-release
$ bash scripts/add-to-git-and-publish-to-pypi
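For anyone automating this, the compression and version-bump steps above could also be done from Python rather than shell. The following is a minimal sketch, not the contents of the repository's scripts: `next_patch_version` and `compress_dataset` are hypothetical names, and the version string is assumed to have the plain `MAJOR.MINOR.PATCH` form.

```python
import bz2
import shutil
from pathlib import Path


def next_patch_version(current_version: str) -> str:
    """Bump the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in current_version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def compress_dataset(json_path: Path, package_dir: Path) -> Path:
    """Write a bzip2-compressed copy of json_path into package_dir.

    Unlike `bzip2 zips.json` followed by `mv`, this leaves the
    uncompressed source file in place.
    """
    package_dir.mkdir(parents=True, exist_ok=True)
    target = package_dir / (json_path.name + ".bz2")
    with json_path.open("rb") as src, bz2.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `compress_dataset(Path("zips.json"), Path("zipcodes"))` would produce `zipcodes/zips.json.bz2`, matching the target of the `mv` step.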
kenvenner commented 2 years ago

@seanpianka - are you looking for a volunteer to create an automated job that runs the steps described above and pushes a PR to the repo each month with a new/updated zip code database? If you are, I could most likely implement and deliver this. I just ran into an issue where the database is out of date: a zipcode is failing that I assume would pass if this library were current. Let me know.

Ken

seanpianka commented 2 years ago

Yes, I'm certainly open to pull requests that can automate this! As you know, it's important that it's updated regularly, but I don't have time to do so manually. A GitHub Actions pipeline that does this would be a great help!

kenvenner commented 2 years ago

Great - I assume you are pulling the source data from USPS as an individual (the free version)? I plan on doing the same.

kenvenner commented 2 years ago

The ci folder does not appear to be checked in to the repo (python ci/__init__.py)?

seanpianka commented 2 years ago

Yes, that's the db I've used the last few times. Additionally, the script for building the dataset merges in GPS data (lat/lon) from a separate dataset focused on GPS accuracy.

This script can be found in scripts/; I think I removed the ci/ folder in a recent commit.

kenvenner commented 2 years ago

There are two data sources in your scripts:

The first is obtained from https://www.unitedstateszipcodes.org/zip-code-database/ and is loaded as base_zipcodes_filename = "scripts/data/zip_code_database.csv"

I'm not sure what the data source is for the second file: gps_zipcodes_filename = "scripts/data/zip-codes-database-FREE.csv"

Can you tell me where this file comes from?

seanpianka commented 2 years ago

I am honestly not sure where I downloaded this from, and it seems I neglected to document it anywhere.

The goal here is to have an alternate zipcode dataset that we can use to update/override the lat/lon values in the unitedstateszipcodes.org dataset. The following sources should be suitable enough for this purpose:

https://www.uszipcodeslist.com/
https://simplemaps.com/data/us-zips

The script that generates the final dataset makes a best-effort attempt to update the existing zipcodes with the lat/lon data available in the other dataset. If one dataset does not include a zipcode that is present in the other, it is fine to simply skip that zipcode and leave its lat/lon data as-is.
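As a rough illustration of that best-effort override (not the actual code in scripts/), the merge could look like the sketch below. It assumes both datasets have already been loaded into dicts keyed by zipcode string, and that the coordinate fields are named `lat` and `lon`; the real column names in either CSV may differ.

```python
def merge_gps(base: dict, gps: dict) -> dict:
    """Return a copy of base with lat/lon overridden from gps where possible.

    Both arguments map a zipcode string (kept as text so leading zeros
    survive) to a row dict. Zipcodes missing from gps, or whose gps row
    lacks coordinates, keep their original lat/lon. Zipcodes that appear
    only in gps are ignored.
    """
    merged = {}
    for zip_code, row in base.items():
        updated = dict(row)  # copy so the input rows are not mutated
        override = gps.get(zip_code)
        if override and override.get("lat") and override.get("lon"):
            updated["lat"] = override["lat"]
            updated["lon"] = override["lon"]
        merged[zip_code] = updated
    return merged
```

Because the function only ever iterates over the base dataset, the unitedstateszipcodes.org data stays the source of truth for which zipcodes exist; the second dataset contributes coordinates only.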