This is documentation of my efforts to figure out what the hell is going on with USPS collection boxes. The theory and hope here is that, if a collection box is removed, that removal will eventually be reflected in the data provided by USPS.
Please please please, if you have the skills and desire to make something good out of what's posted here, do so! Map it out, analyze it, etc. etc.
I made a request to USPS in August 2019 for a list of all collection boxes and post offices. I used that data to populate my website MailboxLocate. They kindly responded with two spreadsheets containing both those lists.
That response is available to download here: foia_2019-09-27.zip
I'm running a regular scrape of the USPS PO Locator. This scrape produces a zip file containing approximately 32,000 JSON documents. If you want to use the JSON files directly, note that a collection box may appear in multiple files. You need to de-dup the collection boxes using the locationID.
I hacked up a script parse_all_collection_boxes.rb which converts the mess of JSON files into a CSV file. For consistency, the CSV file uses the same column headers as the coll report.xlsx
file from the FOIA data.
date | json | csv |
---|---|---|
2020-08-15 | collection_boxes_2020-08-15.zip | collection_boxes_2020-08-15.csv |
2020-08-17 | collection_boxes_2020-08-17.zip | collection_boxes_2020-08-17.csv |
2020-08-19 | collection_boxes_2020-08-19.zip | collection_boxes_2020-08-19.csv |
I hacked up a script diff.rb to compare the collection boxes in the FOIA dataset vs the scraped one. See the script for details regarding the algorithm. You can download the results here:
removed_2019_to_2020.csv collection boxes that appear in the FOIA dataset but not the 8/15/2020 scraped dataset
added_2019_to_2020.csv collection boxes that appear in the 8/15/2020 scraped dataset but not the FOIA dataset
Below is an analysis showing the difference in # of collection boxes between the FOIA and 8/15/2020 scraped data. There is a large drop in the number of reported boxes in DC. This is because many of the boxes in DC are not currently listed in the USPS PO Locator tool (maybe due to a software error with that tool?)
state | 2019 count | 2020 count | change | change pct |
---|---|---|---|---|
DC | 792 | 115 | -677 | -85% |
AK | 424 | 395 | -29 | -7% |
OR | 2618 | 2445 | -173 | -7% |
NE | 2131 | 2063 | -68 | -3% |
MT | 1447 | 1404 | -43 | -3% |
CA | 16199 | 15729 | -470 | -3% |
ND | 932 | 905 | -27 | -3% |
WA | 3630 | 3531 | -99 | -3% |
RI | 991 | 967 | -24 | -2% |
UT | 1128 | 1104 | -24 | -2% |
MA | 7815 | 7649 | -166 | -2% |
MD | 5092 | 4992 | -100 | -2% |
VA | 4966 | 4870 | -96 | -2% |
NM | 1480 | 1453 | -27 | -2% |
GA | 3763 | 3695 | -68 | -2% |
OH | 8883 | 8737 | -146 | -2% |
NH | 1584 | 1558 | -26 | -2% |
KY | 3350 | 3296 | -54 | -2% |
MN | 4488 | 4417 | -71 | -2% |
AZ | 2590 | 2553 | -37 | -1% |
NJ | 6344 | 6254 | -90 | -1% |
VI | 71 | 70 | -1 | -1% |
WI | 4306 | 4249 | -57 | -1% |
TX | 9863 | 9745 | -118 | -1% |
FL | 6104 | 6031 | -73 | -1% |
NV | 1020 | 1008 | -12 | -1% |
IN | 4492 | 4445 | -47 | -1% |
HI | 615 | 609 | -6 | -1% |
IL | 10869 | 10769 | -100 | -1% |
KS | 2483 | 2461 | -22 | -1% |
WY | 524 | 520 | -4 | -1% |
PR | 850 | 844 | -6 | -1% |
VT | 1095 | 1089 | -6 | -1% |
NY | 18273 | 18173 | -100 | -1% |
AR | 2824 | 2810 | -14 | 0% |
TN | 2721 | 2709 | -12 | 0% |
IA | 3667 | 3652 | -15 | 0% |
PA | 13378 | 13324 | -54 | 0% |
ME | 2026 | 2018 | -8 | 0% |
MI | 6511 | 6487 | -24 | 0% |
LA | 2282 | 2274 | -8 | 0% |
OK | 2607 | 2598 | -9 | 0% |
WV | 2630 | 2621 | -9 | 0% |
CO | 3223 | 3212 | -11 | 0% |
AL | 2937 | 2927 | -10 | 0% |
NC | 4873 | 4857 | -16 | 0% |
MS | 1766 | 1761 | -5 | 0% |
ID | 1093 | 1090 | -3 | 0% |
MO | 4927 | 4918 | -9 | 0% |
CT | 2847 | 2844 | -3 | 0% |
DE | 618 | 618 | 0 | 0% |
SD | 964 | 964 | 0 | 0% |
SC | 2286 | 2290 | 4 | 0% |
I set up a scheduled job to regularly scrape the collection box data. I will update this document as more files become available.
I don't own any of this data. So, I have no right to put restrictions on their use. I also have no desire to put restrictions on their use. Have fun!
As for the source code and this documentation: This project is licensed under the terms of the MIT license.