mattyschell / mobilelatlong2cloud


what is the least bad plan for ZIP codes? #11

Closed mattyschell closed 2 days ago

mattyschell commented 1 month ago

The US Postal Service assigns ZIP (Zone Improvement Plan) Codes to USPS addresses, which are point locations. Because no one publishes official ZIP Code polygons, this dataset is a particularly poor choice for a reverse geocoder, which is fundamentally a point-in-polygon operation.

As of 2024 AD the ZIP Code values returned by DCP's Geosupport are based on CSCL address points. Those address points take their ZIP Codes from CSCL Centerlines, which DCP maintains by internal processes.

(edit 20240813: this next paragraph is false) The ZIP Code polygons in CSCL are no longer tied to any NYC reality. CSCL ZIP code polygons are not official. ZIP Codes have been scrubbed from NYC Open data. The defunct ZIP Codes in CSCL are intended mainly for visual cartography.

Note that if we do use CSCL ZIP Codes in the mobilelatlong project, some polygons overlap. The same condition exists in the legacy application. Presumably the application has some mechanism for de-duplicating a point-in-polygon query that returns 2 ZIP Codes.

ZIP Code 10005 CONTAINS ZIP Code 10043

image
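One possible de-duplication mechanism, sketched here in plain Python so it runs without PostGIS: when a point falls in more than one polygon, keep the polygon with the smallest area. This is only an assumption about how a tie-break could work, not a description of what the legacy application does. The rectangles below are made-up stand-ins for the real CSCL shapes, and `zip_codes_at` and `single_zip_at` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def ring_contains(ring: Ring, pt: Point) -> bool:
    """Even-odd ray casting test for a simple ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def ring_area(ring: Ring) -> float:
    """Unsigned area from the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# ASSUMPTION: hypothetical rectangles in EPSG:2263 feet, NOT the real boundaries.
ZIP_POLYGONS: Dict[str, Ring] = {
    "10005": [(981000, 195000), (983000, 195000), (983000, 197000), (981000, 197000)],
    "10043": [(982200, 195700), (982400, 195700), (982400, 195900), (982200, 195900)],
}


def zip_codes_at(pt: Point) -> List[str]:
    """Every ZIP Code whose polygon contains pt (may be more than one)."""
    return sorted(z for z, ring in ZIP_POLYGONS.items() if ring_contains(ring, pt))


def single_zip_at(pt: Point) -> Optional[str]:
    """Tie-break overlapping hits by preferring the smallest polygon."""
    hits = zip_codes_at(pt)
    if not hits:
        return None
    return min(hits, key=lambda z: ring_area(ZIP_POLYGONS[z]))
```

With these stand-in shapes, `zip_codes_at((982296, 195837))` finds both 10005 and 10043, while `single_zip_at` keeps only 10043 because it is the smaller polygon.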

One alternative is to use US Census Bureau ZIP Code Tabulation Areas (ZCTAs).

@mlipper in between ruthlessly dunking on ZIP Codes I am also asking you to review and make a decision here. From my perspective there is no right or wrong answer. Thanks

mattyschell commented 1 month ago

Test if a point inside 10043 returns one or two zip codes:

select 'ZIP CODE | ' || a.feature_value from geo_districts a where a.layer_name = 'ZIP CODE' and ST_Contains(a.geom, (select ST_SetSRID(ST_GeomFromText('POINT(982296 195837)'), 2263)));

As of today this returns

ZIP CODE | 10005
ZIP CODE | 10043

Consider at least spatially differencing the shapes where we have overlaps. At this location, for example, only 10043 should be returned.
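A rough sketch of what differencing would mean for the contained case, in plain Python rather than PostGIS: when one polygon sits wholly inside another, subtracting it from the outer shape is the same as adding it to the outer shape as a hole. In PostGIS the equivalent would presumably be built with `ST_Difference`. The rectangles are made-up stand-ins for the real CSCL shapes, and `Polygon`, `punch_out`, and `lookup` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def ring_contains(ring: Ring, pt: Point) -> bool:
    """Even-odd ray casting test for a simple ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Polygon:
    shell: Ring
    holes: List[Ring] = field(default_factory=list)

    def contains(self, pt: Point) -> bool:
        # Inside the outer ring and not inside any hole.
        if not ring_contains(self.shell, pt):
            return False
        return not any(ring_contains(hole, pt) for hole in self.holes)


def punch_out(outer: Polygon, inner: Polygon) -> Polygon:
    """Return outer minus inner, for the case where inner lies wholly inside outer.

    The vertex check below is necessary but not sufficient for true containment;
    it is good enough for this sketch, not for arbitrary shapes.
    """
    if not all(ring_contains(outer.shell, v) for v in inner.shell):
        raise ValueError("inner polygon is not contained in outer polygon")
    return Polygon(outer.shell, outer.holes + [inner.shell])


def lookup(layer: Dict[str, Polygon], pt: Point) -> List[str]:
    """Every ZIP Code in the layer whose polygon contains pt."""
    return sorted(z for z, poly in layer.items() if poly.contains(pt))


# ASSUMPTION: hypothetical rectangles in EPSG:2263 feet, NOT the real boundaries.
RAW: Dict[str, Polygon] = {
    "10005": Polygon([(981000, 195000), (983000, 195000), (983000, 197000), (981000, 197000)]),
    "10043": Polygon([(982200, 195700), (982400, 195700), (982400, 195900), (982200, 195900)]),
}

# Difference the containing shape so the two polygons no longer overlap.
DIFFERENCED: Dict[str, Polygon] = dict(RAW)
DIFFERENCED["10005"] = punch_out(RAW["10005"], RAW["10043"])
```

With these stand-in shapes, `lookup(RAW, (982296, 195837))` returns both codes, and `lookup(DIFFERENCED, (982296, 195837))` returns only 10043. Differencing once at load time keeps the lookup itself a plain point-in-polygon test, at the cost of altering the stored shapes.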

mattyschell commented 1 month ago

added zctas in https://github.com/mattyschell/mobilelatlong2cloud/commit/8b88d527c8c50972877883649578ac0d61223c21

mattyschell commented 1 month ago

It is good to state clearly that ZIP codes are a nonsensical value to return in a reverse geocoder. That stated, ZCTAs are pretty close.

There are differences around the "edges" of ZCTA polygons, as expected. ZCTAs aggregate up statistically from lower-level geographies that do not follow ZIP code boundaries.

A major visual difference is Central Park. Historically the mobile_latlong reverse geocoder has returned CSCL ZIP code 00083. This is a totally bogus made up ZIP code.

image

NYC geocoders currently return 10000 for Central Park. 10000 is also a totally bogus made up ZIP code.

image

Here are zctas: image

mlipper commented 1 month ago

I believe Tom S. paid for an "official" USPS (or some organization) dataset during his brief tenure as GIS Overlord. I think @cchendoitt knows the details.

mattyschell commented 1 month ago

You are probably thinking of Melissa data. We sometimes use commercial data like Melissa to update "subaddresses" in CSCL lingo. I wonder if DCP also uses Melissa data as a source for addresspoint ZIP codes?

https://github.com/mattyschell/cscl-subaddress-adhoc

In any case this is a related, but different, issue.

mlipper commented 1 month ago

Hmmm. Sad. Normally, I'd ask @mattyschell but I guess that won't work here. Mohamed, Yitz, Rodrigo, Wilford Brimley?

mattyschell commented 1 month ago

Thanks for taking the time to link to Wilford Brimley's imdb page. Good clarification.

From my perspective this is not an issue about data sources. Instead I'm asking the application architect to reckon with the statement:

ZIP codes are a nonsensical value to return in a reverse geocoder

mattyschell commented 4 weeks ago

According to the experts at NYC Dept of City Planning:

By default, the centerline tool picks up zip code from the underlying zip code polygon layer in CSCL. To check on current data, we use to run a comparison to Melissa Data. It's been a few years since we did a comparison.

In which case we should simply continue to use the defunct CSCL ZIP codes for the mobile_latlong project.

mattyschell commented 2 days ago

The consumers of the service no longer require ZIP codes. Closing.