mattyschell / cscl-subaddress-matched

Creative Commons Zero v1.0 Universal

What do NULL Melissa suites indicate? #1

Closed mattyschell closed 3 years ago

mattyschell commented 3 years ago

Determine the meaning of NULL melissa suites.

Case 1: probably ignore that 6th row. But why is it there? Does it suggest that a valid, unit-free addressable location exists here?

ADDRESS         SUITE   ADDRESSPOINTID
10102 95th Ave  Apt 1F  219181
10102 95th Ave  Apt 1R  219181
10102 95th Ave  Apt 2A  219181
10102 95th Ave  Apt 2F  219181
10102 95th Ave  Apt 2R  219181
10102 95th Ave  NULL    219181

Case 2: does this mean we should delete any subaddresses associated with this address, if subaddresses exist? This is the only record for the address.

ADDRESS                SUITE   ADDRESSPOINTID
1506 Commonwealth Ave          2044259

mattyschell commented 3 years ago

Summary statistics that sorta indicate the Case 2 interpretation of NULL Melissa suites (delete the subaddresses) is correct:

654,702 -- Number of NULL suites in Melissa data that have no matching address point in subaddress

1,994 -- Number of NULL suites in Melissa data with a matching subaddress. The subaddresses at this smallish number of addresses would be deleted.
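One way to reproduce these two counts, as a minimal sketch. It assumes the geocoded Melissa delivery is available as (address point id, suite) pairs with NULL suites loaded as None, and that the address point ids carrying subaddresses in CSCL are available as a set. The function and parameter names are hypothetical, and the sketch counts every NULL-suite row; whether the figures above were computed exactly this way is not stated here.

```python
from typing import Iterable, Optional, Set, Tuple

# (address point id, suite); None stands in for a NULL suite (assumption)
MelissaRow = Tuple[int, Optional[str]]


def count_null_suites(
    melissa_rows: Iterable[MelissaRow],
    subaddress_point_ids: Set[int],
) -> Tuple[int, int]:
    """Return (unmatched, matched) counts of NULL-suite Melissa rows."""
    unmatched = 0
    matched = 0
    for address_point_id, suite in melissa_rows:
        if suite is not None:
            continue  # rows with a real suite are out of scope here
        if address_point_id in subaddress_point_ids:
            matched += 1  # CSCL has subaddresses at this address point
        else:
            unmatched += 1  # nothing in subaddress to touch
    return unmatched, matched


# Illustrative ids only, not real address points.
rows = [(1, "Apt 1F"), (1, None), (2, None), (3, None)]
print(count_null_suites(rows, {1, 3}))
```

The first number in the result corresponds to the "no matching address point" count and the second to the "matching subaddress" count.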

mattyschell commented 3 years ago

Counterpoint: a few of the 1,994 addresses I looked at really do appear to have subaddress units, which suggests the Melissa data is wrong.

Address point id 21158 with 32 subaddresses, Apt L1 through 4H, looks accurate. Why does Melissa deliver this address as having no units?

image

mattyschell commented 3 years ago

Case 1: According to the experts, the NULL Melissa suites are the "base address" for a group of units and should be completely ignored.

Case 2: We will attempt to get the original Melissa data prior to geocoding. It is possible that this is an error in geocoding and the Melissa data includes the units. Overall, however, most signs point to ignoring NULL Melissa suites in this case as well.

mattyschell commented 3 years ago

Case 2: These are not errors in geocoding. The Melissa data truly says there are no units for these addresses. I think there may be a combination of edge cases and bad data behind these 1,994 addresses where NYC has units and Melissa says nope.

  1. NYC has units but they are unoccupied. See for example address point 58981, 11207 Queens Blvd. NYC says lots of units, Melissa says none. At a quick glance this location is a big complex; maybe no one lives in it?

  2. NYC has units but they are mistakes, like lots of duplicate NULL units. See for example address point 10171675, 13536 Roosevelt Ave, where NYC has 7 NULL subaddresses associated with this address.

mattyschell commented 3 years ago

The experts suggest another line of inquiry - review Melissa deliveries from prior years to see if these same (or similar) 1,994 "case 2" addresses are without units in the deliveries.
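A rough sketch of that comparison, assuming the "case 2" address point ids from each delivery have already been pulled into plain collections. The function name and inputs are hypothetical and the ids in the usage line are made up.

```python
from typing import Iterable, Set, Tuple


def case2_overlap(
    current_ids: Iterable[int],
    prior_ids: Iterable[int],
) -> Tuple[Set[int], float]:
    """Return the ids present in both deliveries and the share of current ids they cover."""
    current = set(current_ids)
    prior = set(prior_ids)
    shared = current & prior  # address points without units in both years
    share = len(shared) / len(current) if current else 0.0
    return shared, share


# Made-up ids for illustration.
shared, share = case2_overlap([1, 2, 3, 4], [2, 3, 4, 5])
print(sorted(shared), share)
```

A share close to 1.0 would mean the same addresses keep arriving without units year after year, which points away from a one-off delivery glitch.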

mattyschell commented 3 years ago

After the 2020 update and some cleanup of errors in subaddress, the count is down slightly to 1,845. That is, 1,845 Melissa addresses indicate no units at an address where CSCL says subaddresses do in fact exist.

mattyschell commented 3 years ago

Same pattern in the 2019 Melissa delivery. Most (close to all) of the same addresses appear without suites in the geocoded Melissa data while very definitely having subaddresses in CSCL.

Below observe the facts without biases of the head or heart. Determine the arc's path, stroll leisurely to its terminus and the truth will fall at our feet.

mattyschell commented 3 years ago

Addresspoint id 2011474 with 35 subaddresses at 922 Southern Blvd, Bronx.

image

image

mattyschell commented 3 years ago

Addresspoint id 2001794 with 11 subaddresses at 2817 3rd Ave, Bronx.

image

image

mattyschell commented 3 years ago

Addresspoint id 6345 with 2 subaddresses at 3912 Crescent St, Long Island City.

image

image

mattyschell commented 3 years ago

Addresspoint id 127246 with 12 subaddresses at 13324 41st Ave, Flushing, Queens

image

image

mattyschell commented 3 years ago

Conclusion: NULL Melissa suites indicate one of several possibilities.

In any case processing NULL Melissa suites is, for now, not a task we can accomplish in this project.

mattyschell commented 3 years ago

The experts reviewed this issue on July 7, 2021 and have a different take. When Melissa suites for an address point are NULL we should in fact delete the subaddresses in CSCL.

When buildings are demolished or have a type 1 alteration, editors will remove the address point from CSCL and create a new address point.

In "an address point in transition" cases, editors have no reason to touch the building footprint in our current staffing and procedural setup. When a building is gutted and rehabbed with a new set of subaddresses, we are entirely reliant on Melissa data to tell us what to do with the associated subaddresses.

Final answer - when we encounter only a NULL suite in the geocoded Melissa data for an address point that has matching subaddresses in CSCL, we should delete those subaddresses from CSCL. See new issue for implementation.
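The final rule can be sketched roughly as below. This is not the implementation from the follow-up issue, just an illustration under assumptions: Melissa rows arrive as (address point id, suite) pairs with NULL loaded as None, and CSCL subaddress counts per address point are available as a mapping. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional, Set, Tuple


def address_points_to_clear(
    melissa_rows: Iterable[Tuple[int, Optional[str]]],
    cscl_subaddress_counts: Mapping[int, int],
) -> List[int]:
    """Return address point ids whose CSCL subaddresses the rule says to delete."""
    suites_by_point: Dict[int, Set[Optional[str]]] = defaultdict(set)
    for address_point_id, suite in melissa_rows:
        suites_by_point[address_point_id].add(suite)

    return sorted(
        point_id
        for point_id, suites in suites_by_point.items()
        # Only a NULL suite in Melissa (Case 2), and CSCL has subaddresses there.
        if suites == {None} and cscl_subaddress_counts.get(point_id, 0) > 0
    )


# Illustrative ids only: 1 is a Case 1 base address row alongside real suites,
# 2 has only a NULL suite but no CSCL subaddresses, 3 is the delete case.
melissa = [(1, "Apt 1F"), (1, None), (2, None), (3, None)]
cscl_counts = {1: 5, 3: 32}
print(address_points_to_clear(melissa, cscl_counts))
```

Grouping suites per address point first keeps Case 1 rows (a NULL base address next to real units) out of the delete list, which matches the earlier guidance to ignore them.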