openaddresses / openaddresses

A global repository of open address, building, and parcel data.
http://openaddresses.io/
BSD 3-Clause "New" or "Revised" License

us/wa/king has many empty values for city #4405

Open petershank opened 5 years ago

petershank commented 5 years ago

Yay, my second contribution! I want to dig deeper and learn to write better issues and help fix things, but I am still finding my way around. Please point me in the right direction!

us/wa/king has a significant portion of rows with an empty city value. The source type seems to be an ESRI file, which I have no experience with. Where should I go to learn enough about ESRI to look at the original data and start to be helpful?

-Peter

justinelliotmeyers commented 5 years ago

go here: http://results.openaddresses.io/?runs=all#runs

http://results.openaddresses.io/sources/us/wa/king

(currently being pulled: https://s3.amazonaws.com/data.openaddresses.io/runs/564461/us/wa/king.zip )
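To put a number on "a significant portion", one option is to count the blank city values in the processed CSV from that run. This is only a rough sketch: the `CITY` column name and the sample rows are assumptions made up for illustration, not taken from the actual king.zip output.

```python
import csv
import io


def count_empty(lines, column="CITY"):
    """Return (empty, total) for one column of a CSV with a header row.

    `lines` is any iterable of text lines, e.g. an open file.
    Values that are missing or whitespace-only count as empty.
    """
    reader = csv.DictReader(lines)
    empty = total = 0
    for row in reader:
        total += 1
        value = row.get(column) or ""
        if not value.strip():
            empty += 1
    return empty, total


# Made-up sample rows, only to show the call; real data would come from
# the CSV inside the downloaded zip.
sample = io.StringIO(
    "LON,LAT,NUMBER,STREET,CITY\n"
    "-122.3,47.6,100,MAIN ST,SEATTLE\n"
    "-122.1,47.5,200,OAK AVE,\n"
    "-122.2,47.4,300,PINE RD, \n"
)
print(count_empty(sample))
```

With a real file, pass the handle from `open(path, newline="")` instead of the `StringIO` sample.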

Look at the source of the data (http://gismaps.kingcounty.gov/arcgis/rest/services/Address/KingCo_AddressPoints/MapServer/0 ), download a fresh raw copy, and check whether the attributes exist. If they do, explain which fields should be used, and hopefully someone here can implement them when we update how the machine processes this source. Let me know if you need more help than that! Cheers! Justin

Just looking at a quick and dirty query, http://gismaps.kingcounty.gov/arcgis/rest/services/Address/KingCo_AddressPoints/MapServer/0/query?where=1%3D1&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&f=html

you can see the raw data comes like that. You could always call King County and ask why it has null values or is missing data.

Also, in their raw address points here: http://www5.kingcounty.gov/gisdataportal/Default.aspx, 115147 of 715405 records have "CTYNAME" = ' '. So you can call King County and ask them why, or look at the data spatially.
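A narrower version of the query above asks the layer only for a count of matching rows. The sketch below just builds the URL, using the `where`, `returnCountOnly`, and `f` parameters that already appear in the long query string. It assumes the REST layer exposes the field as `CTYNAME` and stores the blank as a single space, as in the raw download; neither is confirmed here.

```python
from urllib.parse import urlencode

# Layer URL copied from the comment above.
LAYER_URL = (
    "http://gismaps.kingcounty.gov/arcgis/rest/services/"
    "Address/KingCo_AddressPoints/MapServer/0"
)


def count_query_url(where, layer_url=LAYER_URL):
    """Build a query URL that asks the layer for a row count only."""
    params = {
        "where": where,             # SQL-style filter
        "returnCountOnly": "true",  # return a count, not the features
        "f": "json",                # JSON instead of the HTML page
    }
    return layer_url + "/query?" + urlencode(params)


# Assumed filter: CTYNAME holding a single space, as in the raw download.
print(count_query_url("CTYNAME = ' '"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the count if the field name and filter match what the server expects.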

justinelliotmeyers commented 5 years ago

[Image: king_nulls]

[Image: nulls]

thismakessand commented 5 years ago

I just noticed this yesterday, glad to see there's already an issue opened for it! It looks like there's also a "POSTALCTYNAME" in the source data that is populated when "CTYNAME" is not.

nvkelso commented 5 years ago

+1 for datasets that distinguish (legal) city name from postal city name :)

thismakessand commented 5 years ago

Is there any existing conform attribute function that can be used to take POSTALCTYNAME when CTYNAME is null?
Sometimes they are both populated (with the same value), so I don't think the join function will work.

trescube commented 5 years ago

Not that I'm aware of, but that's a great idea! I've seen a number of sources where that would be useful.

thismakessand commented 5 years ago

Actually, @trescube, it looks like you already added it! I'm seeing a "first_non_empty" function in the openaddresses/machine code, and the tests confirm it does what it sounds like: https://github.com/openaddresses/machine/blob/d06c1f1dc46b1ff72ed16dfc8fdfd99c10b633f8/openaddr/tests/conform.py#L1295

I don't see it being used by any other sources, but if it's okay to use, let me know and I can put in a PR to fix this issue.
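For anyone following along, here is a rough illustration of the behavior the name suggests, applied to the two King County fields. It is a standalone sketch, not the code in openaddresses/machine; the helper below and the sample rows are hypothetical, and it assumes whitespace-only values such as ' ' should count as empty.

```python
def first_non_empty(row, fields):
    """Return the first value in `fields` that is not blank, else None.

    Illustrative helper only; not the machine's implementation.
    """
    for field in fields:
        value = row.get(field)
        if value is not None and value.strip():
            return value
    return None


# Hypothetical rows covering the cases mentioned in this thread.
rows = [
    {"CTYNAME": "SEATTLE", "POSTALCTYNAME": "SEATTLE"},  # both populated
    {"CTYNAME": " ", "POSTALCTYNAME": "VASHON"},         # legal city blank
    {"CTYNAME": " ", "POSTALCTYNAME": ""},               # nothing usable
]

for row in rows:
    print(first_non_empty(row, ["CTYNAME", "POSTALCTYNAME"]))
```

Listing `CTYNAME` first keeps the legal city name where it exists and falls back to the postal city only when it is blank, which avoids the duplicated value a join would produce when both are filled in.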

trescube commented 5 years ago

Ha! I completely forgot about that