whosonfirst / whosonfirst-sources

Where things come from in Who's On First.

Standardize on "unknown" instead of "missing" #112

Closed: nvkelso closed this issue 4 years ago

nvkelso commented 6 years ago

We've used two different "sources", "missing" and "unknown", to indicate missing and/or unknown information in Who's On First. We should standardize on just one. I propose keeping "unknown", marking "missing" as deprecated, and opening a related PR in the data repo to swap the values over.
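As a rough illustration of the proposed swap, here is a minimal sketch that rewrites "missing" to "unknown" in a single GeoJSON feature. It assumes that every property whose name starts with `src:` (such as `src:geom` and `src:geom_alt` from this thread) holds either a single source string or a list of source strings; the function name `normalize_sources` is hypothetical and not part of any Who's On First tool.

```python
from typing import Any

DEPRECATED = "missing"
PREFERRED = "unknown"


def normalize_sources(feature: dict[str, Any]) -> int:
    """Replace "missing" with "unknown" in every src:* property of a
    GeoJSON feature, in place. Returns the number of values replaced.

    Hypothetical helper; assumes src:* values are strings or lists of strings.
    """
    props = feature.get("properties", {})
    changed = 0
    for key, value in props.items():
        if not key.startswith("src:"):
            continue
        if value == DEPRECATED:
            # Reassigning an existing key is safe while iterating items().
            props[key] = PREFERRED
            changed += 1
        elif isinstance(value, list) and DEPRECATED in value:
            changed += value.count(DEPRECATED)
            # Swap the value, then drop duplicates while keeping the order,
            # so ["unknown", "missing"] collapses to ["unknown"].
            swapped = (PREFERRED if v == DEPRECATED else v for v in value)
            props[key] = list(dict.fromkeys(swapped))
    return changed
```

For example, a feature with `"src:geom": "missing"` and `"src:geom_alt": ["missing", "quattroshapes"]` would come back with `"unknown"` and `["unknown", "quattroshapes"]`, and the call would return 2. Non-`src:` properties are left alone even if their value happens to be the string "missing".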

It would also be helpful to report how many records are affected (as a count and as a percentage of the project) and which properties the values are associated with.
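One way such a report could be produced is sketched below. It assumes the records have already been loaded as GeoJSON feature dicts and, as above, that only `src:*` properties carry source values; `placeholder_report` is a hypothetical name. It counts each record once toward the affected total, but tallies every `(property, placeholder)` pair separately so the per-property breakdown is visible.

```python
from collections import Counter
from typing import Any, Iterable

PLACEHOLDERS = ("missing", "unknown")


def placeholder_report(features: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Count records that use a placeholder source, overall and per property.

    Hypothetical helper; assumes src:* values are strings or lists of strings.
    """
    total = 0
    affected = 0
    by_property: Counter = Counter()
    for feature in features:
        total += 1
        hit = False
        for key, value in feature.get("properties", {}).items():
            if not key.startswith("src:"):
                continue
            values = value if isinstance(value, list) else [value]
            for placeholder in PLACEHOLDERS:
                if placeholder in values:
                    by_property[(key, placeholder)] += 1
                    hit = True
        if hit:
            # A record is counted once, however many properties it affects.
            affected += 1
    percent = 100.0 * affected / total if total else 0.0
    return {
        "total": total,
        "affected": affected,
        "percent": percent,
        "by_property": dict(by_property),
    }
```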

Related: https://github.com/whosonfirst/whosonfirst-sources/issues/65.

stepps00 commented 6 years ago

Related:

https://github.com/whosonfirst/py-mapzen-whosonfirst-export/blob/master/mapzen/whosonfirst/export/__init__.py#L420

https://github.com/whosonfirst/py-mapzen-whosonfirst-import/blob/master/mapzen/whosonfirst/importer/woedb.py#L47

stepps00 commented 6 years ago

List of ids for records that have a "missing" src:geom: missing_src_geom.txt

List of ids for records that have an "unknown" src:geom: unknown_source_geom.txt

List of ids for records that have an "unknown" src:geom_alt: has_unknown_source_geom_alt.txt
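ID lists like the ones above could be regenerated with something along these lines. This is a sketch under several assumptions: records live as `*.geojson` files somewhere under a data directory, alternate geometries sit next to them in files named like `<id>-alt-<source>.geojson` (skipped here so each record is counted once), and the lists are plain text with one ID per line. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def iter_features(root: Path) -> Iterator[dict[str, Any]]:
    """Yield every primary *.geojson feature under root (assumed layout)."""
    for path in sorted(root.rglob("*.geojson")):
        # Assumption: alternate geometry files contain "-alt-" in their name.
        if "-alt-" in path.stem:
            continue
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)


def ids_with_source(features: Iterable[dict[str, Any]], prop: str, value: str) -> list[int]:
    """Return the sorted wof:id of every record whose `prop` equals
    `value` or, for list-valued properties, contains it."""
    ids = []
    for feature in features:
        props = feature.get("properties", {})
        current = props.get(prop)
        values = current if isinstance(current, list) else [current]
        if value in values and "wof:id" in props:
            ids.append(props["wof:id"])
    return sorted(ids)


def format_id_list(ids: Iterable[int]) -> str:
    """Render one ID per line (assumed format of the attached lists)."""
    return "".join(f"{i}\n" for i in ids)
```

A possible invocation, again with an assumed `data` directory: `Path("missing_src_geom.txt").write_text(format_id_list(ids_with_source(iter_features(Path("data")), "src:geom", "missing")))`.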

stepps00 commented 4 years ago

The linked PRs above replace "missing" values with "unknown". In total, there are 9,917 cases out of 4.8+ million admin records, or roughly 0.2%.

There is no need for code changes: our export code already uses "unknown" when a source is not available.
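The defaulting behaviour described here can be pictured with the small sketch below. It is a hypothetical helper illustrating the idea, not the actual code in py-mapzen-whosonfirst-export (see the `__init__.py` link earlier in the thread for that), and it only covers `src:geom`.

```python
from typing import Any

UNKNOWN = "unknown"


def ensure_geom_source(props: dict[str, Any]) -> dict[str, Any]:
    """Fill in src:geom with "unknown" when no source is recorded.

    Hypothetical sketch of the behaviour described in this thread; an absent
    key, None, or an empty string all count as "not available".
    """
    if not props.get("src:geom"):
        props["src:geom"] = UNKNOWN
    return props
```

Using a falsy check rather than `dict.setdefault` means an explicitly empty value is also normalized, which `setdefault` would leave untouched.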

Once the PRs are merged, we can close this issue out.

nvkelso commented 4 years ago

All those PRs look good to me, thanks!