whosonfirst / whosonfirst-properties

What things mean in Who's On First documents

"Ownership" of WOF Properties? #100

Closed vicchi closed 11 months ago

vicchi commented 4 years ago

A lot of formal UK open geodata, currently represented in WOF by the OS and London DataStore data sources, has property definitions for codes within the formal UK admin hierarchy. See ./properties/os in this repo, where all the properties except nhs_ha_code, nhs_regional_ha_code and positional_quality_code define what are really GSS (Government Statistical Service) codes. See https://en.wikipedia.org/wiki/ONS_coding_system for more information.

Whilst these codes are percolated down into other UK open geo data, the authoritative source is the Register of Geographic Codes (RGC) data set from the UK's Office for National Statistics (ONS), the latest version of which is here: https://geoportal.statistics.gov.uk/datasets/register-of-geographic-codes-april-2020-for-the-united-kingdom

All of which is long-winded background to the fact that I'm planning on putting in GSS concordances as well as GSS code properties for all applicable levels in the admin hierarchy in an upcoming PR for the whosonfirst-data/whosonfirst-data-admin-gb repo, which will have updated admin boundaries for the UK, sourced from ONS open geo data. See also whosonfirst/whosonfirst-sources#175, which adds ONS as a new data source.

So when I do this, do I duplicate the property definitions found here in ./properties/os, or look at some way of migrating those properties to ONS-specific properties? If that makes sense?

thisisaaronland commented 4 years ago

It sounds like you should create properties/gss/*.json ?

vicchi commented 4 years ago

@thisisaaronland That makes complete sense and will handle the upcoming GSS codes once I've finished a) setting up the full pipeline to be able to build/verify any changes I make and b) submitting a PR for them.

But what about the existing properties, which I suspect are more than just OS ... leave them as is and worry about this another day?

thisisaaronland commented 4 years ago

Can you copy/paste an example (or pseudo-example) of these properties and where they overlap?

vicchi commented 4 years ago

@thisisaaronland Actually this affects the data that came from OS Northern Ireland as well ... case in point, whosonfirst-data-admin-gb:data/136/069/924/7/1360699247.geojson has the property "ni-os:lgdcode":"N09000011". According to the RGC, N09 is the entity code for a Local Government District (LGD) in NI and N09000011 is the LGD for Ards and North Down.
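
To make the N09 / N09000011 split concrete, here is a minimal sketch of pulling a GSS code apart. It assumes the usual nine-character shape (one letter and two digits for the entity, then six digits for the instance); the names `GSSCode` and `parse_gss_code` are hypothetical and not part of any WOF or ONS tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a GSS code is one letter + two digits (the entity code),
# followed by six digits identifying the instance within that entity.
GSS_PATTERN = re.compile(r"([A-Z]\d{2})(\d{6})")


class GSSCode(NamedTuple):
    entity: str    # e.g. "N09", Local Government District in NI
    instance: str  # e.g. "000011"


def parse_gss_code(value: str) -> Optional[GSSCode]:
    """Return the entity/instance parts, or None if the value is not GSS-shaped."""
    match = GSS_PATTERN.fullmatch(value.strip().upper())
    if match is None:
        return None
    return GSSCode(entity=match.group(1), instance=match.group(2))


code = parse_gss_code("N09000011")
print(code.entity)    # N09
print(code.instance)  # 000011
```

The entity prefix is what you would look up in the RGC to find out what kind of geography the code refers to.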

See also https://github.com/whosonfirst-data/whosonfirst-data-postalcode-gb/pull/4, where @tomtaylor is importing ONS PD data, same code set there.

Plus I also suspect, but haven't confirmed, that you'll find these GSS codes in any data that originated from OS, such as CodePoint Open.

So basically, we can either recognise that this is a shortcoming and keep these codes duplicated under different properties in the data with different property prefixes, which sort of obscures the fact that they're a common linking identifier in UK open data ... or we (which probably means me) can look through all the UK WOF data to find out where these codes crop up and rationalise them under a common prefix. That prefix could be GSS, which relates to the code itself, or ONS, which relates to the authoritative source of the code.
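
A rough sketch of what "looking through the data" could mean for a single record: scan the properties for GSS-shaped values and group the keys that carry the same code. The function name `find_gss_like_properties` is hypothetical, the pattern is the same nine-character assumption as before, and the `gbr-ons:gss_code` entry in the sample is made up for illustration rather than taken from the actual file.

```python
import re
from collections import defaultdict

# Assumption: GSS codes are one uppercase letter followed by eight digits.
GSS_PATTERN = re.compile(r"[A-Z]\d{8}")


def find_gss_like_properties(properties: dict) -> dict:
    """Map each GSS-shaped value to the sorted list of property keys holding it."""
    found = defaultdict(list)
    for key, value in properties.items():
        if isinstance(value, str) and GSS_PATTERN.fullmatch(value):
            found[value].append(key)
    return {code: sorted(keys) for code, keys in found.items()}


# Illustrative properties only; the second GSS entry is hypothetical.
sample = {
    "wof:id": 1360699247,
    "ni-os:lgdcode": "N09000011",
    "gbr-ons:gss_code": "N09000011",
}

print(find_gss_like_properties(sample))
# {'N09000011': ['gbr-ons:gss_code', 'ni-os:lgdcode']}
```

Any code that shows up under more than one key is a candidate for rationalising. A pattern match alone can produce false positives, so the hits would still need checking against the RGC.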

thisisaaronland commented 4 years ago

I see.

Generally the convention has been to leave "source" properties untouched. If I am understanding things, there would be multiple UK data sources publishing the same data (values) but with different keys?

I think I would keep the source data as-is, pending some broader decision about whether the properties in question need to remain, and add a new mz: (previously "mapzen" not "metazen") property with the rationalized identifier. It does suggest that a mz:foo <--> gss:bar concordances map would be helpful. Maybe?

Paging @stepps00 for #feelings.

vicchi commented 4 years ago

That's pretty much the case ... as a general rule of thumb almost all open data that comes from OS (UK) and OS NI (which isn't the same as OS UK but that's another long convoluted conversation), tends to reference a GSS code where that makes sense.

It makes sense when the geography being referred to is something that ONS is interested in, or resides inside or is otherwise related to something that ONS is interested in; in that case it'll have a GSS code.

GSS codes are defined for lots of things, but a brief selection is admin areas, both formal (counties, boroughs, unitary authorities), electoral (districts, wards), census (things called output areas such as LSOAs, MSOAs), health, education or emergency service areas and lots of other stuff.

But in the main if you think of GSS codes as being identifiers for an admin hierarchy which covers a lot more than just formal admin geographies, so like a WOF or WOE id, then you're on the right lines.

Both OS flavours tend to provide GSS codes as a sort of concordance (except they don't call it that) in their data, both open and closed, as does ONS for most of the data they provide or at least where it makes a modicum of sense to do so.

But it all comes back to the fact that ONS are the custodians of the GSS code and they issue updates to these, plus a full change log, whenever something changes, such as when there are electoral boundary changes.

All of which is more than you'll probably ever need to know, but might be helpful.

nvkelso commented 4 years ago

Sounds like a case for pushing updates / corrections to WOF concordances to me?

vicchi commented 4 years ago

@nvkelso So "just" replace the current properties that contain GSS codes in their varied forms with entries in the concordances object in the GeoJSON then? Or keep the properties as is and add the concordances?

nvkelso commented 4 years ago

Generally additive is the right approach, at least for some period of time, for properties that are not MZ- or WOF-prefixed, before they are retired.

These sound like concordances and should be promoted to that WOF top-level property so we don't force consumers to dip into other less well known name spaces to find them.

@stepps00 to chime in, too, please.

stepps00 commented 4 years ago

"So basically, we can either recognise that this is a short coming and keep these codes duplicated under different properties in the data with different property prefixes [...]"

:+1: IMO, Who's On First can/should catalogue each source's properties as they are maintained upstream at the source. So, it's okay for Who's On First to keep these codes duplicated across different sources and property prefixes.

In this case, it sounds like various sources repackage what is essentially the "Government Statistical Service" code. So, in addition to top-level os:*, gbr-ons:*, and gss:* properties with these duplicated values, we could also add the "official" gss source as a concordance in the wof:concordances list.

This sounds somewhat similar to what we already do with GeoNames identifiers. Who's On First has the identifier in a top-level gn:id property, potentially as a top-level ne:gn_id property from Natural Earth, and as a concordance in the wof:concordances property.
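
A minimal sketch of that additive approach, assuming a feature is a plain GeoJSON dict and that the concordance key would be `gbr-ons:gss_code` (an assumption here; the key could equally be a `gss:` one). The helper name `add_gss_concordance` is hypothetical. The source property is left untouched and the code is copied into wof:concordances.

```python
import copy


def add_gss_concordance(feature: dict, source_key: str,
                        concordance_key: str = "gbr-ons:gss_code") -> dict:
    """Return a copy of the feature with the GSS code added as a concordance.

    The source property stays where it is; nothing is removed or renamed.
    """
    updated = copy.deepcopy(feature)
    props = updated.setdefault("properties", {})
    code = props.get(source_key)
    if code is None:
        return updated  # nothing to promote
    concordances = props.setdefault("wof:concordances", {})
    # setdefault keeps any concordance value that is already present
    concordances.setdefault(concordance_key, code)
    return updated


feature = {
    "type": "Feature",
    "properties": {"ni-os:lgdcode": "N09000011"},
}
result = add_gss_concordance(feature, "ni-os:lgdcode")
print(result["properties"]["wof:concordances"])
# {'gbr-ons:gss_code': 'N09000011'}
```

Working on a copy keeps the original record intact, which makes it easier to diff the before and after when preparing a PR.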

nvkelso commented 11 months ago

WOF recently (Sept 2023) introduced a new wof:concordances_official property to designate which of the many WOF concordances is the "official" concordance for joining with census statistics.

In this case, we've set that to gbr-ons:gss_code for features like England, etc.
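
As a sketch of how a consumer might use this, assuming wof:concordances_official holds the name of a key inside wof:concordances (my reading of the comment above, not a confirmed spec), the lookup could be as simple as the following. The function name `official_concordance` is hypothetical, and the sample properties are illustrative rather than copied from the England record.

```python
from typing import Any, Optional, Tuple


def official_concordance(properties: dict) -> Optional[Tuple[str, Any]]:
    """Return (key, value) for the concordance flagged as official, if any."""
    key = properties.get("wof:concordances_official")
    if not key:
        return None
    concordances = properties.get("wof:concordances", {})
    if key not in concordances:
        return None  # flagged key is missing from the concordances themselves
    return key, concordances[key]


# Illustrative properties; E92000001 is the GSS code for England.
england = {
    "wof:concordances": {"gbr-ons:gss_code": "E92000001"},
    "wof:concordances_official": "gbr-ons:gss_code",
}
print(official_concordance(england))
# ('gbr-ons:gss_code', 'E92000001')
```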
