whosonfirst / whosonfirst-properties

What things mean in Who's On First documents

JSON schema validation and WOF document property normalisation #66

Closed vicchi closed 6 years ago

vicchi commented 6 years ago

See whosonfirst/whosonfirst-json-schema#1 for details of this PR in context with other changes/pull requests.

nvkelso commented 6 years ago

JSON schema support; allow wof:country and iso:country to have a value of “-1”

Can you point out where this logic is enforced? We allow the user-assigned X* range, too.

On May 1, 2018, at 06:52, Gary Gale notifications@github.com wrote:


Commit Summary

  • Add missing data types/edit incorrect data types, based on comparison against the whosonfirst-data repo
  • Add a default data type (string) to all properties which are missing one and which have no exemplars in the whosonfirst-data repo
  • JSON schema support; enhance property definitions for list data types
  • JSON schema support; enhance property definitions for dictionary data types
  • JSON schema support; add property name and value validation regex patterns
  • JSON schema support; expand wof:concordances to be an enumerated dict
  • JSON schema support; update README
  • JSON schema support; src:geom_alt should be list of strings
  • JSON schema support; wk:area, wk:population, wk:wordcount should be float
  • JSON schema support; allow selected wof: and ne: properties to have "null" as a valid string value
  • JSON schema support; allow iso:country to be an empty string
  • JSON schema support; allow wof:country to be an empty string, an ISO 3166 country code or -99
  • JSON schema support; ensure wof:scale and wof:megacity are integer values
  • JSON schema support; ensure wk:elevation and ne:ADM0CAP are float values
  • JSON schema support; handle instances of wof:placetype_alt and wof:category which have multiple string values
  • JSON schema support; handle instances of wof:created which have a null value
  • JSON schema support; handle instances of meso:admin1r_en, meso:oktmo_code, meso:hasc_id, meso:admin_1r, meso:diss_me, meso:gadm_cyril, meso:gadm_alt, meso:name_lat, meso:okato_name, meso:okato_code and meso:oktmo_name which have a null value
  • JSON schema support; ensure meso:local_id is string value
  • JSON schema support; handle instances of begov:numac that have a null value
  • JSON schema support; ensure begov:oppervl is a float value
  • JSON schema support; allow sci-not values in geom:bbox (for now)
  • JSON schema support; allow wof:country and iso:country to have a value of “-1”
  • JSON schema support; handle instances of gn:gn_country that have null values
  • JSON schema support; handle multiple instances of misc: properties that have null values
  • JSON schema support; allow misc:woe_ver to be a string
  • JSON schema support; ensure oa:elevation_ft is an integer value
  • JSON schema support; mz:hours should be a list of strings
  • JSON schema support; ensure gp:adm0 is an integer value
  • JSON schema support; allow multiple ne: properties to have a null string value
  • JSON schema support; allow multiple ne: properties to have integer values
  • JSON schema support; allow multiple ne: properties to have float values
  • JSON schema support; allow ne:SOV_A3 and ne:ISO_A2 to have values of “-99” and “-1”
  • JSON schema support; handle cases where ne:SOV_A3 contains a 3 character alphanumeric code
  • JSON schema support; coerce oa:elevation_ft to a string
  • JSON schema support; allow multiple properties to have a null string value
  • JSON schema support; coerce qs:photos_all, qs:photos_9k, qs:photos_1k and qs:photos_sr to string
  • JSON schema support; coerce multiple properties to integer
  • JSON schema support; allow sg:classifiers to be a dict

File Changes

M README.md (49) M properties/SIJ/admin_1r.json (10) M properties/SIJ/hasc_id.json (10) M properties/abbreviation/eng_x_preferred.json (10) M properties/abrv/colloquial.json (10) M properties/abrv/eng_x_preferred.json (13) M properties/abrv/historical.json (10) M properties/abrv/preferred.json (10) M properties/abrv/unknown.json (11) M properties/abrv/variant.json (10) M properties/acgov/api.json (4) M properties/acgov/type.json (4) M properties/acme/elev.json (4) M properties/acme/site_id.json (4) M properties/addr/email.json (4) M properties/addr/intersection.json (4) M properties/addr/notes.json (4) M properties/addr/opentable.json (4) M properties/addr/phon.json (4) M properties/addr/postal.json (4) M properties/addr/postcode.json (4) M properties/addr/url.json (4) M properties/addr/yelp.json (4) M properties/amsgis/DOCDATUM.json (10) M properties/amsgis/DOCNR.json (10) M properties/amsgis/INGSDATUM.json (10) M properties/amsgis/VOLLCODE.json (10) M properties/amsgis/_categories.json (10) M properties/amsgis/_id.json (10) M properties/amsgis/categorie.json (10) M properties/amsgis/code.json (10) M properties/amsgis/display.json (8) M properties/amsgis/externe_id.json (10)
M properties/amsgis/naam.json (10) M properties/amsgis/online_tijdsaspect.json (10) M properties/amsgis/titel.json (10) M properties/amsgis/titel_key.json (10) M properties/amsgis/type.json (10) M properties/amsgis/type_2.json (10) M properties/amsgis/uri.json (10) M properties/atgov/bez_name.json (4) M properties/atgov/bez_nr.json (4) M properties/atgov/gem_name.json (4) M properties/atgov/gkz.json (4) M properties/atldpcd/FID_TXT.json (10) M properties/atldpcd/GLOBALID.json (10) M properties/atldpcd/NAME.json (10) M properties/atldpcd/NPU.json (10) M properties/atldpcd/OBJECTID.json (10) M properties/atldpcd/OLD_NAME.json (10) M properties/ausstat/POA_CODE.json (4) M properties/ausstat/POA_NAME.json (4) M properties/ausstat/SQKM.json (4) M properties/austriaod/bez_name.json (10) M properties/austriaod/bez_nr.json (10) M properties/austriaod/gem_name.json (10) M properties/austriaod/gem_nr.json (10) M properties/austriaod/land_name.json (10) M properties/austriaod/land_nr.json (10) M properties/austriaod/objectid.json (10)
M properties/azavea/LISTNAME.json (10) M properties/azavea/MAPNAME.json (10) M properties/azavea/NAME.json (10) M properties/baltomoit/label.json (10) M properties/baltomoit/nbrdesc.json (10) M properties/begov/AREA.json (4) M properties/begov/BEGIN_LIFE.json (4) M properties/begov/END_LIFE.json (4) M properties/begov/ID.json (4) M properties/begov/INSPIRE_ID.json (4) M properties/begov/MU_ID.json (4) M properties/begov/MU_NAME_DU.json (4) M properties/begov/MU_NAME_FR.json (4) M properties/begov/MU_NAT_COD.json (4) M properties/begov/NAT_CODE.json (4) M properties/begov/PZ_ID.json (4) M properties/begov/PZ_NAME_DU.json (4) M properties/begov/PZ_NAME_FR.json (4) M properties/begov/PZ_NAT_COD.json (4) M properties/begov/VERSIONID.json (4) M properties/begov/datpublbs.json (10) M properties/begov/lengte.json (10) M properties/begov/naam.json (10) M properties/begov/niscode.json (10) M properties/begov/numac.json (13) M properties/begov/oidn.json (10) M properties/begov/oppervl.json (10) M properties/begov/terrid.json (10) M properties/begov/uidn.json (10) M properties/bowie/latitude.json (10) M properties/bowie/longitude.json (10) M properties/bra/Name.json (10) M properties/bra/Neighborho.json (10) M properties/bra/OBJECTID.json (10) M properties/camgov/NAME.json (10) M properties/camgov/N_HOOD.json (10) M properties/camgov/Webpage.json (10)
M properties/can-abog/CITY_ID.json (10)
M properties/can-abog/GEOCODE.json (10) M properties/can-abog/GEONAME.json (10) M properties/can-abog/HAMLET_ID.json (10) M properties/can-abog/PID.json (10) M properties/can-bbygov/NEIGHBOURH.json (10) M properties/can-bbygov/OBJECTID_1.json (10) M properties/can-bbygov/PSA.json (10) M properties/can-calcai/class.json (10) M properties/can-calcai/class_code.json (10) M properties/can-calcai/comm_code.json (10) M properties/can-calcai/comm_structure.json (10) M properties/can-calcai/name.json (10) M properties/can-calcai/sector.json (10) M properties/can-calcai/srg.json (10) M properties/can-dnvgov/GLOBALID.json (10) M properties/can-dnvgov/MET_INPUT.json (10) M properties/can-dnvgov/MET_TECH.json (10) M properties/can-dnvgov/MET_TECH_R.json (10) M properties/can-dnvgov/NBDY_NAME.json (10) M properties/can-dnvgov/NBDYNAME.json (10) M properties/can-dnvgov/OBJECTID.json (10) M properties/can-dnvgov/STATS_ID.json (10) M properties/can-dnvgov/YEAR.json (10)
M properties/can-edmdsd/name.json (10) M properties/can-edmdsd/number.json (10) M properties/can-gatsudd/LSECSTATID.json (10) M properties/can-gatsudd/MUN_MRC.json (10) M properties/can-gatsudd/NOM_COMM.json (10) M properties/can-gatsudd/NOM_HISTOR.json (10) M properties/can-gatsudd/NO_COMM.json (10) M properties/can-gatsudd/NO_COMM_1.json (10) M properties/can-gatsudd/POP_2006.json (10) M properties/can-gatsudd/SECTEUR.json (10) M properties/can-gatsudd/SECTREG_2.json (10) M properties/can-mntsmvt/no_arr.json (10) M properties/can-mntsmvt/no_qr.json (10) M properties/can-mntsmvt/nom_arr.json (10) M properties/can-mntsmvt/nom_mun.json (10) M properties/can-mntsmvt/nom_qr.json (10) M properties/can-nwds/NEIGHNUM.json (10) M properties/can-nwds/NEIGH_NAME.json (10) M properties/can-ons/Name.json (10) M properties/can-ons/Name2016.json (8) M properties/can-ons/Name2016_F.json (10) M properties/can-ons/Name2017.json (10) M properties/can-ons/ONSID.json (10) M properties/can-wpgppd/id.json (10) M properties/can-wpgppd/name.json (10) M properties/canvec-hydro/definit.json (10) M properties/canvec-hydro/definit_en.json (10) M properties/cbsnl/BU_CODE.json (10) M properties/cbsnl/BU_NAAM.json (10) M properties/cbsnl/GM_CODE.json (10) M properties/cbsnl/GM_NAAM.json (10) M properties/cbsnl/IND_WBI.json (10) M properties/cbsnl/OAD.json (10) M properties/cbsnl/STED.json (10) M properties/cbsnl/WATER.json (10) M properties/cbsnl/WK_CODE.json (10) M properties/cbsnl/WK_NAAM.json (10) M properties/chgov/os_uuid.json (4) M properties/chgov/uuid.json (4) M properties/clustr/alpha.json (4) M properties/clustr/area.json (4) M properties/clustr/count.json (4) M properties/clustr/density.json (4) M properties/clustr/perimeter.json (4) M properties/counts/concordances_total.json (4) M properties/counts/languages_official.json (4) M properties/counts/languages_spoken.json (4) M properties/counts/languages_total.json (4) M properties/counts/names_colloquial.json (4) M 
properties/counts/names_languages.json (4) M properties/counts/names_prefered.json (4) M properties/counts/names_total.json (4) M properties/counts/names_variant.json (4) M properties/denvercpd/NBHD_ID.json (10) M properties/denvercpd/NBHD_NAME.json (10) M properties/ebc/bdyset_id.json (4) M properties/ebc/ed_abbrev.json (4) M properties/ebc/ed_id.json (4) M properties/ebc/ed_name.json (4) M properties/ebc/feat_area.json (4) M properties/ebc/feat_perim.json (4) M properties/ebc/gazette_dt.json (4) M properties/ebc/objectid.json (4) M properties/edtf/deprecate.json (10) M properties/esp-aytomad/CODBAR.json (10) M properties/esp-aytomad/CODBARRIO.json (10) M properties/esp-aytomad/CODDISTRIT.json (10) M properties/esp-aytomad/NOMBRE.json (10) M properties/esp-aytomad/NOMDIS.json (10) M properties/esp-aytomad/OBJECTID.json (10) M properties/esp-cartobcn/C_Barri.json (10) M properties/esp-cartobcn/C_Distri.json (10) M properties/esp-cartobcn/N_Barri.json (10) M properties/esp-cartobcn/N_Distri.json (10) M properties/esp-cartobcn/WEB_1.json (10) M properties/esp-cartobcn/WEB_4.json (10) M properties/figov/Aluejako.json (10) M properties/figov/Kunta.json (10) M properties/figov/Tunnus.json (10) M properties/figov/ajo_pvm.json (10) M properties/figov/eng_type.json (10) M properties/figov/fin_type.json (10) M properties/figov/gml_id.json (10) M properties/figov/local_id.json (10) M properties/figov/national_code.json (10) M properties/figov/nimi.json (10) M properties/figov/nimi_se.json (10) M properties/figov/swe_type.json (10) M properties/fra-odp/c_ar.json (10) M properties/fra-odp/c_qu.json (10) M properties/fra-odp/c_quinsee.json (10) M properties/fra-odp/l_qu.json (10) M properties/frgov/DEP.json (4) M properties/frgov/ID.json (4) M properties/frgov/LIB.json (4) M properties/frgov/POP2010.json (4) M properties/frgov/SURF.json (4) M properties/frgov/_COL6.json (4) M properties/fsgov/Aluejako.json (10) M properties/fsgov/Kunta.json (10) M properties/fsgov/Tunnus.json 
(10) M properties/fsgov/ajo_pvm.json (10) M properties/fsgov/nimi.json (10) M properties/fsgov/nimi_se.json (10) M properties/gbr-datalondon/GSS_CODE.json (10) M properties/gbr-datalondon/ONS_INNER.json (10) M properties/geom/bbox.json (7) M properties/geom/hash.json (4) M properties/geom/src.json (10) M properties/geom/type.json (4) M properties/geonames/id.json (10) M properties/gn/accuracy.json (4) M properties/gn/adm1_code.json (4) M properties/gn/adm1_name.json (4) M properties/gn/adm2_code.json (4) M properties/gn/adm2_name.json (4) M properties/gn/adm3_code.json (4) M properties/gn/adm3_name.json (4) M properties/gn/gn_country.json (13) M properties/gn/gn_fcode.json (10) M properties/gn/gn_pop.json (10) M properties/gn/latitude.json (4) M properties/gn/longitude.json (4) M properties/gn/name.json (4) M properties/gn/pop.json (10) M properties/gn/population.json (4) M properties/goem/longitude.json (10) M properties/gp/adm0.json (4) M properties/gp/id.json (4) M properties/gp/parent_id.json (10) M properties/hkigis/ALUETASO.json (10) M properties/hkigis/ID.json (10) M properties/hkigis/ID1.json (10) M properties/hkigis/KOKOTUN.json (10) M properties/hkigis/KOKOTUNNUS.json (10) M properties/hkigis/KUNTA.json (10) M properties/hkigis/KUNTA_NIMI.json (10) M properties/hkigis/K_NIMI_SE.json (10) M properties/hkigis/Mtryhm.json (10) M properties/hkigis/NIMI.json (10) M properties/hkigis/NIMI_ISO.json (10) M properties/hkigis/NIMI_SE.json (10) M properties/hkigis/PIEN.json (10) M properties/hkigis/SUUR.json (10) M properties/hkigis/SUURP_TN.json (10) M properties/hkigis/SUUR_N_FI.json (10)
M properties/hkigis/SUUR_N_SE.json (10) M properties/hkigis/TILA.json (10) M properties/hkigis/TUNNUS.json (10) M properties/hkigis/ajo_pvm.json (10) M properties/hkigis/aluejako.json (10) M properties/iso/country.json (7) M properties/itu/country_code.json (13) M properties/itu/region.json (10) M properties/kuogov/ID.json (10) M properties/kuogov/NIMI.json (10) M properties/lacity/CERTIFIED.json (10) M properties/lacity/DWEBSITE.json (10) M properties/lacity/NAME.json (10) M properties/lacity/NC_ID.json (10) M properties/lacity/NSA.json (10) M properties/lacity/OBJECTID.json (10) M properties/lacity/WADDRESS.json (10)
M properties/lflt/label_text.json (4) M properties/meso/admin1r_en.json (7) M properties/meso/admin_1r.json (7) M properties/meso/diss_me.json (7) M properties/meso/gadm_alt.json (7) M properties/meso/gadm_cyril.json (7) M properties/meso/hasc_id.json (7) M properties/meso/local_id.json (2) M properties/meso/mps_x.json (4) M properties/meso/mps_y.json (4) M properties/meso/name_lat.json (7) M properties/meso/objectid.json (4) M properties/meso/okato_code.json (7) M properties/meso/okato_name.json (7) M properties/meso/oktmo_code.json (7) M properties/meso/oktmo_name.json (7)
Patch Links:

https://github.com/whosonfirst/whosonfirst-properties/pull/66.patch
https://github.com/whosonfirst/whosonfirst-properties/pull/66.diff

vicchi commented 6 years ago

@nvkelso This logic is for JSON schema validation, when the properties JSON files are aggregated into properties.json.

If you look at patterns.value you'll see there's a regex, in JSON schema syntax, used to validate the contents of wof:country and iso:country. See https://github.com/whosonfirst/whosonfirst-properties/blob/gg-json-schema-support/properties/iso/country.json and https://github.com/whosonfirst/whosonfirst-properties/blob/gg-json-schema-support/properties/wof/country.json.

In this context, "enforced" means enforced via JSON schema validation, if that makes sense.
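
To make the pattern check described above concrete, here is a minimal Python sketch. The regex is an assumption assembled from the values the commit summary says are allowed (an empty string, a two-letter code, "-1" and "-99"); it is not copied from properties/wof/country.json, and is_valid_country is a hypothetical helper name.

```python
import re

# Hypothetical pattern, NOT the one in properties/wof/country.json.
# Accepts: empty string, "-1", "-99", or any two capital letters.
COUNTRY_PATTERN = r"^(|-1|-99|[A-Z]{2})$"


def is_valid_country(value):
    """Return True if value is a string that matches COUNTRY_PATTERN."""
    # JSON Schema "pattern" applies only to strings and is not implicitly
    # anchored, so re.search with explicit ^...$ approximates its behaviour.
    if not isinstance(value, str):
        return False
    return re.search(COUNTRY_PATTERN, value) is not None


for candidate in ["US", "XK", "", "-1", "-99", "USA"]:
    print(repr(candidate), is_valid_country(candidate))
```

Because `[A-Z]{2}` accepts any two capital letters rather than an enumerated list of ISO codes, values in the user-assigned X* range pass as well.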

nvkelso commented 6 years ago

wof:country pattern matching looks good (any string with any 2 chars, versus a list of ISO codes, since we’re a superset of that).

(Email reply to Gary Gale's comment of May 1, 2018, 10:56.)

thisisaaronland commented 6 years ago

@nvkelso Does QGIS export all the ne: properties as floats? We can trap/fix that in the py-export code, if necessary.

nvkelso commented 6 years ago

QGIS just uses DBF files, which have ints and floats (with precision). But they do seem to have been cast somewhere along the way. Kinda shrug?

(Email reply to Aaron Straup Cope's comment of May 1, 2018, 16:43.)
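
If the casting does need to be trapped at export time, one possible shape for the fix is sketched below in Python. This is not the py-export code: coerce_integral_floats is a hypothetical helper, and the choice of which keys count as integers (here wof:scale and wof:megacity, which the commit summary says should be integers) is an assumption made for illustration.

```python
def coerce_integral_floats(properties, integer_keys):
    """Return a copy of properties in which whole-number floats stored
    under integer_keys are cast to int; all other values are untouched."""
    fixed = dict(properties)
    for key in integer_keys:
        value = fixed.get(key)
        # Only real floats with no fractional part are converted, so 1.5,
        # NaN, infinities, strings and None are all left alone.
        if isinstance(value, float) and value.is_integer():
            fixed[key] = int(value)
    return fixed


# Illustrative values only, not taken from a real WOF record.
props = {"wof:megacity": 1.0, "wof:scale": 2.0, "wk:elevation": 12.5}
print(coerce_integral_floats(props, ["wof:megacity", "wof:scale"]))
```

Values that are not whole numbers are deliberately left as floats so that a later schema validation step can still flag them instead of silently truncating.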

nvkelso commented 6 years ago

Aaron, does this end up just in Postgres? But then we need to read that back, so it seems useful to have the schema again. I’d just note this for follow-up in an issue.

On May 4, 2018, at 08:24, Gary Gale notifications@github.com wrote:

@vicchi commented on this pull request.

In properties/wof/placetype_id.json:

    @@ -3,5 +3,5 @@
       "name": "placetype_id",
       "prefix": "wof",
       "description": "",
    -  "type": ""
    -}
    +  "type": "string"

@thisisaaronland @nvkelso @stepps00 Wait ... so it's not clear what the approach is. Do we ...

  • Deprecate wof:placetype_id and wof:superceded? If so, do we add a new property to the JSON files? Or git rm them and leave the files in the git history?
  • Leave them as is and work out how to deprecate a property via the cunning use of an issue so we don't forget about this?

nvkelso commented 6 years ago

What more is left here, and do we need a related issue to clean any records that don’t conform to the schema (like the NE props)?

vicchi commented 6 years ago

There should be nothing left to clean - it’s all been done in my fixup script and that covers all property namespaces, including “ne:” - at least as far as JSON schema conformance is concerned. All the WOF docs in the whosonfirst-data repo validate cleanly via my validation script in the whosonfirst-json-schema repo and the property definitions in this repo are now used to automatically build the JSON schema.

So everything should be aligned and in agreement. I’m pretty sure this means we can merge and close this PR now?

One final thought, which probably merits an issue, is that we should have some form of CI that checks for properties which don’t have a JSON property definition and then regenerates the JSON schema from the property definitions and validates the whosonfirst-data repo. Thoughts?
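
As a rough illustration of the first step of that CI idea, the Python sketch below finds properties in a WOF document that have no definition file. It assumes each definition is a JSON file one directory below the properties root and carries "prefix" and "name" fields, as the file list and the review diff in this thread suggest. The function names are hypothetical, and regenerating the schema or validating whosonfirst-data is out of scope here.

```python
import json
from pathlib import Path


def defined_properties(properties_dir):
    """Collect 'prefix:name' keys from <properties_dir>/<prefix>/<name>.json."""
    defined = set()
    for path in Path(properties_dir).glob("*/*.json"):
        definition = json.loads(path.read_text(encoding="utf-8"))
        # Assumes every definition file has "prefix" and "name" fields.
        defined.add(f"{definition['prefix']}:{definition['name']}")
    return defined


def undefined_properties(document_properties, defined):
    """Return the sorted property keys in a document that lack a definition."""
    return sorted(key for key in document_properties if key not in defined)
```

A CI job could fail the build whenever undefined_properties returns a non-empty list for any document it checks.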

nvkelso commented 6 years ago

Woot, thank you! I’ll let Aaron do the honors :)

(Email reply to Gary Gale's comment of May 9, 2018, 08:32.)

vicchi commented 6 years ago

@nvkelso And raise another issue for the CI idea? Unless this is already being done somewhere and I can piggy-back on that?

nvkelso commented 6 years ago

CircleCI times out after around 20 minutes on large repo checkouts, so it's likely not feasible (from my experience with a personal repo on election data). I suggest investigating git commit hooks instead.

(Email reply to Gary Gale's comment of May 9, 2018, 09:01.)
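
For the commit hook route, a minimal Python sketch might look like the following. It only checks that staged .json files parse, which is far weaker than full JSON schema validation. The function names are hypothetical, and reading the working-tree copy of each file rather than the staged blob is a simplification.

```python
import json
import subprocess


def staged_json_files():
    """List staged (added, copied or modified) .json paths reported by git."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line.endswith(".json")]


def invalid_json_files(paths):
    """Return the paths whose contents cannot be read or parsed as JSON."""
    bad = []
    for path in paths:
        try:
            # Reads the working-tree file, not the staged blob (simplification).
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
        except (OSError, ValueError):
            # json.JSONDecodeError and UnicodeDecodeError are ValueError subclasses.
            bad.append(path)
    return bad
```

A .git/hooks/pre-commit script could call invalid_json_files(staged_json_files()) and exit with a non-zero status when the list is non-empty, which makes git abort the commit.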

stepps00 commented 6 years ago

Thank you, @vicchi!

vicchi commented 6 years ago

@nvkelso Cool. Thanks.

@thisisaaronland Is there some server resource I can use to try and put this in place?

thisisaaronland commented 6 years ago

Let's talk about CI stuff in a separate thread/issue. It might make sense to build out the scaffolding to spin up a new machine on demand / nightly to vet things since that is work that could be repurposed for generating "distributions" (bundles, repos, etc.) for a given repo.

Or yeah, commit hooks maybe.