Location: Geographic identifiers

PaulBoroday commented 6 years ago

In many cases we lack structured information about a party's address, the place of performance, the jurisdiction and so on. To get a more detailed and, most importantly, structured description of such information, could something like this be considered as a possible extension of the 'Address' object?

{
  "address": {
    "addressDetails": {
      "addressCountry": {
        "scheme": "iso-alpha2",
        "id": "UA",
        "description": "Ukraine",
        "url": ""
      },
      "addressRegion": {
        "scheme": "iso-alpha2",
        "id": "UA-12",
        "description": "Dnipropetrovska oblast",
        "url": ""
      },
      "addressLocality": {
        "scheme": "UN/LOCODE",
        "id": "UA-DNK",
        "description": "Dnipro",
        "url": "https://service.unece.org/trade/locode/ua.htm"
      }
    }
  }
}

jpmckinney commented 6 years ago

Instead of having fields whose values are each an object, would it serve your needs to have one field whose value is a list of objects? To take @timgdavies' example in open-contracting/standard#524:

In terms of the region and locality break-downs given in the example:

(1) From the perspective of what OCDS might recommend that publishers provide in future, I can't imagine us recommending this information is included 'by default' in OCDS - as this should be information that can be extracted by geocoding a dataset;

(2) Where a publisher does want to include this information (e.g. to share geocoding they have already done), then I would suggest that this is explicitly flagged under a specific property (e.g. location) which could be based on the location object from the location extension.

For example:

{
    "countryName": "Ukraine",
    "countryCode": "UA",
    "location": [
        {
            "description": "Dnipropetrovska oblast",
            "gazetteer": {
                "scheme": "iso-alpha2",
                "identifiers": ["UA-12"]
            }
        },
        {
            "description": "Dnipro",
            "gazetteer": {
                "scheme": "UN/LOCODE",
                "identifiers": ["UA-DNK"]
            }
        }
    ]
}

Notes

I'm not sure 'location' is the correct name for this property. Would need to work on defining it more to make sure we have the right term.
The schemes above would need entering into the gazetteer codelist, and codes might change.

In terms of scheme names, I'd prefer using the canonical names of "ISO 3166-1 alpha-2" for "UA" and "ISO 3166-2" for "UA-12". These are separate schemes within the same ISO 3166 standard.

For all schemes, we should also consider how they are versioned. If a scheme never reuses old codes for new geographies, then we can omit any version identifier. If a scheme reuses old codes for new geographies, then we'd need a version identifier. For example, codes for census divisions in Canada can be re-used "if at least two editions of the classification have been published since it was last used. For example, a code deleted in 2001 may be reused in 2016." So, we'd refer to "SGC 2016" and not simply "SGC".
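
As a rough illustration of the versioning rule above, a scheme name could be built like this. This is only a sketch: the function name `scheme_label` and its parameters are hypothetical, and whether a given scheme reuses codes has to be checked against that scheme's own documentation.

```python
def scheme_label(name, reuses_codes, edition=None):
    """Build a scheme name, adding the edition only when codes can be reused.

    Hypothetical helper: 'reuses_codes' must be decided per scheme by whoever
    maintains the codelist; it is not derived automatically.
    """
    if not reuses_codes:
        # Codes are stable over time, so the bare scheme name is unambiguous.
        return name
    if edition is None:
        raise ValueError(f"{name} reuses codes, so an edition is required")
    return f"{name} {edition}"


# SGC reuses codes across editions, so the edition is part of the name.
print(scheme_label("SGC", reuses_codes=True, edition=2016))
```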

PaulBoroday commented 6 years ago

@jpmckinney, not exactly, to be honest. In the suggested approach there is no way to tell what a particular item of the 'location' array describes: is it about a country or a region? Our suggestion solves this issue, extends the existing structure of the 'address' object without changing what is already there, and could be applied to any object that includes an 'address'. Moreover, it would be much easier to implement a JSON-LD approach for multilingual address descriptions using separate objects for each address field instead of one object.

jpmckinney commented 6 years ago

On one hand, for most schemes, it's possible to determine which location component an item is about: e.g. ISO 3166-1 alpha-2 is only for countries, ISO 3166-2 is only for principal subdivisions (typically the same as 'regions'), UN/LOCODE seems to be only about localities, NUTS applies to different components but you can tell which component a code is about based on its length/format, etc.

On the other hand, that's a lot of logic to implement. If segmenting codes by location component is a priority use case, then your modelling is more appropriate.
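
To give a sense of the logic a consumer would need, here is a minimal sketch of inferring the component from the scheme and code. The mapping only restates the examples in this comment and is not an OCDS codelist; the names `SCHEME_COMPONENT` and `infer_component` are hypothetical. For NUTS it assumes the usual format of a two-letter country prefix followed by zero to three characters, where the number of extra characters is the NUTS level.

```python
import re

# Assumption: this mapping restates the examples in the comment above.
SCHEME_COMPONENT = {
    "ISO 3166-1 alpha-2": "country",
    "ISO 3166-2": "region",
    "UN/LOCODE": "locality",
}

# Two-letter country prefix plus 0 to 3 further letters or digits.
NUTS_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{0,3}$")


def infer_component(scheme, identifier):
    """Return a best-guess component name, or None if it cannot be inferred."""
    if scheme in SCHEME_COMPONENT:
        return SCHEME_COMPONENT[scheme]
    if scheme == "NUTS" and NUTS_PATTERN.match(identifier):
        level = len(identifier) - 2
        return "country" if level == 0 else f"NUTS level {level}"
    # Unknown scheme: nothing can be inferred without extra knowledge.
    return None


print(infer_component("ISO 3166-2", "UA-12"))
print(infer_component("NUTS", "UKC11"))
```

Every new scheme needs its own rule, and an unrecognized scheme yields nothing, which is the cost being weighed here.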

PaulBoroday commented 6 years ago

From a consumer's (NEPP, BI, etc.) point of view, it is too hard to work out the meaning of data based on assumptions. Especially keeping in mind that different countries may use different specific schemes, and you never know what, for example, some "IDENTIFICATIONSCHEMAOFSOMETHING" describes: a country, region, city, district or maybe something else.

akuckartz commented 5 years ago

Maybe the LOCN vocabulary can be used: https://www.w3.org/ns/locn

ColinMaudry commented 5 years ago

What's the status of this bright suggestion? Was an extension made?

Thanks!

jpmckinney commented 5 years ago

There is no extension yet.

The proposal is to offer a more structured version of the Address object. The current object has string fields for each address component. The proposal would have object fields for each address component, such that a scheme, ID, title and URL can be provided for each.

Some unanswered questions/issues:

(1) What are the use cases for this more structured version, that aren't served by the current Address object?
(2) What if a publisher wants to provide the information using multiple schemes, for the same address component?
(3) What if a publisher wants to provide location information that doesn't match an address component? (we currently only support 3 administrative levels: locality, region, countryName)
(4) Regarding my earlier proposal, can we just add a component field to identify the address component that the object describes, to address @PaulBoroday's concern?

PaulBoroday commented 5 years ago

@jpmckinney

The primary benefit and use case is a machine-readable way to provide geographical data. In practice, this information ranks second in sensitivity, right after CPV. Even for simple cross-border analytics, it would be much easier to analyze "UA" from Prozorro and "UA" from MTender than "Україна" and "Ucraina" from the same sources.
I'm not sure it makes sense to classify the same geo-attribute with several different classifiers. On the other hand, we could still use something similar to 'additionalClassifications' here:

{
  "address": {
    "addressDetails": {
      "countryDetails": {
        "classification": {
          "scheme": "iso-alpha2",
          "id": "DE",
          "description": "Germany"
        },
        "additionalClassifications":[
          {
            "scheme":"NUTS",
            "id":"DE",
            "description":"Germany"
          }
        ]
      }
    }
  }
}

Regarding a deeper description of the address (like 'district'), that's a separate story. From my point of view it would be nice to have, but again, let's discuss it separately :)
I didn't get the idea with 'component'. Can you please share an example?

Thanks!
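
The cross-border point in this comment can be illustrated with a small sketch. The two records are hypothetical and only mirror the "Україна" / "Ucraina" example; they are not real Prozorro or MTender data.

```python
from collections import Counter

# Hypothetical records: the same country, named in each source's language.
records = [
    {"source": "Prozorro", "countryName": "Україна", "countryCode": "UA"},
    {"source": "MTender", "countryName": "Ucraina", "countryCode": "UA"},
]

# Grouping by free-text name splits one country into two buckets.
by_name = Counter(record["countryName"] for record in records)

# Grouping by code keeps both records together.
by_code = Counter(record["countryCode"] for record in records)

print(len(by_name), "groups by name;", len(by_code), "group by code")
```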

jpmckinney commented 5 years ago

i didnt get an idea with 'component'. Can you please share some example?

Each of these is a component of an address: street address, locality, region, country, postal code, etc. So, going back to my earlier proposal:

{
  "countryName": "Ukraine",
  "countryCode":"UA",
  "location": [
    {
      "component": "region",
      "description":"Dnipropetrovska oblast",
      "gazetteer":{
        "scheme":"iso-alpha2",
        "identifiers":["UA-12"]
      }
    },
    {
      "component": "locality",
      "description":"Dnipro",
      "gazetteer":{
        "scheme":"UN/LOCODE",
        "identifiers":["UA-DNK"]
      }
    }
  ]
}

This re-uses existing structures from the location extension, and is more flexible: it's possible to specify additional address components and additional schemes without making a very deep and complex structure.
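
As a rough sketch of how a consumer might read this structure, the helper below collects the identifiers for one component. The function name `identifiers_for` is hypothetical, and the sample uses "ISO 3166-2" as the scheme name for "UA-12", following the naming preference stated earlier in this thread rather than the literal value in the example.

```python
# Sample data modelled on the example above (scheme name is an assumption).
address = {
    "countryName": "Ukraine",
    "countryCode": "UA",
    "location": [
        {
            "component": "region",
            "description": "Dnipropetrovska oblast",
            "gazetteer": {"scheme": "ISO 3166-2", "identifiers": ["UA-12"]},
        },
        {
            "component": "locality",
            "description": "Dnipro",
            "gazetteer": {"scheme": "UN/LOCODE", "identifiers": ["UA-DNK"]},
        },
    ],
}


def identifiers_for(address, component):
    """Return (scheme, identifier) pairs for one address component."""
    pairs = []
    for item in address.get("location", []):
        if item.get("component") != component:
            continue
        gazetteer = item.get("gazetteer", {})
        for identifier in gazetteer.get("identifiers", []):
            pairs.append((gazetteer.get("scheme"), identifier))
    return pairs


print(identifiers_for(address, "region"))
```

Because items are matched on `component`, a second scheme for the same component is just another item in the list, with no change to the structure.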

duncandewhurst commented 2 years ago

@jpmckinney is this ready for a PR based on your proposed approach?

jpmckinney commented 2 years ago

This issue has not attracted further demand, so I would close it.

duncandewhurst commented 2 years ago

This issue is relevant to mapping BT-5071 (Place Performance Country Subdivision) and BT-507 (Organization Country Subdivision) from eForms, which are represented using NUTS codes.

The approach in the EU profile was to map the NUTS code to Address.region, e.g. F03 II.2.3. However, in that approach, the context that the value is a NUTS code is lost.

jpmckinney commented 2 years ago

I don't have a problem deferring the identification of the scheme to a publication policy. We don't have evidence of any cases where a publisher mixes multiple geographical schemes. If I look up, e.g. "UKC11" in Google, the first page of results is mostly about the NUTS location.

open-contracting / ocds-extensions

Location: Geographic identifiers #142