open-contracting / ocds-extensions

Collects issues for published extensions in one place
Location: Geographic identifiers #142

Closed PaulBoroday closed 2 years ago

PaulBoroday commented 6 years ago

In many cases we face a lack of structured information about a party's address, the place of performance, the jurisdiction and so on. To get a more detailed and (most importantly) structured description of this information, could something like this be considered as a possible extension of the 'Address' object?

{
  "address": {
    "addressDetails": {
      "addressCountry": {
        "scheme": "iso-alpha2",
        "id": "UA",
        "description": "Ukraine",
        "url": ""
      },
      "addressRegion": {
        "scheme": "iso-alpha2",
        "id": "UA-12",
        "description": "Dnipropetrovska oblast",
        "url": ""
      },
      "addressLocality": {
        "scheme": "UN/LOCODE",
        "id": "UA-DNK",
        "description": "Dnipro",
        "url": "https://service.unece.org/trade/locode/ua.htm"
      }
    }
  }
}
jpmckinney commented 6 years ago

Instead of having fields whose values are each an object, would it serve your needs to have one field whose value is a list of objects? To take @timgdavies' example in open-contracting/standard#524:


In terms of the region and locality break-downs given in the example:

(1) From the perspective of what OCDS might recommend that publishers provide in future, I can't imagine us recommending this information is included 'by default' in OCDS - as this should be information that can be extracted by geocoding a dataset;

(2) Where a publisher does want to include this information (e.g. to share geocoding they have already done), then I would suggest that this is explicitly flagged under a specific property (e.g. location) which could be based on the location object from the location extension.

For example:

{
    "countryName": "Ukraine",
    "countryCode": "UA",
    "location": [
        {
            "description": "Dnipropetrovska oblast",
            "gazetteer": {
                "scheme": "iso-alpha2",
                "identifiers": ["UA-12"]
            }
        },
        {
            "description": "Dnipro",
            "gazetteer": {
                "scheme": "UN/LOCODE",
                "identifiers": ["UA-DNK"]
            }
        }
    ]
}

Notes


In terms of scheme names, I'd prefer using the canonical names of "ISO 3166-1 alpha-2" for "UA" and "ISO 3166-2" for "UA-12". These are separate schemes within the same ISO 3166 standard.

For all schemes, we should also consider how they are versioned. If a scheme never reuses old codes for new geographies, then we can omit any version identifier. If a scheme reuses old codes for new geographies, then we'd need a version identifier. For example, census division codes in Canada can be re-used "if at least two editions of the classification have been published since it was last used. For example, a code deleted in 2001 may be reused in 2016." So, we'd refer to "SGC 2016" and not simply "SGC".

PaulBoroday commented 6 years ago

@jpmckinney, not exactly, to be honest. In the suggested approach there is no way to tell what a particular item of the 'location' array describes: is it a country or a region? Our suggestion solves this issue, extends the existing structure of the 'address' object without changing it, and could be applied to any object that may include an 'address'. Moreover, it would be much easier to implement a JSON-LD approach for multilingual address descriptions using separate objects for each address field instead of one object.

jpmckinney commented 6 years ago

On one hand, for most schemes, it's possible to determine which location component an item is about: e.g. ISO 3166-1 alpha-2 is only for countries, ISO 3166-2 is only for principal subdivisions (typically the same as 'regions'), UN/LOCODE seems to be only about localities, NUTS applies to different components but you can tell which component a code is about based on its length/format, etc.

On the other hand, that's a lot of logic to implement. If segmenting codes by location component is a priority use case, then your modelling is more appropriate.
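A minimal sketch of the kind of logic a consumer would need, assuming the canonical scheme names suggested above. The function name `infer_component`, the lookup table and the returned labels are hypothetical, not part of OCDS or any extension. For NUTS, the sketch reports only the level derived from the code length, since the levels do not map cleanly onto "region" or "locality".

```python
import re

# Schemes that, per the comment above, each cover a single address component.
SINGLE_COMPONENT_SCHEMES = {
    "ISO 3166-1 alpha-2": "country",
    "ISO 3166-2": "region",
    "UN/LOCODE": "locality",
}

# A NUTS code is a two-letter country code plus zero to three further
# characters; the number of extra characters gives the NUTS level (0 to 3).
NUTS_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{0,3}$")


def infer_component(scheme, identifier):
    """Guess which address component a gazetteer identifier describes.

    Returns None when the scheme is not recognized.
    """
    if scheme in SINGLE_COMPONENT_SCHEMES:
        return SINGLE_COMPONENT_SCHEMES[scheme]
    if scheme == "NUTS" and NUTS_PATTERN.match(identifier):
        level = len(identifier) - 2
        return "country" if level == 0 else f"NUTS level {level}"
    return None
```

Every additional scheme needs its own entry or rule, and an unrecognized scheme yields `None`, which is the cost referred to above.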

PaulBoroday commented 6 years ago

From a consumer’s (NEPP, BI, etc.) point of view, it is too hard to work out the meaning of data based on assumptions. Especially keeping in mind that different countries may use different specific schemes, and you never know what, for example, some “IDENTIFICATIONSCHEMAOFSOMETHING” describes: a country, region, city, district or maybe something else.

akuckartz commented 5 years ago

Maybe the LOCN vocabulary can be used: https://www.w3.org/ns/locn

ColinMaudry commented 5 years ago

What's the status of this bright suggestion? Was an extension made?

Thanks!

jpmckinney commented 5 years ago

There is no extension yet.

The proposal is to offer a more structured version of the Address object. The current object has string fields for each address component. The proposal would have object fields for each address component, such that a scheme, ID, title and URL can be provided for each.

Some unanswered questions/issues:

PaulBoroday commented 5 years ago

@jpmckinney

{
  "address": {
    "addressDetails": {
      "countryDetails": {
        "classification": {
          "scheme": "iso-alpha2",
          "id": "DE",
          "description": "Germany"
        },
        "additionalClassifications":[
          {
            "scheme":"NUTS",
            "id":"DE",
            "description":"Germany"
          }
        ]
      }
    }
  }
}

Thanks!

jpmckinney commented 5 years ago

> I didn't get the idea with 'component'. Can you please share an example?

Each of these is a component of an address: street address, locality, region, country, postal code, etc. So, going back to my earlier proposal:

{
  "countryName": "Ukraine",
  "countryCode": "UA",
  "location": [
    {
      "component": "region",
      "description": "Dnipropetrovska oblast",
      "gazetteer": {
        "scheme": "iso-alpha2",
        "identifiers": ["UA-12"]
      }
    },
    {
      "component": "locality",
      "description": "Dnipro",
      "gazetteer": {
        "scheme": "UN/LOCODE",
        "identifiers": ["UA-DNK"]
      }
    }
  ]
}

This re-uses existing structures from the location extension, and is more flexible: it's possible to specify additional address components and additional schemes without making a very deep and complex structure.
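To show how the two models relate, here is a rough sketch that flattens the object-per-component structure from the first proposal into the list form above. The function name and the field-to-component mapping are assumptions made for illustration; neither structure is a published extension.

```python
def address_details_to_locations(address_details):
    """Flatten {"addressRegion": {...}, ...} into a list of location objects."""
    # Hypothetical mapping from the field names in the first proposal
    # to values of the "component" field.
    components = {
        "addressRegion": "region",
        "addressLocality": "locality",
    }
    locations = []
    for field, component in components.items():
        details = address_details.get(field)
        if details is None:
            continue
        locations.append({
            "component": component,
            "description": details["description"],
            "gazetteer": {
                "scheme": details["scheme"],
                "identifiers": [details["id"]],
            },
        })
    return locations


example = {
    "addressRegion": {
        "scheme": "ISO 3166-2",
        "id": "UA-12",
        "description": "Dnipropetrovska oblast",
    },
    "addressLocality": {
        "scheme": "UN/LOCODE",
        "id": "UA-DNK",
        "description": "Dnipro",
    },
}
locations = address_details_to_locations(example)
```

Supporting another component in the list form only needs a new `component` value, whereas the nested form needs a new field for each one.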

duncandewhurst commented 2 years ago

@jpmckinney is this ready for a PR based on your proposed approach?

jpmckinney commented 2 years ago

This issue has not attracted further demand, so I would close it.

duncandewhurst commented 2 years ago

This issue is relevant to mapping BT-5071 (Place Performance Country Subdivision) and BT-507 (Organization Country Subdivision) from eForms, which are represented using NUTS codes.

The approach in the EU profile was to map the NUTS code to Address.region, e.g. F03 II.2.3. However, in that approach, the context that the value is a NUTS code is lost.
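As a sketch of the difference, the snippet below builds both representations for one NUTS code. The function name is hypothetical, and the second structure follows the `location` list proposed earlier in this thread rather than any published extension.

```python
def map_country_subdivision(nuts_code):
    """Return two candidate OCDS representations of an eForms NUTS code."""
    # Approach described above: the code goes into the plain string field,
    # so nothing records that it is a NUTS code.
    as_region = {"address": {"region": nuts_code}}
    # Alternative from earlier in the thread: keep the scheme beside the code.
    as_location = {
        "location": [
            {
                "component": "region",
                "gazetteer": {"scheme": "NUTS", "identifiers": [nuts_code]},
            }
        ]
    }
    return as_region, as_location


as_region, as_location = map_country_subdivision("UKC11")
```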

jpmckinney commented 2 years ago

I don't have a problem deferring the identification of the scheme to a publication policy. We don't have evidence of any cases where a publisher mixes multiple geographical schemes. If I look up, e.g. "UKC11" in Google, the first page of results is mostly about the NUTS location.