phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Use open location code instead of current location #12

Closed by julesjacobsen 4 years ago

julesjacobsen commented 6 years ago

Open Location Code (aka Plus Codes, if you want the glossy overview) looks to be a nicely engineered solution to the issue of location.

From their README:

Open Location Code is a technology that gives a way of encoding location into a form that is easier to use than latitude and longitude. The codes generated are called plus codes, as their distinguishing attribute is that they include a "+" character.

The technology is designed to produce codes that can be used as a replacement for street addresses, especially in places where buildings aren't numbered or streets aren't named.

Plus codes represent an area, not a point. As digits are added to a code, the area shrinks, so a long code is more precise than a short code.

Codes that are similar are located closer together than codes that are different.

A location can be converted into a code, and a code can be converted back to a location completely offline.

There are no data tables to lookup or online services required. The algorithm is publicly available and can be used without restriction.

Currently location is defined like this:

// A GeoLocation object provides information about a geographic position
// related to a record. Examples could be:
//  - an address, e.g. of a lab performing an analysis
//  - provenance of an individual, obfuscated to a larger order administrative
//    entity (Suffolk, U.K.)
//  - a lat/long/alt position where an environmental sample was collected
// The geographic point object uses the default units from the DCMI point scheme
// http://dublincore.org/documents/dcmi-point/
// and avoids optional representation in non-standard units.
message GeoLocation {
    // a text representation, preferably using standard geographic identification
    // elements, of the corresponding latitude,longitude(,altitude)
    // This representation serves the purposes to
    //  - capture standard data entry parameters
    //  - provide a sanity check for latitude,longitude values
    // Example:
    //  - 34 Washington Blvd, Marina del Rey, CA  90292, United States
    //  - Str Marasesti 5, 300077 Timisoara, Romania
    //  - Heidelberg, Deutschland
    string label = 1;

    // an optional indication of the maximum precision to be derived from the
    // latitude,longitude values
    // Example:
    // Given a street address "Winterthurerstrasse 190, 8057 Zürich, Switzerland",
    // a privacy driven (destructive) obfuscation approach could recode this
    // to
    //  "latitude": 47.37, "longitude": 8.54
    //  while providing
    //  "precision":"city", "label": "Zürich, Switzerland"
    // ... indicating that the original location could correspond to any
    // latitude,longitude point value inside the administrative boundaries of
    // the city of Zürich, Switzerland
    string precision = 2;

    // signed decimal degrees (North, relative to Equator)
    double latitude = 3;

    // signed decimal degrees (East, relative to IERS Reference Meridian)
    double longitude = 4;

    // optional, e.g. for environmental samples
    double altitude = 5;
}

It could, using the plus code, be represented like this for the Zurich example:

{
  "geoLocation": {
    "plusCode": "8FVC9G00+"
  }
}

Which resolves to this area: https://plus.codes/8FVC9G00+
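
The mapping from the obfuscated coordinates (47.37, 8.54) to that code can be reproduced offline. The following is a minimal Python sketch, not the reference implementation. It assumes the published Open Location Code alphabet and pair structure (five latitude/longitude digit pairs in base 20, a `+` after the eighth character, `0` padding for short codes). The names `encode` and `GRID` are illustrative only, and the finer grid refinement used for codes longer than 10 digits is left out.

```python
import math

# Open Location Code digit alphabet (base 20).
ALPHABET = "23456789CFGHJMPQRVWX"
# A 10-digit code resolves to 1/8000 of a degree, so work in those integer units.
GRID = 8000


def encode(lat, lng, length=10):
    """Encode a latitude/longitude as a plus code of 2, 4, 6, 8 or 10 digits."""
    if length < 2 or length > 10 or length % 2:
        raise ValueError("length must be an even number between 2 and 10")

    # Shift to positive ranges: latitude 0..180, longitude 0..360 (wrapped).
    lat = min(max(lat, -90.0), 90.0)
    lat_val = math.floor(round((lat + 90.0) * GRID, 6))
    lng_val = math.floor(round(((lng + 180.0) % 360.0) * GRID, 6))
    # Keep the upper edges (e.g. latitude exactly 90) inside the last cell.
    lat_val = min(lat_val, 180 * GRID - 1)
    lng_val = min(lng_val, 360 * GRID - 1)

    # Peel off base-20 digits, least significant pair first.
    digits = []
    for _ in range(5):
        digits.append(ALPHABET[lng_val % 20])
        digits.append(ALPHABET[lat_val % 20])
        lat_val //= 20
        lng_val //= 20

    # Reverse so the code reads lat, lng, lat, lng ... most significant first.
    code = "".join(reversed(digits))
    # Truncate to the requested precision and pad up to the separator position.
    code = code[:length].ljust(8, "0")
    return code[:8] + "+" + code[8:]


print(encode(47.37, 8.54, 6))  # → 8FVC9G00+
```

Dropping digits only widens the box, so the same function gives a coarser or finer obfuscation by changing `length`.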

The only snag with this is that if the thing you're trying to fit into a box doesn't fully fit, then what? In this example Zurich straddles approximately four boxes at this scale:

https://plus.codes/8FVCCG00+ https://plus.codes/8FVCCH00+ https://plus.codes/8FVC9G00+ https://plus.codes/8FVC9H00+

but the next box up is too big. Even so, these are computable compared to the existing 'precision' string, which is completely undefined. If the goal is purely approximation and obfuscation, then the plus code will work better than the existing setup.
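
To illustrate what "computable" means here, the sketch below decodes a plus code back into its bounding box and checks which of the four Zurich boxes contains a given point. It is a hedged illustration under the same assumptions about the Open Location Code format (base-20 alphabet, digit pairs, `0` padding), not the library's API. The helper names `decode` and `contains` are hypothetical, and codes longer than 10 digits are not handled.

```python
# Open Location Code digit alphabet (base 20).
ALPHABET = "23456789CFGHJMPQRVWX"
# Cell size of each digit pair, in units of 1/8000 degree:
# 20 deg, 1 deg, 0.05 deg, 0.0025 deg, 0.000125 deg.
PAIR_UNITS = [160000, 8000, 400, 20, 1]


def decode(code):
    """Return (south, west, north, east) in decimal degrees for a plus code."""
    # '0' is padding only; it is not part of the digit alphabet.
    digits = code.upper().replace("+", "").rstrip("0")
    if len(digits) % 2 or not 2 <= len(digits) <= 10:
        raise ValueError("expected 2, 4, 6, 8 or 10 significant digits")

    lat = lng = 0
    for i in range(0, len(digits), 2):
        unit = PAIR_UNITS[i // 2]
        lat += ALPHABET.index(digits[i]) * unit
        lng += ALPHABET.index(digits[i + 1]) * unit

    # The box is as large as the last pair's cell size.
    size = PAIR_UNITS[len(digits) // 2 - 1]
    return (
        lat / 8000 - 90,
        lng / 8000 - 180,
        (lat + size) / 8000 - 90,
        (lng + size) / 8000 - 180,
    )


def contains(code, lat, lng):
    """True if the point lies inside the area described by the code."""
    south, west, north, east = decode(code)
    return south <= lat < north and west <= lng < east


ZURICH_BOXES = ["8FVCCG00+", "8FVCCH00+", "8FVC9G00+", "8FVC9H00+"]

# Which box holds the obfuscated point from the schema comment?
print([c for c in ZURICH_BOXES if contains(c, 47.37, 8.54)])  # → ['8FVC9G00+']
```

A list of such codes would describe the straddling case explicitly, at the cost of the single-value simplicity shown in the JSON example.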

<dependency>
  <groupId>com.google.openlocationcode</groupId>
  <artifactId>openlocationcode</artifactId>
  <version>1.0</version>
  <type>pom</type>
</dependency>

mbaudis commented 5 years ago

There are various options beyond the simple format we'd defined:

Just for discussion, this. I'm happy w/ the simple long/lat/precision/label format, but then GeoJSON etc...

mbaudis commented 5 years ago

Only having now gone a bit deeper through PlusCodes: I think suggesting this here leads to loading features into a specification. Anybody can understand lat,lon, GeoJSON Point etc.; abstracting this further & making this service dependent is IMO wrong here. It may be fine to use this as a part of a Phenopackets toolkit (translating GeoJSON, internal format ...), but I'm doubtful about its use; tooling for lat,lon is everywhere, and these are standard coordinates. PlusCodes doesn't really solve anything (yes, I understand the attempt to provide a compact, "unambiguous" code to circumvent confusion from erroneous use of lat,lon units; but this is not a specification problem). And you have another library / service dependency (Google track record!?). I won't interfere if you have this as a "oneof" option, but still...

julesjacobsen commented 4 years ago

Closing this as the issue is no longer relevant to the spec.