Add structure to OSM tags

hdelva commented 5 years ago

Possibly related to #21.

It seems like there are two kinds of tags; one is structured with a predefined set of possible values, the other seems to contain free-form values. Both are currently represented as string literals that start with osm:.

For example:

osm:highway seems structured, with its values documented on https://wiki.openstreetmap.org/wiki/Key:highway#Values.
osm:destination:lanes seems less structured; its possible values appear to be concatenated informal location descriptions such as osm:U.Z.|Oudenaarde;Zwijnaarde;EXPO;Haven|Oudenaarde;Zwijnaarde;EXPO;Haven.
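The individual destinations are free-form, but the value itself follows OSM's lane-tagging convention: `|` separates lanes and `;` separates multiple values within one lane. A minimal Python sketch of splitting such a value, assuming the `osm:` prefix the tiles currently prepend can simply be stripped (the function name `split_lanes` is hypothetical):

```python
def split_lanes(value: str, prefix: str = "osm:") -> list[list[str]]:
    """Split a *:lanes value into one list of destinations per lane (hypothetical helper)."""
    # Drop the prefix the tile generator currently prepends to every value.
    if value.startswith(prefix):
        value = value[len(prefix):]
    # OSM convention: '|' separates lanes, ';' separates multiple values within a lane.
    # An empty lane (e.g. "a||b") yields an empty list rather than [""].
    return [lane.split(";") if lane else [] for lane in value.split("|")]


lanes = split_lanes("osm:U.Z.|Oudenaarde;Zwijnaarde;EXPO;Haven|Oudenaarde;Zwijnaarde;EXPO;Haven")
print(len(lanes))  # 3
print(lanes[0])    # ['U.Z.']
```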

I'd suggest these changes:

Don't add the osm: prefix to the values of the second kind of tags; just keep them as string literals.
Define each of the possible values that can be used in the first kind of tags in a vocabulary, and set their @type to @id. This obviously depends on the vocabulary that still has to be defined, but it'd be nice if someone with OSM experience could give us a list of structured tags and their values.
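A minimal Python sketch of the two-way split proposed above. The `STRUCTURED` vocabulary, its values, and the function name `tags_to_jsonld` are hypothetical placeholders until the real vocabulary exists; the `osm` prefix is mapped to https://w3id.org/openstreetmap/terms#, the namespace proposed further down in this thread.

```python
# Hypothetical vocabulary: keys whose values come from a documented, closed set.
STRUCTURED = {
    "highway": {"primary", "secondary", "tertiary", "residential", "footway"},
    "surface": {"asphalt", "paved", "gravel"},
}


def tags_to_jsonld(tags: dict[str, str]) -> dict:
    """Turn raw OSM tags into a JSON-LD node, coercing structured values to IRIs."""
    context: dict[str, object] = {"osm": "https://w3id.org/openstreetmap/terms#"}
    node: dict[str, str] = {}
    for key, value in tags.items():
        allowed = STRUCTURED.get(key)
        if allowed is not None and value in allowed:
            # Structured tag: the value is a vocabulary term, so declare it as an IRI.
            node[f"osm:{key}"] = f"osm:{value}"
            context[f"osm:{key}"] = {"@type": "@id"}
        else:
            # Free-form tag: keep the raw string, without an osm: prefix on the value.
            node[f"osm:{key}"] = value
    return {"@context": context, **node}


doc = tags_to_jsonld({"highway": "tertiary", "destination:lanes": "U.Z.|Haven"})
```

In this sketch a vocabulary key with an unrecognised value falls back to a plain literal; that is one possible choice, not a settled one.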

xivk commented 5 years ago

@hdelva Agreed on this approach. Could you perhaps build a small example of how to handle this, using the highway tags for instance?

This is indeed related to #21 but we probably need to discuss this first.

hdelva commented 5 years ago

Actually, I think we'd still run into issues when trying to define the tags' semantics. I'm not an expert on OSM data but some tags seem to mean different things to different people. I came across https://wiki.openstreetmap.org/wiki/Path_controversy and https://wiki.openstreetmap.org/wiki/WikiProject_Denmark#Inconsistent_tagging_of_Danish_roads. And even when there is a consensus, communities often have their own conventions (https://wiki.openstreetmap.org/wiki/WikiProject_Belgium/Conventions/Highways) which may not be interoperable with other communities'.

To me this means we cannot rely on the wiki when building our vocabulary, and that we should create our own unambiguous definitions. But we'll have to make sure we preserve the semantics the mapper intended, and this doesn't seem possible if we add semantics to the existing tags. I suggest we create our own, more limited, set of properties.

For example, instead of trying to define the highway=footway, highway=path, highway=steps and highway=pedestrian tags, we could infer something like walk=yes. Similarly, every community should agree that a highway=primary is more important than a highway=secondary, so perhaps these could become importance=1 and importance=2 properties. The underlying semantics of that field can then simply be defined as 'a lower number means a more important road'.
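A minimal Python sketch of the derivation suggested above. The `WALKABLE` set, the `IMPORTANCE_ORDER` list and the function name `derive_properties` are hypothetical; rank 1 is treated as the most important road, following the importance=1 / importance=2 numbers in the example.

```python
# Hypothetical derivation rules; neither list is an agreed standard.
WALKABLE = {"footway", "path", "steps", "pedestrian"}
# Ordered from most to least important: index 0 becomes importance=1.
IMPORTANCE_ORDER = ["primary", "secondary", "tertiary"]


def derive_properties(tags: dict[str, str]) -> dict[str, object]:
    """Derive simplified walk/importance properties from a highway=* tag."""
    highway = tags.get("highway")
    props: dict[str, object] = {}
    if highway in WALKABLE:
        props["walk"] = "yes"
    if highway in IMPORTANCE_ORDER:
        props["importance"] = IMPORTANCE_ORDER.index(highway) + 1
    return props
```

Note that highway=steps and highway=footway both collapse to walk=yes, which is the loss of meaning raised in the reply below.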

Does this make sense? And does it sound doable?

xivk commented 5 years ago

I think going down this road would be a step too far and could lead us down a path without an end. We also don't want to make too many assumptions about how the tiles are going to be used.

If we infer walk=yes, for example, we are removing some of the meaning found in the OSM data and making a decision on behalf of users. The same goes for bicycle=yes, and then the question becomes what counts as a bicycle, which also differs between countries. The same applies to all other vehicle definitions. I think OSM's best-effort approach to finding a consistent tagging system that caters to all these differences is pretty successful.

Those wiki pages focus on the inconsistencies in OSM, but these exist in any road network dataset of this size. Overall, OSM describes the road network pretty consistently, and it works well for routing.

hdelva commented 5 years ago

Fair enough, I suppose it doesn't make sense to focus on the inconsistently tagged data if most of it is fine.

That aside, what sort of example did you have in mind? Should I go over the wiki page and structure the tags I find there, or should I use some actual ways from OSM and clean those up?

xivk commented 5 years ago

Perhaps we could start with the highway and name tags? And move on from there?

hdelva commented 5 years ago

I've created a minimal ontology along with a config file that should make it easier to publish data. Everything is still open for discussion of course, and feedback is welcome. I'm hosting a rendered version on http://hdelva.be/tiles/ns/terms.html for now, but ideally this'll move to somewhere on the openplanner site with a redirect from https://w3id.org/openstreetmap/terms# to wherever we host it. The original ontology and the mapping file live in https://github.com/openplannerteam/routable-tiles-ontology now.

The readme is a bit crude, so here's an example of what would change. What used to be

{
   "@context":{
      "osm:nodes":{
         "@container":"@list",
         "@type":"@id"
      }
   },
   "@graph":[
      {
         "@id":"http://www.openstreetmap.org/way/13226858",
         "@type":"osm:Way",
         "rdfs:label":"Nieuwlandlaan",
         "osm:width":"osm:6",
         "osm:highway":"osm:tertiary",
         "osm:surface":"osm:asphalt",
         "osm:maxspeed":"50",
         "osm:nodes":[
            "http://www.openstreetmap.org/node/421872676",
            "http://www.openstreetmap.org/node/156522403",
            "http://www.openstreetmap.org/node/260601808"
         ]
      }
   ]
}

Would become

{
   "@context":{
      "osm:highway": {
         "@type": "@id"
      },
      "osm:hasNodes":{
         "@container":"@list",
         "@type":"@id"
      }
   },
   "@graph":[
      {
         "@id":"http://www.openstreetmap.org/way/13226858",
         "@type":"osm:Way",
         "osm:name":"Nieuwlandlaan",
         "osm:highway":"osm:Tertiary",
         "osm:maxspeed": 50,
         "osm:hasNodes": [
            "http://www.openstreetmap.org/node/421872676",
            "http://www.openstreetmap.org/node/156522403",
            "http://www.openstreetmap.org/node/260601808"
         ],
         "osm:hasTag": [
            "width=6",
            "surface=asphalt"
         ]
      }
   ]
}

The main differences are:

osm:highway now has "@type": "@id" in the context, because its values are now semantically defined.
Renamed osm:nodes to a more conventional osm:hasNodes. I don't think we should do this for the properties that are derived from OSM tags though.
osm:tertiary became osm:Tertiary because named individuals are usually in PascalCase.
osm:maxspeed became numeric
The name=* key became osm:name because the rdfs:label property is meant for human consumption.
There's an osm:hasTag list with all the tags that don't have defined semantics (yet).
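A minimal Python sketch of the conversion described by the list above, starting from raw OSM tags rather than the old output. The function names `convert_way` and `to_pascal_case` are hypothetical, only the keys from the example are handled, and the PascalCase rule for multi-word values such as living_street is an assumption.

```python
def to_pascal_case(value: str) -> str:
    # Assumed convention: "tertiary" -> "Tertiary", "living_street" -> "LivingStreet".
    return "".join(part.capitalize() for part in value.split("_"))


def convert_way(way_id: int, node_ids: list[int], tags: dict[str, str]) -> dict:
    """Build the proposed JSON-LD document for one OSM way (hypothetical helper)."""
    item: dict[str, object] = {
        "@id": f"http://www.openstreetmap.org/way/{way_id}",
        "@type": "osm:Way",
    }
    has_tag: list[str] = []
    for key, value in tags.items():
        if key == "name":
            item["osm:name"] = value  # name=* becomes osm:name, not rdfs:label
        elif key == "highway":
            item["osm:highway"] = "osm:" + to_pascal_case(value)  # named individual
        elif key == "maxspeed" and value.isdigit():
            item["osm:maxspeed"] = int(value)  # numeric speed, e.g. "50" -> 50
        else:
            has_tag.append(f"{key}={value}")  # no defined semantics (yet)
    item["osm:hasNodes"] = [f"http://www.openstreetmap.org/node/{n}" for n in node_ids]
    if has_tag:
        item["osm:hasTag"] = has_tag
    return {
        "@context": {
            "osm:highway": {"@type": "@id"},
            "osm:hasNodes": {"@container": "@list", "@type": "@id"},
        },
        "@graph": [item],
    }


doc = convert_way(
    13226858,
    [421872676, 156522403, 260601808],
    {"name": "Nieuwlandlaan", "highway": "tertiary", "width": "6",
     "surface": "asphalt", "maxspeed": "50"},
)
```

A maxspeed such as "50 mph" is not purely numeric, so this sketch leaves it in osm:hasTag rather than guessing a unit.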

xivk commented 5 years ago

This is excellent, exactly what we needed, we will do some coding and get back to you once we have something useful! :+1:

xivk commented 5 years ago

Work is happening in: https://github.com/openplannerteam/routeable-tiles/tree/features/structure-osm-tags

The main remaining to-do is parsing the mapping JSON.
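The actual format of the mapping file lives in the routable-tiles-ontology repository and is not reproduced here. As a rough Python sketch only, assuming a hypothetical schema in which each OSM key names a predicate and, for structured keys, a table of value IRIs, parsing and applying it might look like this:

```python
import json

# Hypothetical mapping schema; the real file in routable-tiles-ontology may differ.
MAPPING_JSON = """
{
  "highway": {
    "predicate": "osm:highway",
    "values": {"primary": "osm:Primary", "tertiary": "osm:Tertiary"}
  },
  "name": {"predicate": "osm:name"}
}
"""


def load_mapping(text: str) -> dict:
    return json.loads(text)


def apply_mapping(mapping: dict, tags: dict[str, str]) -> dict[str, object]:
    """Map raw OSM tags to properties; anything unmapped goes to osm:hasTag."""
    props: dict[str, object] = {}
    unmapped: list[str] = []
    for key, value in tags.items():
        rule = mapping.get(key)
        if rule is None:
            unmapped.append(f"{key}={value}")  # key has no defined semantics
        elif "values" in rule:
            iri = rule["values"].get(value)
            if iri is None:
                # Known key, unknown value: keep it as a raw tag rather than guess.
                unmapped.append(f"{key}={value}")
            else:
                props[rule["predicate"]] = iri
        else:
            props[rule["predicate"]] = value  # literal-valued property
    if unmapped:
        props["osm:hasTag"] = unmapped
    return props


mapping = load_mapping(MAPPING_JSON)
```

Unknown values of a known key are kept as raw tags here, in line with the osm:hasTag fallback described earlier in the thread.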

xivk commented 5 years ago

This is now done in the feature branch. Waiting for #28 for deployment.

More work is being done in this repo:

https://github.com/openplannerteam/routable-tiles-ontology

openplannerteam / routable-tiles

Add structure to OSM tags #23