ideditor / schema-builder

šŸ—šŸ· Create tagging schemas for iD
ISC License
12 stars 16 forks source link

Allow for country-specific `fields` and `moreFields` #94

Closed westnordost closed 9 months ago

westnordost commented 1 year ago

So, recently two presets were added to the iD presets: post boxes in the UK and post boxes in the US. These were duplicated from the original post box preset and the latter was modified to exclude these two countries. The only difference there is to the original post box preset is that each have added additional fields.

This is problematic. Not only does this trigger an issue in osmfeatures (see https://github.com/streetcomplete/StreetComplete/issues/4916), it is semantically wrong and from an (data) architectural point of view problematic.

Why current approach is problematic

Postboxes in the UK are not something else altogether from postboxes in other countries only because OpenStreetMappers like to tag the royal_cipher on it. Semantically, it is not the preset itself that is country-specific, but the individual fields are.

If one looks closely enough, as we in StreetComplete development regularly have to do, one will find that there are tons of fields and field values that are only in use in specific countries and thus should only be suggested (by UI) to be added there. From the top of my head, designation=* with certain values, such as barangay, are used on roads in the Phillippines and nowhere else. Now, using the same approach as for the post boxes would mean that many of the highway=... presets must be duplicated and one field be added. There are a ton more of such examples, if you don't believe me, I can come up with a couple more.

So, what I am saying is that not only semantically, I believe this is the wrong approach, but also from a data architectural viewpoint. The more you go into detail here and shoehorn country-specific preset fields into the current schema, the more we will end up with an unmaintainable mess: First, you have to duplicate the presets, so for any subsequent change on the preset, it mus be made on all the duplicated presets. Second, different fields are commonly used in different countries: Following the current approach, one would need to duplicate the preset for that set of countries if any of the fields differed from the global default.

One example: Sticking with the example of postboxes, only in some countries it is common that regular collection_times are posted on the boxes themselves (see postboxesHaveCollectionTimes.yml) - so one would have to split up the postbox preset once more. Then, if it turns out, that for example postboxes in certain countries simply never have a ref (see postboxesHaveRef.yml ), one would bisect the preset once more, so that there are then 4 presets (+ the GB, US preset): with collection times and ref, without collection times but with ref, without collection times and without ref and with collection times but with ref. It only gets worse from there.... šŸ˜¬

...for the royal cypher is actually present in more territories, too: postboxesHaveRoyalCypher. So you end up with potentially 8 different presets...

I could go on, but I think you get my point.

Solution

The proposed solution I already mentioned in the issue title: Certain fields of presets should simply be made country-specific, not the preset itself unless such things really just exist in certain countries. Post boxes or roads certainly not.

mnalis commented 9 months ago

As a quick fix here (and in related cases), may I suggest that if some sub-field is not commonly used worldwide (like royal_cypher for amenity=postbox), that it gets moved from fields to moreFields (instead of making separate US / UK presets for it) ?

westnordost commented 9 months ago

Actually, a Field can already be country-specific, I noticed. See https://github.com/ideditor/schema-builder#locationset-1

This means, that https://github.com/openstreetmap/id-tagging-schema/blob/main/data/presets/amenity/post_box/post_box-GB.json is simply wrong, and not a result of a missing feature.

tyrasd commented 9 months ago

Sorry for the late reply here.

I can understand that the modelling of country/region specific attributes as preset-variants causes some inconveniences for data consumers down the line, which use the index for .

semantically wrong

I guess this depends on how you interpret the data. If your application needs a 1:1 correspondence of map feature to preset file, the regional variants of a preset can be easily merged into a single "global" preset with country-specific attributes by a simple concatenation operation. e.g. post_box--unified.json would correspond to something like

{
  "default": {
    // content of presets/amenity/post_box.json
  },
  "GB": {
    // content of presets/amenity/post_box/post_box-GB.json
  },
  // etc.
}

This would contain the same information as the proposed solution of having region-specific fields/moreFields directly in a single preset.

At the same time, country-specific fields/moreFields would complicate the schema considerably, make it harder to maintain, and would be less flexible (e.g. when a regional preset-variant would define different addTags compared to the default preset).

--

Yes, in the example of the british post boxes, a dedicated regional preset was not necessary, as it could also be modeled using regional fields (this is now refactored with ae68feef3c3d8fc99de1b894f109a2ee34fee7a8). But for the -US case this was not possible, because there we want to show the non-regional drive_through field as a regular field (and not just as a moreField like in the rest of the world).