gravitystorm / openstreetmap-carto

A general-purpose OpenStreetMap mapnik style, in CartoCSS
Other
1.53k stars 817 forks source link

Different geometry decisions (point/line/polygon) on database re-import #3611

Closed sommerluk closed 4 years ago

sommerluk commented 5 years ago

Currently there is no database re-import planned. And apparently also no urgent need to do so. Anyway it might be useful to have a list of geometry decisions (point/line/polygon) that we might maybe change if in the future we would make a database re-import.

kocio-pl commented 5 years ago

Does it cover import issues like #3606?

jeisenbe commented 5 years ago

allotments=plot should imported as a polygon, so that we have the option of solving #432 "render ref numbers for allotments=plot"

HolgerJeromin commented 5 years ago

Healthcare #3639

bhousel commented 5 years ago

In case it helps, iD's areaKeys list is published here: https://github.com/osmlab/id-area-keys

The main file in there areaKeys.json contains a dictionary of keys that imply polygon, and subkeys of exceptions that imply linestring.

kocio-pl commented 5 years ago

I just asked OWG about possible database reload on https://github.com/openstreetmap/chef/issues/211.

matthijsmelissen commented 5 years ago

@kocio-pl I think this might be a bit too hasty. Wouldn't it be better to first discuss within our own repository if and when a database reload is desired, before asking externally?

kocio-pl commented 5 years ago

Since this is about synchronizing efforts and we're dependent on them, it's good to know both sides IMO.

jeisenbe commented 5 years ago

boundary=national_park and boundary=protected_area should be added to local_polygon_values

sommerluk commented 5 years ago

boundary=national_park and boundary=protected_area should be added to local_polygon_values

Cross-reference #1141

sommerluk commented 5 years ago

@bhousel Thanks!

jeisenbe commented 5 years ago

"Golf=*" as polygon: 'rough', 'fairway', 'driving_range', 'water_hazard', 'green', 'bunker'

Possibly 'tee'?

The other values should remain as lines

jeisenbe commented 5 years ago

The key "emergency" should probably be imported as polygons.

jeisenbe commented 5 years ago

For reference, here are the list of keys that the ID editor considers to be areas (polygons) when drawn as closed ways, with the exceptions.

"advertising" (except "billboard") "aerialway": (Except "cable_car", "chair_lift", "drag_lift", "gondola", "goods", "magic_carpet", "mixed_lift":, "platter", "rope_tow", "t-bar") "aeroway": (Except "runway", taxiway") "allotments" "amenity": (except "bench") "area:highway" "attraction": (except "dark_ride", "river_rafting", "summer_toboggan", "train", "water_slide" ) "bridge:support" "building" "camp_site" "club" "craft" "emergency" (except "designated", "destination", "no", "official", "private", "yes") "golf": (except "hole", "lateral_water_hazard", "water_hazard") "healthcare" "historic" "industrial" "internet_access" "junction" (except "circular", "roundabout") "landuse" "leisure" (except "slipway", "track") "man_made" (except "breakwater", "crane", "cutline", "embankment", "groyne", "pier", "pipeline") "military": "natural": (except "cliff", "coastline", "ridge", "tree_row") "office": "piste:type": (except "downhill", "hike", "ice_skate", "nordic", "skitour", "sled", "sleigh") "place" "playground" (except "balancebeam", "slide", "zipwire") "power": (except "cable", "line", "minor_line") "public_transport": (except "platform") "residential" "seamark:type" "shop": "tourism" (except "artwork") "traffic_calming": (except "bump", "cushion", "dip", "hump", "rumble_strip") "waterway" (except "canal", "dam", "ditch", "drain", "river", "stream", "weir")

chadrockey commented 5 years ago

@jeisenbe golf:cartpath and golf:path are also lines that should be excluded.

lateral_water_hazard and water_hazards are now exclusively areas and should be removed from the exceptions list.

jeisenbe commented 5 years ago

@chadrockey - the list is from the ID Editor, and I've posted it just for information. I agree that water hazards should be polygons.

I think you can ask for this to be fixed in the ID Editor Github repository, with this link: https://github.com/openstreetmap/iD/issues/new

chadrockey commented 5 years ago

@jeisenbe The changes I mentioned were already made in the iD editor and are set to be released soon.

pnorman commented 5 years ago

I've been thinking about the best way to accumulate changes we want to make for the next major release with schema changes, etc. The best route is similar to what we did before with a long-running dev branch kept up to date with the minor releases that we can merge when we feel it's done.

jeisenbe commented 5 years ago

According to this comment, it's possible a database reload could happen soon. Should I submit a PR that includes the new geometry decisions now? Or do we need to wait? https://github.com/openstreetmap/chef/issues/211#issuecomment-479484125

pnorman commented 5 years ago

I'll make a new branch to PR against after v4.21.0. I don't see any rush since we're going to need to test any changes, and part of testing is full planet imports, so we'll take longer than setting up the new server.

jeisenbe commented 5 years ago

part of testing is full planet imports

Why is this required? I would think that testing with an large extract should have the same results, or is this to make sure that performance is not too slow during the full planet import after the Lua transformations are changed?

pnorman commented 5 years ago

I would think that testing with an large extract should have the same results, or is this to make sure that performance is not too slow during the full planet import after the Lua transformations are changed?

It depends on the nature of the changes, but if I were reviewing it or submitting it, I'd want to check how it works on the full dataset. There's also not that much difference between import time of a large extract like Europe or NA and the full planet.

jeisenbe commented 5 years ago

Re: golf; does anyone know if golf=out_of_bounds is meant to be mapped as a linestring (open way) or a polygon (closed way)? There is no wiki page. It's used <50 times, but it's mentioned on the main golf=* page - http://wiki.openstreetmap.org/wiki/Key:golf

jeisenbe commented 5 years ago

These are my plans so far on what keys and tags to import as polygons vs linestrings.

Even if we do not end up rendering all of the features, they should be available in the database so that other map styles based off of this style can use these features.

Keys to be imported as polygons:

Tags to be imported as polygon features:

Tags to be imported as linestring features:

Discussion

I reviewed the wiki pages and Taginfo for each of these keys. Craft, Club, and Healthcare are all often used to map features as an area with closed ways, and there do not appear to be any features that should be mapped as linestrings.

The emergency key is a mix of access restrictions and features. I believe it would be best to only render the features that are clearly unique (ignoring duplicates like emergency=water_tank and emergency=fire_water_pond), and which are used over 100 times and documented, so I selected the list above after reviewing taginfo and the wiki pages. See http://taginfo.openstreetmap.org/keys/emer

The Golf key has a number of features that are usually mapped as areas, but golf=path, golf=cartpath and golf=hole are only used for linestrings. While it is unlikely that we will render golf=path or =cartpath, it is simplest to import all the golf features as polygons, and list these as exceptions

See below for Playground.

There are a few other features in the list from the ID editor (above). I don't think we need to import "area:highway" because rendering this has already been rejected (if I recall correctly?), and it's not necessary to import "industrial" or "residential" because these ought to be tagged with "landuse".

I haven't reviewed these keys yet, but I think they will need separate discussions

jeisenbe commented 5 years ago

The Key "playground" is a little trickier.

Both JOSM and ID suggest that these should all be mapped as nodes or closed ways.

I downloaded 11,035 ways (there are 11, 101 according to taginfo) and 8,642 are closed ways, 2353 are unclosed. Thus 78% are closed, 22% not. Of the unclosed ways 1,870 have exactly 2 nodes, which suggests they were intentionally mapped in this way (and not a mistake).

The most common tags on unclosed ways are slide (810 features compared to 403 on closed ways), zipwire (593 vs 84), swing (223 open vs 495 closed), balancebeam (129 vs 23), structure (126 open vs 1730 closed), and climbingframe (103 vs 506 closed). So zipwires, slides and balance beams are usually tagged as open ways, but the rest are more commonly tagged on closed ways.

My (limited) understanding of the lua transformations is that the unclosed ways will still be imported as linestrings, and only the closed ways will be imported as polygons. Is this correct? In this case, there should be little harm in including all values in the key as polygons, right?

But we should exclude "balancebeam", "slide", and "zipwire" from the polygon transformation, as ID has done.

jeisenbe commented 5 years ago

The Key "advertising" is considered to be a closed way by the ID editor, but most of the features mapped as ways are open ways, mainly billboards. There are 479 closed ways, and 1191 unclosed ways. 1044 of the open ways have only 2 nodes. Half of the closed ways are advertising=column, and most of the the open ways are advertising=billboard, which is the most popular value for advertising=*

This suggests that advertising=column should be imported as a polygon, but advertising=billboard (and advertising=sign) should be mapped as linear ways. The tag advertising=poster_box is used on open and closed ways equally (about 50 times each).

I've opened an issue at the ID editor to recommend adding advertising=sign to the list of linestring exceptions.

jeisenbe commented 5 years ago

"Attraction=*" is imported as an area feature by ID Editor, but there is a significant list of exceptions: "dark_ride", "river_rafting", "summer_toboggan", "train", and "water_slide".

I have reviewed attraction=roller_coaster, and found that the majority of ways are open, and mapped along the tracks of the roller coaster; 1572 at this time. This is similar to the usual mapping practice for the other linear rides above. However, there are 331 that are mapped as closed ways, and an old proposal recommended this mapping.

Also, most attraction=log_flume features are mapped as linear ways. This should also be imported as a linestring, as should attraction=slide.

The other values of attraction with more than 50 uses all appear to be point features or area features: https://taginfo.openstreetmap.org/keys/attraction#values

I would suggest listing the key "attraction" in the polygon features, but then having these exceptions:

EDIT: See below; it may be better to list the individual polygon features, rather than making a long list of exclusions.

jeisenbe commented 5 years ago

The key "camp_site" is used with the values "=camp_pitch" and "=pitch" to show individual tent locations within a tourism=campsite. There was a proposal that suggested using camp_site=camp_pitch in 2015 on a node or area: https://wiki.openstreetmap.org/wiki/Proposed_features/camp_site_pitch

We could consider importing just "camp_site=camp_pitch" as a polygon, because it is more commonly used than =pitch and has some documentation or we could import both as polygons.

(Ideally someone should document this tag with a wiki page as "in use")

imagico commented 5 years ago

My (limited) understanding of the lua transformations is that the unclosed ways will still be imported as linestrings, and only the closed ways will be imported as polygons. Is this correct?

Yes, the linestring vs. polygon decision is only for resolving the inherent ambiguity of closed ways in the OSM data model.

In this case, there should be little harm in including all values in the key as polygons, right?

IIRC in general processing a closed way into a polygon is more expensive than into a linestring. Practically code complexity is probably the more significant argument though.

On a more general note - we need to decide how far we intend to go with preemptively making this decisions for tag based on a merely theoretical possibility they might be rendered. What needs to be kept in mind is that there are many other map styles which try to use the same database scheme as we do so even decisions for tags we don't render can have some relevance.

And idea to consider in that regard is: How many of the changes to this we are discussing now could have already been discussed with the previous database reload? For golf=* this is definitely the case. In other words: I don't think we need to worry so much about covering everything from the latest fashions in tagging.

As a clear limit i'd say the number of entries in these lists we actually do render in the style should definitely be larger than those we include preemptively. Based on that including a key of which we only render 1 or 2 values at the moment with a polygon default but half a dozen exceptions for individual values seems somewhat questionable.

jeisenbe commented 5 years ago

processing a closed way into a polygon is more expensive than into a linestring

That's good to know. In that case, it would be better to exclude all features that are clearly meant to be linestrings.

I think the list that the ID editor uses requires that the key be listed as a polygon and then individual features must be excluded, but we have the option of just picking individual tags to transform, so we can avoid adding the whole key in some cases.

there are many other map styles which try to use the same database scheme as we do so even decisions for tags we don't render can have some relevance.

Yes, because of this, it would be good to have a similar list to that used by ID, even if we don't intent to render "club=*" or camp_site=camp_pitch anytime soon, because these features are clearly meant to be polygons. So far it looks like most of the ID team's choices are reasonable, but we should check them all.

the number of entries in these lists we actually do render in the style should definitely be larger than those we include preemptively.

In this case, it may be better to pick the individual values of attraction=* that are always mapped as areas, eg =animal, =amusement_ride, =carousel, =big_wheel, and possibly attraction=yes

I don't think we need to worry so much about covering everything from the latest fashions in tagging.

True, but I believe all of the keys above have been established for several years at least.

I hope that if we include all of the important features, we won't have to think about another database reload for a couple more years.

jeisenbe commented 5 years ago

I checked out the ways tagged barrier=toll_booth. There are 2,161 closed ways, and only 54 ways that are not closed, so it looks clear that this is an optional way of mapping toll booths. However, 40,000 are tagged as nodes, the recommended method from the wiki page.

jeisenbe commented 5 years ago

The last few keys from the list above:

1) internet_access=wlan - I don't believe there is a reason to consider rendering this on a general map, so it does not need to be added

2) piste:type - there are several values that can be mapped as areas, but many are also mapped as linear ways. By far the highest usage are piste:type=downhill and piste:type=nordic, with about 90k and 70k uses. The usual way to map these is as a linear way. The wiki page for piste:type=downhill recommends adding "area=yes" if mapping as a closed way, the piste:type=nordic page suggests using relations, but both of the pages both show "area" as an option in the box.

There are far too many piste:type=downhill/nordic objects in the database for me to download them all; too bad there is no way to search for closed or unclosed ways in overpass API.

There are two more values that can clearly be mapped as areas: =snow_park and =playground, but these are both used less than 1000 times.

I'm inclined to leave this key off the list for now.

3) seamark:type - There many values, some of which repeat other more commonly used tags. The wiki page also says that "The tags described in these pages may be placed on any node, way (open or closed) or multi-polygon," which makes it difficult to decide on the correct geometry to import. However, taginfo shows that all of commonly-used values are mapped exclusively on nodes (>99% of the time). There are only 2 tags with >2000 uses that are mapped as ways - seamark:type=berth and seamark:type=anchorage.

Both of these mark areas where ships can be berthed / anchored / "parked", so I think it's unlikely we will render them anytime soon. I would not recommend including this key.

4) bridge:support- taginfo

5) traffic_calming - this key is used 436, 720 times, mainly as an attribute of the way or on nodes. The original proposal in 2008 suggested mapping on nodes or on the highway way, and the main wiki page for the key also shows only nodes and lines as appropriate.

traffic_calming=island is mentioned as mappable with an area on the individual wiki page for this features, which is much newer. The ID editor also allows a couple of the others to be mapped as areas, such as traffic_calming=chicane, but I see that only 100 out of 1210 ways are closed for this feature.

For routing applications, it would be ideal if the way or a node on the way are tagged, so I'm not sure if mapping as an area is a good idea.

I don't believe it's necessary to import any of these as polygons at this time.

jeisenbe commented 5 years ago

Linestrings to add: 1) Power=cable Power cables are never mapped as areas, so we should add this to the list of exclusions, along with power=line and power=minor_line (which area already only imported as linestrings).

2) Coastline? - is the coastline actually imported, or is it always discarded, with the shapefiles used instead? If imported, it needs to be specified as a linestring, in the case of small islands that may be mapped as closed ways. EDITED: according to this link about osm2pgsql, =By default "natural=coastline" tagged data will be discarded based on the assumption that post-processed Coastline Checker shape files will be used."

3) man_made=cutline - this is only rendered as a line. It's unlikely to be mapped as a closed way, but should be specified as a linestring only, just in case

4) man_made=pipeline - also should only be a linestring. See #640 - still open, and likely to be rendered soon.

jeisenbe commented 5 years ago

Issue #3303:

jeisenbe commented 5 years ago

building = no should not be imported as a polygon feature

eg barrier=wall with building=no

jeisenbe commented 5 years ago

I believe it is now possible to submit PRs to address this, correct? Should each key or tag be submitted in an individual PR, or would it be possible to combine some?

pnorman commented 5 years ago

Combine in a reasonable manner

jeisenbe commented 4 years ago

Are the other contributors and maintainers interested in solving this issue in the next month or two? We have now had the schema changes branch for 6 months. I think if we set a timeline we are more likely to get this done.

I can submit some of the PRs myself, and I would be even happier to review PRs submitted by other contributors (remember to check out the schema changes branch, and then make a branch based on that to add your changes).

But one issue is that @pnorman recommended testing the changes with a whole planet.osm.pbf import prior to merging the schema changes branch back to master, so that we can check if performance is significantly changed or any problems develop.

This would need to be done shortly before the team at openstreetmap.org plans to next reload the database - I was hoping for January.

Who has a server with enough RAM to import the planet? This is beyond the capacity of my hardware. Is anyone able to volunteer?

pnorman commented 4 years ago

Just a reminder, when doing a new release, merge the release tag into the schema_changes branch as described in RELEASES.md

I haven't had a huge deal of time to work on osm-carto, but if there's schema change stuff that needs review, ping me on it.

jeisenbe commented 4 years ago

Thank you, I did not realize that applied to schema_changes. I've merged v4.21.1 into schema_changes.

jeisenbe commented 4 years ago

Some additional possibilities:

jeisenbe commented 4 years ago

Updated to-do list (Last updated 2020-01-18)

Done:

Pending (need approval or discussion):

To do:

Possible - need to discuss

jeisenbe commented 4 years ago

Healthcare was already done, sorry about that.

@Adamant36 if you want to do one of these, building = no as a linestring should be uncontroversial

These 4 are likely to be accepted but will need some discussion:

jeisenbe commented 4 years ago

I would like some comments from the other maintainers and contributors about the last 3.

It appears the are unlikely to be rendered in this style, but should we include them as polygon features to make adapting the style easier?

Adamant36 commented 4 years ago

I'm leaning toward yes on advertising=column, yes on the emergency= tags, and no for attraction= because its just a clunky tagging scheme generally and I don't like attraction=yes. It seems to be added to a bunch of vaguely "attractive" stuff. As an added descriptor tag of what a person might be interested in. Which we already have tourism=attraction for. The same could also be said for some of the emergency=* stuff I guess, but to a much less degree. So, I'm OK with it.

Advertising=column I have zero problem with since Its already rendered as a node and the wiki says it can be mapped as a polygon. Although, I think it should just be mapped as a node, but that's not my call to make. So it might as well be imported as a polygon to. Maybe it would be worth asking on the mailing list first though before doing it. I'm not sure how an advertising column could be mapped as an area anymore then a bench, waste basket, or similar thing. But that's just me.

sommerluk commented 4 years ago

It appears the are unlikely to be rendered in this style, but should we include them as polygon features to make adapting the style easier?

While I didn't look into the three case you have mentioned, I think in general it is a good idea to have a better import also for things we don't actually render. As you say, it makes adapting the style easier. And it makes the import schema more useful also for projects that do no fork our rendering rules, but that need just a sort of default import schema.

jeisenbe commented 4 years ago

[Advertising=column is] already rendered as a node

I did not realize this. In that case we might want to include it, even though only 3% of the features (303 out of 11k) are mapped as an area according to taginfo.

pnorman commented 4 years ago

I'm wondering if we should have general logic for =no not matching the rules for . This would cover building=no

jeisenbe commented 4 years ago

general logic for =no not matching the rules

That sounds good. What is the syntax for that?

jeisenbe commented 4 years ago

I've looked into advertising=column with more detail. The suggestion to map as an area seems to be a mistake. 98% are mapped as nodes. I checked a sample of the 222 features mapped as closed ways (another 80 or so are mapped as unclosed ways), and almost all are 1 to 2 meter diameter circles, so mapping as an area is not adding any information that could not be provided with width=1.5. The main text of the page has only mentioned mapping as a node. Since this tagging is quite rare, we do not need to encourage it.

pnorman commented 4 years ago

That sounds good. What is the syntax for that?

It would need to go inside the loops searching through the objects tags:

https://github.com/gravitystorm/openstreetmap-carto/blob/295d65d7623c6a323d83bb254cefadeeb8e4c10c/openstreetmap-carto.lua#L395-L397

Right now for tags that match and don't have a polygon override it's an area, a v ~= 'no' check would need to be added.