gravitystorm / openstreetmap-carto

A general-purpose OpenStreetMap mapnik style, in CartoCSS

Using external data to adjust rendering #3720

Open jeisenbe opened 5 years ago

jeisenbe commented 5 years ago

Consider using Wikidata information to improve the rendering of some features

Implementation?

I do not have much experience with using Wikidata. I hope that this section can be updated when other contributors offer clearer ideas.

Issues:

matkoniecz commented 5 years ago

My main problems are conflicts with the goals listed at https://github.com/gravitystorm/openstreetmap-carto/blob/master/CARTOGRAPHY.md#main-goals - especially with

Being understandable and supportive for mappers - To serve as feedback for mappers and encourage correct mapping this style needs to render the data in a way that allows mappers to understand how the data produces a certain rendering result based on basic observation without in depth understanding how map rendering works or looking at the style implementation.

and

This way it shows the richness of OSM data and gives a broad recognition to the mappers' work.

In general, using the rendering as feedback for mappers would be tricky if the rendering can be changed by edits in a third-party database.

There is also the problem of replicating the Wikidata database, but for me it is a big yet secondary problem, as it is solvable at least in theory.

Also, given my interactions with Wikidata, I have big problems with its usability. First of all, copyright issues - see https://wiki.openstreetmap.org/wiki/Wikidata#Importing_data

As far as I can see, importing data from Wikidata into the OSM database, or mixing these two databases, is not feasible due to the copyright mess on the Wikidata side.

Note that Wikidata is CC0 in the USA. They import databases covered by copyright-like restrictions in the EU.

The Wikidata community also does not care about copyright - see https://www.wikidata.org/wiki/Wikidata:Bot_requests/Archive/2017/10#OpenStreetMap_objects where a proposal to import OSM data (ODbL licensed) into a supposedly CC0 database completely ignored the question of copyright issues (except for my comment).

It is not unique - see for example https://www.wikidata.org/wiki/Wikidata:Bot_requests/Archive/2016/10#Item_located_in_Rome,_with_coordinates_not_in_Rome


There is also an additional issue - data quality. For example:

but is usually available in Wikidata

What is the property code? I would like to run a query checking whether it is really usual (I suspect that less than 1% of peaks have this data).


In short: there are major technical difficulties (mirroring the Wikidata database), but that is moot, as fixing rendering caused by invalid data should never require editing third party database.


EDIT: Apart from the description of the wikidata tag as "non-verifiable wikidata IDs to OSM features, pretending to create one-to-one relationships between databases", I fully agree with @imagico's comment.

imagico commented 5 years ago

@matkoniecz already pointed out some of the issues this idea has.

Apart from that on the formal level there is

https://wiki.openstreetmap.org/wiki/Featured_tile_layers/Guidelines_for_new_tile_layers

to be considered, in particular the Up-to-date data and OpenStreetMap data points.

My opinion: I would be strictly against adding something that depends on and therefore encourages adding non-verifiable wikidata IDs to OSM features, pretending to create one-to-one relationships between databases with completely different data models and different perceptions of reality.

What I would be open to considering is using external crowd-sourced data sources if:

My working assumption is that the first and third point are essentially mutually exclusive. If some type of information is clearly not suitable for recording in OSM (for example some subjective assessments or authoritative classifications) it is unlikely that this information is recorded in an external community project in a way that makes it suitable for this style. But this is just a gut feeling. I would be open to and eager to hear about counter-examples.

polarbearing commented 5 years ago

Prominence of natural=peaks is not appropriate data to add directly to the OSM database

https://wiki.openstreetmap.org/wiki/Proposed_features/key:prominence Usage >6k

imagico commented 5 years ago

@matkoniecz - yes, I am aware wikidata IDs are a contested subject. My main point here is that this style should try to stay neutral on the matter.

Regarding prominence - for information and future reference - of the 6k peaks with prominence tag

For importance rating peaks for prioritizing display on the map, computing and using topographic isolation - see here for an example - is both simpler and more useful than prominence, which would often rule out peaks in flatland regions despite them being the most significant ones within a large area. Denmark's highest peak, for example, only has a prominence of about 160 m but is more than 200 km from any higher peak.
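
A minimal sketch of ranking peaks by isolation, assuming the peaks are available as `(name, lat, lon, ele)` tuples. The function names and the sample peaks are hypothetical and made up for illustration, not real data. Measuring only peak-to-peak distances is a simplification: true isolation is the distance to the nearest higher terrain, which can be closer than the nearest higher mapped peak.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def peak_isolation_km(peaks):
    """Map each peak name to the distance to the nearest strictly higher peak.

    The highest peak has no higher neighbour and gets math.inf.
    """
    result = {}
    for name, lat, lon, ele in peaks:
        dists = [haversine_km(lat, lon, lat2, lon2)
                 for _, lat2, lon2, ele2 in peaks if ele2 > ele]
        result[name] = min(dists, default=math.inf)
    return result


# Hypothetical sample peaks: (name, lat, lon, elevation in metres)
SAMPLE_PEAKS = [
    ("A", 46.00, 7.00, 4000),  # high summit
    ("B", 46.01, 7.01, 3900),  # high, but right next to A
    ("C", 50.00, 7.00, 300),   # low, but far from anything higher
]

isolation = peak_isolation_km(SAMPLE_PEAKS)
# Peaks in descending order of isolation, i.e. display priority
ranked = sorted(isolation, key=isolation.get, reverse=True)
```

In this made-up sample the low but remote peak C ranks above the much higher peak B, which sits next to a higher neighbour. That is the flatland behaviour described above.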

jeisenbe commented 5 years ago

I changed the title, because the discussion is extending to other possible external data sets, not just Wikidata.

fixing rendering caused by invalid data should never require editing third party database.

I agree that this is the biggest problem with considering any sort of crowd-sourced dataset, and it probably rules out using Wikidata.

If some type of information is clearly not suitable for recording in OSM ... it is unlikely that this information is recorded in an external community project in a way that makes it suitable for this style

This probably rules out other crowd-sourced or open databases as well.

The data cannot be computed from OSM data and other generic and static open data sources

@imagico here opens up the possibility of using "static open data sources." The useful example that comes to mind is using open elevation data to directly compute the isolation or prominence of peaks and to find the orientation of saddles. I know that OpenTopoMap does this for saddles, and perhaps for peaks as well.

jeisenbe commented 5 years ago

For importance rating peaks for prioritizing display on the map computing and using topographic isolation .. is both simpler and more useful than prominence

Example (in German): https://wiki.openstreetmap.org/wiki/User:Maxbe/Dominanz_von_Gipfeln

This idea uses an open DEM (digital elevation model) to calculate the distance from a peak to the nearest higher point. I imagine this is a much simpler computation than calculating prominence.
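
As a rough sketch of that computation on a raster, assuming the DEM is already loaded as a small grid of elevations with square cells of known size: the function name `dem_isolation_m` and the sample grid are hypothetical, and a real DEM would need a spatial index or a search radius instead of this brute-force scan.

```python
import math


def dem_isolation_m(dem, row, col, cell_size_m):
    """Distance in metres from cell (row, col) to the nearest strictly higher cell.

    Treats the grid as planar with square cells and scans every cell.
    Returns math.inf if no cell in the grid is higher.
    """
    here = dem[row][col]
    best = math.inf
    for r, line in enumerate(dem):
        for c, ele in enumerate(line):
            if ele > here:
                d = math.hypot(r - row, c - col) * cell_size_m
                best = min(best, d)
    return best


# Hypothetical 4x5 elevation grid in metres, with 30 m cells
DEM = [
    [100, 110, 120, 110, 100],
    [110, 150, 130, 120, 110],
    [100, 120, 125, 180, 120],
    [90, 100, 110, 120, 110],
]

iso_small = dem_isolation_m(DEM, 1, 1, 30.0)  # the 150 m summit
iso_top = dem_isolation_m(DEM, 2, 3, 30.0)    # the 180 m summit, highest cell
```

Here the only cell higher than the 150 m summit is the 180 m one, one row down and two columns across, so `iso_small` is `30 * sqrt(5)` metres, while the highest cell gets `math.inf`.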

Would others support the use of DEM data to calculate importance criteria for peaks and to improve the display of saddles?

matkoniecz commented 5 years ago

Would others support the use of DEM data to calculate importance criteria for peaks and to improve the display of saddles?

I am not opposed, though DEM data with a resolution high enough to do that, without severe artifacts anywhere (to avoid introducing rendering problems not caused by invalid OSM data), and under a suitable license may not be available at this moment.

imagico commented 5 years ago

Two points on that: