Introduction of SQL functions to the style

gravitystorm / openstreetmap-carto

A general-purpose OpenStreetMap mapnik style, in CartoCSS

Other

1.55k stars 822 forks source link

Introduction of SQL functions to the style #5004

Closed imagico closed 1 month ago

imagico commented 3 months ago

In #4952 we have the first concrete change proposed in this style that is using SQL functions in its implementation. This choice was based on the implementation of a similar change elsewhere (https://github.com/gravitystorm/openstreetmap-carto/issues/214#issuecomment-1058639224) which #4952 is largely inspired by. To me this seems a natural and non-controversial way to approach this but apparently it is not so i open this issue to discuss the idea of introducing the technique of SQL functions as a strategic measure. This would allow us to expand our technical capabilities significantly and address key issues of the style while ensuring our code base remains well maintainable and style development remains well accessible for map designers.

Background

Our SQL code based has expanded quite massively over the history of this style and despite efforts to minimize code duplication using methods supplied by SQL and in YAML there is a lot of repetition in our SQL code.

Ultimately the problem is that all SQL functionality is implemented inline in project.mml. This severely limits our ability to implement necessary changes. #4952 could still be implemented with inline SQL code, but it would be very awkward and difficult to maintain this way. Other strategic issues like #3880 or design ideas that require ground unit processing (see for example #4340, #4346, #4661, #4948), however, depend fully on us moving to new techniques of structuring and modularizing SQL code.

I had mentioned this necessity before in https://github.com/gravitystorm/openstreetmap-carto/pull/4431#issuecomment-1179700322:

So far on the SQL code level of this project (which - after the MSS code - is the second most important level where design work happens and that most map designers need to deal with) we are inlining all code in project.mml (with the exception of use of YAML features to de-duplicate completely identical queries). I don't think this is sustainable any more. In particular resolving #3880 in a scalable and future-proof way would not be possible under that paradigm. Same applies most likely for ground unit rendering (#4340, #4346). So introducing use of views, functions or lookup tables or a combination of such seems inevitable if we want to make progress on the larger issues this project faces.

Proposal

The suggestion is to introduce SQL functions as a method to structure and modularize SQL functionality and to avoid unnecessarily duplicating SQL. In #4952 the purpose of these functions would be tag interpretation tasks - which need to be performed identically at several places in the road layers. But the use of function is not limited to that, for ground unit processing functions would be used for geometric calculations (converting between map units, ground units and screen pixels). For addressing #3880 a central use of functions would likely be the scale_denominator to zoom level conversion.

SQL functions are not the only method that can be used to improve our abilities to work with SQL in solving map design problems. #3880 will likely require other methods in addition (like script generated SQL code or lookup tables) but SQL functions are quite clearly the most versatile method here. Other use cases can be found in the AC-Style.

The idea would be to approach this in a conservative way:

use #4952 as a first test to evaluate the method
have the functions in a file functions.sql next to the existing indexes.sql.
stick to functions in LANGUAGE SQL as far as reasonably possible.
if #4952 turns out to be sucessful we could identify existing cases of substantial duplication in our SQL code and explore if these can be reasonably avoided by using functions.
new suggestions to using functions would be looked at on a case-by-case basis.
care would need to be taken to evaluate if the use of functions has a significant performance impact. This is usually possible to test already on a small test database to not a big practical problem.
when we make changes to function names/parameters we need to take care that previously installed older versions are correctly removed.
we need to communicate the need to install the functions we use to style users. Practically this is not going to be a big issue since we already require users to install our indices.

pnorman commented 3 months ago

Functions can be useful for logic that operates across multiple objects or multiple tables, but most of the examples I see are functions which transform tags into something more useable. This is the job of osm2pgsql, which converts OSM tags into Postgres columns that are useful for making a map.

Do you have examples of functions that don't duplicate functionality of osm2pgsql?

imagico commented 3 months ago

Let me get one thing strait: It is us who decide how we want to use the tools we use, not the developers of these tools deciding about what these should be used for long after OSM-Carto was created. As said before in #4952 - if you want to suggest us to move to doing style specific tag interpretation in preprocessing, either in SQL (https://github.com/gravitystorm/openstreetmap-carto/pull/4952#issuecomment-2276851669: moving the SQL logic to import time) or via osm2pgsql then please present your approach and argue your case why you think this would be of strategic benefit for us. And since the linear discussions on this issue tracker are not suitable for mixing several topics i would strongly suggest moving that to a separate issue.

Regarding applications of SQL functions other than tag interpretation: I have already provided several pointers: a viable solution to #3880 will likely require a !scale_denominator! to zoom level conversion - which is most suitable for being implemented in an SQL function. And since !scale_denominator! is a render time variable this is inherently impossible to do during database import of course. Another natural use case is - as mentioned - conversion between map units, ground units and screen pixels. Currently we only do map units to screen pixels conversion 20 times via way_area/NULLIF(POW(!scale_denominator!*0.001*0.28,2),0) - there is not that much benefit of encapsulating that in a function but if we decide to work with ground units at some point (and - as mentioned - there are plenty of cases where this could be desirable) that would change. Further examples can be found in the AC-Style, for example the per-country spatially differentiated lookup of default language, currency and driving side of roads. I also widely use functions that combine numeric interpretation of tags with ground unit calculation to determine the drawing size of map elements. Like:

https://github.com/imagico/osm-carto-alternative-colors/blob/7f2aa12ce375e318b60b05332bb7ad135d98a613/sql/symbols.sql#L10 https://github.com/imagico/osm-carto-alternative-colors/blob/7f2aa12ce375e318b60b05332bb7ad135d98a613/sql/roads.sql#L449

But yes, a wide range of potentially useful applications of functions for us is style specific tag interpretation. That is why i think it makes sense to test this method with a tag interpretation application in #4952.

dch0ph commented 3 months ago

This is the job of osm2pgsql, which converts OSM tags into Postgres columns that are useful for making a map.

I disagree with this presentation. The job of osm2pgsql is to import the data needed to make a useful map. But unless there are real performance issues, interpretation of data should be left to point-of-use for maximum flexibility, otherwise any change to interpretation involves a database reload.

I feel there's a big difference between data normalisation, which should be done at import, and tag interpretation which is best handled by the style, not least because it allows the database to be shared. I used to create custom columns etc. in my own style, and then got rid of this since it was so much better to have a common database structure that could be shared between styles.

Generated columns may be useful in future. These may be easier to support in the flex style. In "classic" osm2pgsql mode, they were a bit clunky as re-importing the database killed the generated columns. In either case, having the SQL function that generates the column is needed!

imagico commented 2 months ago

Reminder to everyone: This proposal has been open for discussion for more than a week now and progress on #4952 depends on establishing consensus on this.

The question asked regarding examples of SQL functions for purposes other than tag interpretation has been answered.

I would like to once more explicitly invite a critical look at what is suggested here from everyone. Although several IMO weighty arguments have been put forward for the use of SQL functions (including their use for style specific tag interpretation) arguments speaking against it would be very valuable to hear.

I also want to summarize the timeline on #214/#4952 to make clear again why making progress here is of high importance:

214 is our second oldest issue and has been open for nearly 11 years now. A lot of other issues regarding visualization of access restrictions on roads (implicit and explicit) depend on this being addressed in a way that provides flexibility for further design adjustments.
A first selective ad-hoc change for slightly extended access tag interpretation was made 10.5 years ago in #371 - and was rejected calling to have an overall strategy on this topic (semantics, rendering) rather than to make ad-hoc changes.
A first concrete proposal for a tag interpretation strategy based on the concept of de facto restrictions for the primary mode of transport was made 5 years ago. This is suitable to be used without changes in the rendering concept and formed the principal basis of the practical work following.
A first proof-of-concept to address #214 within the technical and design framework of OSM-Carto using SQL functions had been presented 2.5 years ago.
4952 as the first and only attempt to concretely solve #214 with an overall and forward-looking strategy was created nearly half a year ago now and @dch0ph invested a lot of time over the months in adjusting this to various suggestions and requests made.

If there are arguments why - in light also of this history - the approach taken in #4952 of using SQL functions for implementing the tag interpretation logic as part of a principal strategy of conservatively introducing SQL functions in this project like proposed here is not a good idea for OSM-Carto in pursuit of its goals i want to hear them. Otherwise i would say it is really more than time to get #4952 out of the door.

pnorman commented 2 months ago

Regarding applications of SQL functions other than tag interpretation: I have already provided several pointers: a viable solution to #3880 will likely require a !scale_denominator! to zoom level conversion

Yes, I agree that some calculations that require information only available at time of generation can make sense as functions - basically anything involving Mapnik tokens. I still don't see that it's the best route for calculating values that are only determined by the object's properties.

dch0ph commented 2 months ago

I still don't see that it's the best route for calculating values that are only determined by the object's properties.

But we calculate internal values from object properties all the time:

                CASE WHEN surface IN ('unpaved', 'compacted', 'dirt', 'earth', 'fine_gravel', 'grass', 'grass_paver', 'gravel', 'ground',
                                      'mud', 'pebblestone', 'salt', 'sand', 'woodchips', 'clay', 'ice', 'snow') THEN 'unpaved'
                  WHEN surface IN ('paved', 'asphalt', 'cobblestone', 'cobblestone:flattened', 'sett', 'concrete', 'concrete:lanes',
                                      'concrete:plates', 'paving_stones', 'metal', 'wood', 'unhewn_cobblestone') THEN 'paved'
                END AS int_surface,

Apart from the one-off bump in install complexity, I don't follow why it is worse to do this via an SQL function rather than as in-line SQL. My understanding is that small functions would be inlined anyway.

We would not want to calculate int_surface at import time - you would need a reload with any change to surface types interpreted.

As noted above, generated columns may offer a third way in the future, but you still need the function that does that calculation.

@dch0ph invested a lot of time over the months in adjusting this to various suggestions and requests made.

But at least I'm benefitting from this work in my own style ☺️ ...

imagico commented 2 months ago

I still don't see that it's the best route for calculating values that are only determined by the object's properties.

This does not provide much of a chance to achieve consensus here. You are essentially asking us to prove a negative - the impossibility of a better solution to exist. I am (a) not convinced that this is the case and (b) not making any claim in that direction at all with this proposal.

Various arguments have been presented why we think introducing use of SQL functions in OSM-Carto at this stage is a good idea and why #4952 is a suitable first use case - exactly because, like @dch0ph pointed out - it does not introduce something technically fundamentally new (like it would be for ground unit rendering) but structures and modularizes something into functions that we have so far always done inline in project.mml - at the expense of substantial code duplication and poor maintainability. If you find the arguments presented flawed or have concrete arguments against this proposal please present them so we can have a fact based discussion rather than just an exchange of opinions and can try to come to a consensus view on this.

Stating that you are not convinced but not actually arguing on the matter does not provide much of a foundation to work towards consensus.

imagico commented 2 months ago

My understanding is that small functions would be inlined anyway.

I don't think this is primarily a matter of the length in code, more about what the function is doing. I used the case example of the map units to pixels conversion (way_area/NULLIF(POW(!scale_denominator!*0.001*0.28,2),0)) - that is a static unit conversion that does not implement any map design decision and will never need to me modified. Separating that out into a function with a descriptive name (like mercator_square_meters_to_pixels()) would only have the benefit of hiding the exact implementation of the math - which might make it a bit easier to work with for some map designers but more difficult for others. Bottom line: Very little or negative net benefit. This will change if the logic gets more complicated of course (like for ground unit conversion).

Things are different for map design specific logic. For example we have 7 occurrences of

CASE WHEN service IN ('parking_aisle', 'drive-through', 'driveway') THEN 'INT-minor'::text ELSE 'INT-normal'::text END

which generates the minor service road distinction from the service tag. That is much shorter than the int_surface case you used as an example - still it could be worthwhile to encapsulate that logic in a function so someone who wants to change which service tags (or potentially other tags) lead to classification as minor or normal service roads only has to make one instead of seven modifications. I am not saying we should necessarily do that in this case (discussing that would be part of step 4/5 of my proposal) - but the benefit would be much clearer than for the map units to pixels conversion.

There is a third type of highly repetitive code elements in our SQL code that do not represent style specific logic but implement practically immutable aspects of the OSM data model. Most obvious of those is probably the COALESCE(layer,0) AS layernotnull (16 occurrences). Since such elements do not practically need to be edited by map designers and are simple enough to be intuitively clear to everyone familiar with OSM they could indeed be implemented in preprocessing. But note this is not trivial even in such a simple case - the difference between an explicit layer=0 and a missing layer tag is not necessarily irrelevant for rendering. But this is definitely something i would consider worth discussing (would be part of developing a data preprocessing strategy in #4901). At this stage, however, this is a theoretical discussion since no concrete proposal has been presented for a strategy in that direction. And i see not much overlap with the discussion here - the distinction between style specific and style independent interpretation of the OSM data is quite clear.

pnorman commented 2 months ago

I still don't see that it's the best route for calculating values that are only determined by the object's properties.

This does not provide much of a chance to achieve consensus here. You are essentially asking us to prove a negative - the impossibility of a better solution to exist.

But I have provided what I view as a better solution, you have just considered it off-topic for this discussion.

imagico commented 2 months ago

But I have provided what I view as a better solution, you have just considered it off-topic for this discussion.

You have hinted at the abstract idea to do style specific tag interpretation in pre-processing - but you have not presented a practical strategy how to do that or arguments why you think this is a beneficial approach for OSM-Carto. I have asked you to do so several times in #4952 and explained there why i consider it necessary to separate this from the PR review discussion (https://github.com/gravitystorm/openstreetmap-carto/pull/4952#issuecomment-2168549541, https://github.com/gravitystorm/openstreetmap-carto/pull/4952#issuecomment-2185147613, https://github.com/gravitystorm/openstreetmap-carto/pull/4952#issuecomment-2225590765, https://github.com/gravitystorm/openstreetmap-carto/pull/4952#issuecomment-2276914789).

@dch0ph has argued a bit why style specific tag interpretation with SQL functions is preferable to the abstract idea of doing it in pre-processing in some way. I do not see how doing this in pre-processing can be preferable to what is suggested here either, considering the goals of this project - but i do not consider it a productive approach to argue this in an abstract fashion at this stage. I want to be open to the possibility that i am wrong about this and that a well thought through, practically viable concept can address my arguments and convince me to change my mind. But you (or someone else) have to present such a concept and arguments why this is preferable to using SQL functions. Then we can have a fact based discussion of the arguments for and against the different approaches and develop consensus or come up with a third strategy based on the insights gained that is better than the individual proposals.

This is what an openly cooperative project like OSM-Carto is all about. Developing a common strategy on how to tackle the challenges we meet that is better suited than what each of us could come up with individually by combining our diverse viewpoints and experiences to jointly develop the best approach to do things under our shared goals. That requires investing the time and energy to present your ideas and reasoning to the critical review of others. I am doing this here for the approach we have in #4952 developed over the past months based on ground work of the past years. I expect you to do the same for your preferred approach.

A key to productive cooperation is also to value each other's views and perspectives even if you disagree with them and the willingness to discuss them to develop a common understanding (see also https://github.com/gravitystorm/openstreetmap-carto/issues/3951#issuecomment-555490869). Your repeated assertion of tag interpretation in pre-processing being preferrable combined with the non-engagement in argument either for your preferred approach or against the proposal presented here makes it difficult to develop consensus.

imagico commented 1 month ago

Another month has passed with no substantially new input. Everyone had plenty of time to contemplate the matter so i think we can try to wrap this up.

@pnorman - please indicate if you want to put up a concrete alternative proposal for a strategy to move style specific tag interpretation to the database import as a (partial) alternative to what is suggested here. I would very much like us to decide on this proposal with a concrete alternative to compare to and choose the better of two option rather than the only one practically available. But for that we'd need to have a concrete alternative with an implementation strategy (like i provided here) and an idea how this will practically look like (as we have in #4952). The general assertion you'd rather do that in preprocessing is neither suitable for a consensus decision among the maintainers regarding strategic direction nor does it provide sufficient guidance to developers how to practically implement changes that do tag interpretation.

mboeringa commented 1 month ago

@imagico

I think introducing functions is reasonable. Duplicate code is never desirable from a maintenance point of view, and should be avoided whenever and to the extent possible.

As you said, it is also not to much of a burden - the installation of the functions - compared to what already needs to be done, e.g. indexing.

I do also see the point @pnorman tries to make, and of course we should be glad osm2pgsql has so much more to offer nowadays in terms of tag processing via the new flex style processing, but I think these options are largely complementary and interchangeable, rather than clear-cut choices in specific cases. One is not necessarily better than the other, as arguments by @dch0ph also shows, it all depends on your goals with the data processing.

I do also agree with the general wish to minimize tag processing in the import stage, and leave the tag interpretation to the style (but of course the 'flex' configuration files can now largely represent "style" as well if developers wish to, that is what we gained with recent versions of osm2pgsql).

imagico commented 1 month ago

Ok, after another two weeks without new critical comments i think it can be stated that this proposal has quite clear support among those few who participated in the discussion. It is unfortunate that - given this has quite a bit of strategic significance - there has been not much participation in the discussion, especially from among the maintainers.

But given that

making progress on this in one way or another is - as explained above - of quite fundamental importance for solving one of the oldest issues of the project and concluding many years of slow but continuous work on a comprehensive solution for that - which is a precondition for a substantial number of other issues.
that we can't really suggest developers to invest their time and energy into working on the issues of the project if we are unable to develop an effective strategy on how this should be done.
that plenty of time has been given and explicit invitations have been made to present and discuss a concrete strategy for an alternative approach to this proposal.
that we ultimately stand in the do-ocratic tradition of OpenStreetMap

i plan to go ahead and finalize #4952 with the current implementation approach through SQL functions and further pursue the strategy outlined in this proposal.

That does not mean closing the door for alternative approaches. The invitation to present other strategies partly or fully replacing the use of SQL functions in style specific tag interpretation remains. Merging #4952 does not mean we are irreversibly tied to this approach. In other words: This proposal represents a RfC in the best tradition of that concept.

Bottom line: I am going to close this issue as resolved accordingly but any further discussion on the proposal (overall or on specific steps) is welcome. Beyond that

any proposal for an alternative approach should be presented in a new issue.
any discussion of our collective decision making procedure belongs in #3951.

Regarding use of newer features of osm2pgsql as mentioned by @mboeringa - that is going to be much easier to explore and discuss once we have merged #4978 - which, according to the approach sketched in #4981 i would like to see as the next strategic step after merging #4952 and releasing the pending changes.

gravitystorm / openstreetmap-carto

Introduction of SQL functions to the style #5004

Background

Proposal

214 is our second oldest issue and has been open for nearly 11 years now. A lot of other issues regarding visualization of access restrictions on roads (implicit and explicit) depend on this being addressed in a way that provides flexibility for further design adjustments.

4952 as the first and only attempt to concretely solve #214 with an overall and forward-looking strategy was created nearly half a year ago now and @dch0ph invested a lot of time over the months in adjusting this to various suggestions and requests made.