osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0
1.51k stars 474 forks source link

Allow propagating tags to ways during relations processing #230

Closed kevinkreiser closed 4 years ago

kevinkreiser commented 9 years ago

During relations processing one has the option to mark members as superseeded/belonging/matching the given parent relation. When you do this any superseeded member that was added during way processing is deleted and the full geom of the relation is added back in using tag processing as a combination of the relation and the member ways. Essentially this allows one to pull tags from the members up to the relation and write out the relation. This makes for some duplication because often the members of a relation don't share the same tags for the full lenght of the relation.

I think it would be useful to allow users to "push" down tags from the relation on to each of the members of the relation that were already written to the db. Basically allow the tagtransform::filter_rel_member_tags to edit the tags on the members based on the relation and have those tags be written back out on the original ways. Currently we have the ability already to edit the tags they just don't get written back out.

My use case is as follows. I want to make a database that contains information useful for routing. Most of this information is readily avialable on the way. One thing that isn't are the tags needed for shielding/naming for numbered routes as well as their direction. This information exists in the relation's name and it's members' roles when the relation has type=route for example. In this case it would be really handy to be able to "push" tag and roles information from the relation into additional tags on each of the member ways.

To accomplish this in the code we could, in relation processing, loop over the superseeded information even if the relation was set to filter==1 and just make a call to the db for every way whose superseeded==0 to write that members keyvalues back out to the table. since the keyvalues are tagtransform::filter_way_tags filtered before doing so the lua you might write to "push" the tags down to the members would be just concentrated on what you are "pushing" down as well.

I may end up adding this on a branch and making a PR for this unless someone thinks its a poor idea or not worthwhile.

@pnorman @zerebubuth @lonvia @apmon @gknisely what do you guys think?

pnorman commented 9 years ago

My use case is as follows. I want to make a database that contains information useful for routing. Most of this information is readily avialable on the way. One thing that isn't are the tags needed for shielding/naming for numbered routes as well as their direction. This information exists in the relation's name and it's members' roles when the relation has type=route for example. In this case it would be really handy to be able to "push" tag and roles information from the relation into additional tags on each of the member ways.

What you want to do is basically this, but pre-computed, right?

SELECT w.way, w.name, w.highway, w.ref, r.ref AS network_ref, r.network AS network_network
  FROM planet_osm_line w
    LEFT JOIN (SELECT * FROM planet_osm_line WHERE osm_id < 0) AS r
      ON (..non-trivial join to match ways and relations...)
kevinkreiser commented 9 years ago

yep we thought if no one wants to allow this in osm2pgsql we would do a post processing step similar to what you are saying using the middle tables, i just thought it would be nice to have so it could all happen in the same lua script. again i dont know if anyone else would be interested in this though.. also it might be worth making superseeded be more than just 0 or 1 but maybe a 2 denotiing to do the way columns updates and 0 still being a no-op

pnorman commented 9 years ago

performance wise, issuing a bunch of UPDATEs to lines in relation processing is probably going to be a headache. this won't impact those not using that feature, but is of concern to you.

The current superseeded processing is really based around multipolygons where you want to get rid of the way. Updating raises the case of a way being part of multiple relations, which is tricky to handle right.

kevinkreiser commented 9 years ago

good point, handling multiple memberships will be tricky but we must handle it for our purposes. at any rate am i to understand you aren't particularly interested in this feature at this time? :smile:

pnorman commented 9 years ago

yep we thought if no one wants to allow this in osm2pgsql we would do a post processing step similar to what you are saying using the middle tables

I should note that you can actually do this from only rendering tables, but it's a headache thanks to long linestring splitting. see https://github.com/gravitystorm/openstreetmap-carto/issues/596#issuecomment-47410719

pnorman commented 9 years ago

good point, handling multiple memberships will be tricky but we must handle it for our purposes. at any rate am i to understand you aren't particularly interested in this feature at this time?

I'd like a way to get network relation information back onto way linestrings. This might be the best way to do it, but I want to make sure we've thought through it and know the problems that will come up.

Unfortunately, I probably won't be able to use the feature for openstreetmap-carto.

kevinkreiser commented 9 years ago

Perfect, does anyone else have any opinions on this topic? Including a better name for this issue/feature request?

lonvia commented 9 years ago

I'm not sure I've understood you right. I thought way processing worked like this with respect to multipolygons: in the first round of reading ways that potentially belong to a multipolygon are held back from writing in to the final planet_osm_line tables and just marked as to be done (TBD) in the middle tables. Then relations are processed and they delete the TBD flags for ways which are part of a multipolygon and therefore are not independent. Afterwards the TBD ways are processed and inserted in planet_osm_line.

There is no actual deleting (just unflagging) and it should be much easier to plug into this mechanism for what you want: make sure all ways that can potentially inherit relation tags are marked as TBD (currently that happens, when the way is marked as polygon1, so you might need an additional flag, as 'polygon' and TBD would become independent properties). Then in the postprocessing2 look up for each way where it is a member of and hand the relation tags (or a list of them, if there are more relations the way is a member of) into the tag transform.

Does that make sense to you? If you employ some clever in-memory caching of relation tags (possibly filtering the relations you are actually interested in), then this isn't necessarily slowing down anything.

NB: this feature request comes up from time to time, in particular for route relations.

lonvia commented 6 years ago

If that is implemented then the same should be implemented for nodes (see also #791).

kocio-pl commented 6 years ago

I wonder if this feature would help with rendering rivers on OpenStreetMap Carto? They tend to be long and instead of repeating such general properties like name or length (distance) for every segment, it's handy to put it into the properties of relation, which binds them all. But when I wanted to filter them by these properties, I was not able to do so.

Are there any plans regarding implementing such functionality in osm2pgsql?

lonvia commented 6 years ago

It is unlikely that this will ever be implemented unless there is an external contribution for it.

pnorman commented 6 years ago

I have a possibility of doing this on contract, so intend to write up what is needed.

yvecai commented 6 years ago

One should have a look at the relation_member table in Imposm3. https://imposm.org/docs/imposm3/latest/relations.html The JOIN become trivial. It's the slim tables with indexes.

kennykb commented 6 years ago

@yvecai, @imagico, I agree that ImpOSM's approach to relations is sound. I use it in a newer proof of concept for rendering shaped markers for possibly concurrent road routes. Rather than tracking all relations, I limit it to type=route route=road, with both network= and ref= specified, to minimize the footprint. Obviously, it could easily be expanded to other relations if there's a use for it.

kennykb commented 6 years ago

In various fora, I've received a lot of strongly positive and strongly negative (even dismissive) feedback for the idea that I mentioned above.

I'm floating a formal proposal to develop the necessary osm2pgsql infrastructure to address the problem.

Consider this to be a request, nay, rather, a plea for comments from those who understand the issues. The proposal is most emphatically not in its final form - in particular, there is an Open issues section that describes some requirements that I'm fairly sure exist, but that I don't understand.

I hope that writing up this portion of the issue, as I understand it, demonstrates a good-faith attempt to "do the homework" and will not be dismissed out of hand as yet another ignorant oversimplification of the problem. I recognize that I'm still at an early stage in understanding, but I'm reluctant to move forward without at least some feedback from wiser heads. I'll make every effort to insure that your help will not be a waste of time.

(Oh, quick note - it might be easier to consolidate replies in this ticket.)

lonvia commented 6 years ago

@kennykb Two problems I see with that proposal: first, it is focused too narrowly on routes. Anything included in osm2pgsql directly has to be able to work with any kind of relation. The proposal has potential to be generalised to arbitrary relations though, so that's not a real blocker.

More importantly, the proposal suggest to add additional tables which are joined during render time. I really don't think this is a good way forward. The necessary information for the way should be collected when the database is created. It is less time critical at that point and we can cache the necessary relation information more efficiently.

That said, if joining is fast enough for you you can do all that relation-member lookup relatively efficiently already with a stock osm2pgsql in slim mode without creating any additional tables: Route relations are already processed by osm2pgsql. The route relation information (planet_osm_route in your proposal) is available in planet_osm_line (negative id because it is a relation). The relation-member information is in the planet_osm_rels and there are even convenient indexes created. Putting all together, you should write a lua tagtransform script that pre-computes the appropriate shied information for each highway route you are interested in and save that in a custom shields column. Then you can collect all shields for a way like that:

SELECT osm_id, 
       ARRAY(SELECT DISTINCT l.shield
             FROM planet_osm_rels r JOIN planet_osm_line l ON r.id = -l.osm_id
             WHERE o.osm_id = ANY(parts) AND 'w' || o.osm_id::text = ANY(members))
FROM planet_osm_line o

parts of planet_osm_rels has an index, so you need to use that to look up membership. It only stores raw ids though that do not distinguish between ids for nodes, ways and relations. That's why there is the second match against the members column. This is just an outline to give you an idea how this could work. The indexes used are a little bit larger but I doubt that you see much of a difference in rendering performance compared to the extra tables and you save the updates of the tables.

As for using the slim tables here, officially speaking that is indeed frowned upon. However, fact is that they are so widely used, that we can't really change the format at a whim. It might happen at some point (there is a discussion about using JSON) but is more something of a once in ten years occurrence.

imagico commented 6 years ago

Without having a good idea how feasible this is practically - it might be worth considering making planet_osm_rels a generic feature of osm2pgsql that is also available without slim mode. I completely get that propagating relation tags to the member geometries is a more elegant solution within the design scope of osm2pgsql. But this is a non-trivial problem when considered in a truly generic form (the concept of route relations is a very simple example where multiple relation membership have a fairly straightforward meaning - but that is not something that can be assumed to apply universally).

After all seen from the Postgresql side this is exactly what relational databases are meant for.

kennykb commented 6 years ago

@lonvia Thanks for noticing this and taking the time to respond. I hope that you will tolerate my asking more questions, because there is a problem to be solved here, and we've not yet converged on a complete solution.

As you note, the proposal can indeed be generalised to arbitrary relations. I simply did not do so out of a desire to keep the database footprint as small as possible. Moreover, relations are not simply dumb containers; I suspect that different types of relation may need somewhat different structures, with part of the structure determined by the 'role' and the type of object (point, open way, closed way, other relation) contained. At the very least, I'd most likely exclude multipolygons, since they really are quite well handled by the existing mechanisms.

I agree with you in general that we want to push the processing as early as possible in the rendering sequence. With that said, though, I observe that the performance cost of the render-time join in this particular use case is almost negligible. The join winds up being executed only for numbered ways that are known to be inside the current super-tile and of enough importance to be rendered with markers at the current zoom level. The cost of doing a JOIN (which turns out to be all against cached data in many cases) is small compared with the cost of retrieving and rendering PNG or SVG files containing the graphic elements for the markers. At least it turned out to be so in a multi-hour job that prerendered all tiles down to zoom level 15 in the area that I cover on my private server (most of the US east of the MIssissippi) - adding the marker rendering added at most a few minutes to the total time.

The join at rendering time is still, I concede, not ideal, but I've not et been able to come up with a better approach. It appears to me that the entries with negative osm_id in the 'roads' table have had their ways coalesced (using ST_LineMerge or something similar) prematurely. The result is that in a route concurrence, the renderer will simply see two unrelated lines. In a place like this, for instance, it will have one line for the SR 443 relation, one for the SR 85 relation, and one for each individual way segment (including the changes of speed limit and the addition of climbing lanes as the road goes up the hill).

Moreover, some of the numbered ways are minor roads that do not appear in 'roads' and do not get marked as 'pending ways' in the initial pass, so don't wind up in the second pass of assembling the routes at all. Inspecting the 'line' table, I did not see entries corresponding to the relations on these minor roads.

In any case, as I said, the LineMerge is simply premature. What's actually needed for usable marker placement is that there be one multilinestring for each distinct subset of the routes, so that there is a set of linestrings that's 'just SR 85', another that's 'just SR 443' and another that's 'SR 85/443 running concurrently.' For good marker placement, these strings need to be ST_LineMerge'd, of course, because otherwise, it's insanely difficult to produce markers at more-or-less regular intervals (and avoid excess ones at spots where the way is split for a reason such as a change to the speed limit), but that must be deferred until a point where the process is aware of concurrencies.

Keeping concurrency sets up to date during, say, a minutely update was something where the processing was not at all obvious to me - and it appears to be beyond the capabilities of the Lua API. At least my understanding is that the the Lua script takes geometries as they are given, and that the only decisions it can make are whether or not to include the object in the table under construction and what tags to use to decorate it. If you have an example of a Lua script for osm2pgsql that actually modifies the structure of what it is given in some nontrivial way, I'd really appreciate a pointer to it! I had considered the approach, but put it on hold when my initial searches on Google appeared to turn up nothing useful. Failing that, though, building the sets in SQL at render time works quite well.

Your suggestion of joining to the 'rels' table is indeed more or less workable. The resulting PL/SQL code is rather more complicated, and this stems from the fact that the 'role' is important (including links, in particular, makes a mess of the rendering). Determining the role once you know that a given way is a member of the relation means rescanning the arrays to search for the way a second time and parsing out the 'role' element from a heterogeneous bucket of strings. It works, but it's really slow when dealing with, say, a transcontinental route and trying to pick out the roles for just those members in the tile being rendered. It also strikes me as being brittle code. When @pnorman pointed out that he considered any use of the slim tables to be unacceptable, I abandoned it.

As for using the slim tables here, officially speaking that is indeed frowned upon. However, fact is that they are so widely used, that we can't really change the format at a whim. It might happen at some point (there is a discussion about using JSON) but is more something of a once in ten years occurrence.

Have the maintainers of osm2pgsql considered that the widespread use - in spite of the vehement disparagement of such use - indicates that they address a need that cannot be satisfied in any other way?

kennykb commented 6 years ago

Oh, one more note. Any solution that loses the association of the ordered pair (network, ref) is not acceptable. Consider the area shown here All of those tertiary roads are tagged 'ref=CR NNN', and without the network, it's not possible to select the right marker. The markers in Guernsey and Belmont Counties are green-and-white squares. Those in Harrison County are white-and-black squares. Those in Tuscarawas County are blue-and-gold pentagons.

Before anyone tells me that 'sure you can do it, "just" intersect the way with the multipolygons that give the county boundaries, let me observe that even if it were possible to do so with reasonable performance, it simply cannot work. Here is a case where New York SR 17 dips into Pennsylania (while retaining its NY designation), and here is one where NY 120A runs mostly in Connecticut when it isn't following the state line.

This sort of thing isn't exactly ubiquitous, but it's not vanishingly rare, either. It's often done by informal agreement between the jurisdictions that maintain the roads, as to who takes responsibility for one or another road that goes between the jurisdictions.

yvecai commented 6 years ago

I think documenting the use of planet_osm_rels as a proper way (although people are always free to frown) to join relations and their members would already be a good step.

lonvia commented 6 years ago

Sigh, it's amazing how this went from "it likely won't break any time soon" to "it's the proper way to do it" in a matter of hours. So let me be crystal clear here: (mis)using the slim tables for elaborate cross-joining and data analysis operations is not a proper way. If you do it, you are on your own. Don't open tickets if it doesn't work, don't ask for features so that your use-case is better supported and don't complain if things break when the format changes after all. The main design goal for the slim table remains to make the import and updates as fast as possible.

Have the maintainers of osm2pgsql considered that the widespread use - in spite of the vehement disparagement of such use - indicates that they address a need that cannot be satisfied in any other way?

I don't doubt that there is a need, I just contest that osm2pgsql is the right tool for all those uses. Not everything is a nail just because you happen to have a hammer.

kennykb commented 6 years ago

I'm not claiming that the slim tables are the proper way to do it!

I'm trying to explore what is the proper way to do it.

You've answered me now: there is no right way to do it, because osm2pgsql is not the right tool for the job.

I'm disappointed, because so much of the rest of the rendering toolchain depends on osm2pgsql, and now I have to investigate alternatives all up and down the line, but you surely have the authority to make that decision

I'm sorry to have wasted your time.

imagico commented 6 years ago

I don't doubt that there is a need, I just contest that osm2pgsql is the right tool for all those uses.

You should be aware that this is not visible to the observer from the documentation AFAIK. The Readme says:

osm2pgsql is a tool for loading OpenStreetMap data into a PostgreSQL / PostGIS database suitable for applications like rendering into a map, geocoding with Nominatim, or general analysis.

I see in principle three possibilities how this pertains to relations:

There have been statements that a viable implementation of the third option could be acceptable for osm2pgsql at least for route relations. There is now a statement that the second option is not considered to be acceptable beyond 'abusing' existing functionality. It would make sense to document this (and preferably also the reasoning behind it) somewhere where it can easily be found by users.

pnorman commented 6 years ago

I see no problems that propagating relation tags back to ways through the transforms could not solve. To be clear, we want to do this. It is unlikely to happen unless someone comes forward to do the work.

imagico commented 6 years ago

I see no problems that propagating relation tags back to ways through the transforms could not solve.

Whenever the relation actually represents a relationship between its members and is not just a collection of independent features (like in this case: all roads that belong to a certain route).

For the route and multipolygon relation this is solved through distinct handling and generating combined geometries but for all other relation types this would need to be separately implemented.

The most widespread relation type where this applies is turn restrictions. If you'd want to represent turn restrictions propagating tags is not enough. Generating separate geometries for them in addition could help to some extent - though you would still have the problem that the same road can be member of multiple turn restriction relations in different roles and i see no way to properly represent this in the database without creating references between geometries which would need to be interpreted via table joins.

kocio-pl commented 5 years ago

I have a possibility of doing this on contract, so intend to write up what is needed.

@pnorman Is this possibility still on the table or it's gone?

Maybe we could collect some money through independent action to implement it. What amount of money would be needed then (estimated)?

pnorman commented 5 years ago

@pnorman Is this possibility still on the table or it's gone?

That was over one contract ago I believe, so I have no plans to do any osm2pgsql stuff right now.

kennykb commented 5 years ago

Let me try to make one more run at breaking the log-jam.

The impasse with @lonvia appears to be that she is concerned with the performance implications of a run-time JOIN during rendering. I begin to see how sufficiently clever preprocessing when updating the tables could avoid such a beast for the specific case of route relations. I still have questions that would influence the design of such a thing. I'm hoping that I can get answers to them that would allow me to move forward. Otherwise, the requirement of "come up with a fully fleshed proposal for examination" is nearly tantamount to "go away," since that is so very inefficient a method for discovering what hidden requirements a proposal must satisfy.

I do get the idea of adding a 'shields' column to the database, as @lonvia suggested earlier. In the case of rendering concurrent routes, this column would need to comprise a delimited list of alternating values of network and ref from the route relation. (Here, I'm assuming that the type of the column must be a simple type such as INTEGER or TEXT. Obviously, if ARRAYs can be supported here, they'd be preferred!)

Obviously, 'network' and 'ref' by themselves are not enough to do attractive rendering of shaped markers. I imagine that there will be additional calculation at render time to retrieve a marker and use either a ShieldSymbolizer or a PointSymbolizer to do the actual rendering (under control of a GroupSymbolizer to handle clusters). This may involve an additional lookup of some sort to retrieve an appropriate graphical item. The lookup could conceivably be a JOIN operation, but in this case it would be a JOIN in which the outer table contains only items that we are already committed to rendering. This JOIN could be avoided by making the rendering decision at the time that the tables are updated, but such a design strikes me as a premature optimization and as a breaking of the division of responsibilities between database and renderer.

My experience suggests that rendering of route markers can be made much more attractive and readable by coalescing ways that belong to the same routes. A route may be fragmented into a great many short ways, to represent bridges, changes in speed limits, changes in number or purpose of traffic lanes, presence or absence of cycleways or pavements, etc. Considering these short ways in isolation makes it extremely difficult to place markers at approximately equal intervals. By the same token, coalescing ways that belong to a single route means that there will be multiple ways that follow the same nodes, with no indication that they are concurrent - which was the original problem to be solved.

The correct coalescence for this particular case is to merge ways that belong to the same set of route relations, rather than all ways for a given route relation. This is not consistent with how routes are stored in the line and roads tables today, and I do not propose to change that, so as not to have impact on existing code that uses those tables. There are two major alternatives:

(1) Simply store the shields information with the individual ways. At render time, once the ways have been retrieved, run ST_LineMerge on the result set (grouping by shields to coalesce the lines before returning them to the renderer. This method adds some computation on the database side, but is operating only on data that are known to participate in the rendering, and are already in memory buffers in PostgreSQL. Data outside the bounding box will not be considered in the merge.

(2) Pre-merge the ways that correspond to each set of routes. These merged ways would need to be stored in a separate table analogous to roads. There would need to be some sort of specification in the style file or Lua script to indicate what column mapping is needed for this table, which is likely to be considerably more austere than the line table. It would also require considerable design work to make decisions about how to handle merging of ways when tagging disagrees on the individual ways - and, alas, the particular rendering that I have in mind for markers requires the highway=* tag as well as the shields information, so this thorny issue would have to be addressed. Moreover, I'm not entirely convinced that using this approach would save any time in rendering. The pre-merged ways might be extremely long: consider that 'network=US:I ref=87' might well merge to a single way hundreds of km long. All of the extraneous information in these monstrous ways would have to be tested against the bounding box and discarded.

For the reasons stated, I have a strong preference for alternative (1).

I have not discussed yet the methods for generating the auxiliary shields column on initial database creation, for keeping its value consistent during database updates, or the specific database queries for driving a GroupSymbolizer to render the markers. Obviously, these methods will change based on the answers to the questions above. I think it would likely be a waste of both my time and yours to attempt to flesh out that design before the questions above are answered.

I apologize for taking so much of the gatekeepers' time, but I don't see a path forward without going through requirements step by step.

jeisenbe commented 5 years ago

Is anyone interested in working on this issue?

SomeoneElseOSM commented 5 years ago

Is anyone interested in working on this issue?

I did have a very quick look at it (ages ago) but couldn't see an easy way to expand the existing lua calls with an extra one to add relation details to ways (sample comment in my notes "In 'if (lua_pcall(L, 4, 6, 0))' WTF do those numbers mean?").

Something that would be really helpful would be documentation of how what is there already is implemented (not just how to use it - https://github.com/openstreetmap/osm2pgsql/blob/master/docs/lua.md covers that).

kennykb commented 5 years ago

Is anyone interested in working on this issue?

I think that my earlier replies should probably be taken as willingness to work on it. I have put the project on hold because of the fact that when I present fairly detailed sketches of proposed changes and ask specific questions, the posts either go unanswered or else are dismissed with objections that are too vague to be actionable.

lonvia commented 4 years ago

The new "flex" output now implements a mechanism to propagate relation information back to ways. It is not yet fully developed but the basic mechanism is there. I'm closing this ticket. Please try the new output and see if it fits your use-case. Open separate new issues, if you find bugs or have a use-case that it not yet covers.

SomeoneElseOSM commented 4 years ago

(echoing my comment from last year) is there documentation such as a worked example of how to achieve this now?

joto commented 4 years ago

@SomeoneElseOSM Yes. See docs/flex.md and the example files in flex-config.