openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0
2.2k stars 911 forks source link

Limits on number of tags and relation members #1711

Closed mmd-osm closed 2 years ago

mmd-osm commented 6 years ago

While reviewing the diff upload code, I noticed that there's no max limit on the number of key/value pairs per object, as well as on the number of relation members in a relation. It would probably be a good idea to have such limits, given that we already have a max number of objects per changeset, and max number of nodes per way.

woodpeck commented 6 years ago

Data point: actual maximum number of tags per object is ~ 430, on http://www.openstreetmap.org/node/424298167

tomhughes commented 6 years ago

If you're going to propose adding an arbitrary limit you really need to give a good reason, not just say it would be a "good idea".

mmd-osm commented 6 years ago

Today it's difficult to reason about the diff size and the corresponding memory requirements.

Let's assume we create a changeset with 30 nodes and 10'000 tags each. Each tag can have up to 255 characters in both key and value. That's already adding up to 146MB worth of data, giving both the upload and downstream apps like JOSM a hard time. Memory requirements will be far less with 5 characters for keys+value (<3MB), for a mapper it's still not helpful to have objects with that many tags.

I see this more as a safeguard to prevent silly things from happening. It shouldn't get in the way of any normal mapping activity, though. I don't have a strong opinion on the actual limit. 500 will for sure be too low, 10'000 seems unreasonable. Maybe around 1000-1500? I'm leaving that open for further discussion.

dieterdreist commented 6 years ago

sent from a phone

On 28. Dec 2017, at 10:29, Frederik Ramm notifications@github.com wrote:

Data point: actual maximum number of tags per object is ~ 430, on http://www.openstreetmap.org/node/424298167

and they’re even missing klingon...

mmd-osm commented 6 years ago

One more data point wrt to maximum number of relation members, the other topic of this issue:

osm2pgsql has a limit of 32767 entries due to a data type limitations (see https://github.com/openstreetmap/osm2pgsql/issues/713). Someone even mentioned relations with 77769 nodes there which in that particular case seems more like abusing relations as collections.

mmd-osm commented 6 years ago

I think we can close this one, it turned out that the limits won't affect the new code for cgimap.

mmd-osm commented 5 years ago

In the light of recent events (e.g. https://lists.openstreetmap.org/pipermail/talk/2019-February/082064.html), I'm reopening this issue. We should reconsider introducing some limits for both the number of elements in a relation, as well as the maximum number of tags. Creating an insane amount of data via some buggy script is just too easy, and putting some safeguard in place to prevent the worst from happening isn't probably too bad an idea.

mmd-osm commented 5 years ago

Feedback in the respective cgimap pull request seems to indicate that

are sensible limits across different tools in the osm ecosystem.

I think this would have to be added to the Rails port as well, and announced to editors via new properties in /api/capabilities, then?

simonpoole commented 5 years ago

I think this would have to be added to the Rails port as well, and announced to editors via new properties in /api/capabilities, then?

No, not "I think". Any such limit has to mandatory be available to the editors so they can react correspondingly locally when the limit is exceeded before generating a failed upload, which potentially is not recoverable in a reasonable fashion.

mmd-osm commented 4 years ago

@gravitystorm : cgimap part of this topic is done now. Changes are included in the forthcoming 0.8.3 release.

mmd-osm commented 2 years ago

I'm reopening this issue. Someone uploaded a 200k member relation, and it caused havoc across a number of downstream data consumers.

https://github.com/zerebubuth/openstreetmap-cgimap/pull/174#issuecomment-1013266746

Here's a preliminary list of action items I could think of:

drolbr commented 2 years ago

There are currently no relations with more than 8000 members and about 900 relations with more than 2000 members. The relations that have congested the pipes (the M15 incident in Manhattan and the relation from last Friday) had over 50k resp. over 200k members.

Given that, a limit of 20k for relation members will not harm benign use for the foreseeable future, but will catch errors similar to the ones we have seen so far.

Other limits for other properties can be decided on separately. There is already a limit for the number of way members, thus a limit for the number of relation members should not be a too big surprise.

mmd-osm commented 2 years ago

http://osm.org/relation/6535292 currently has 29190 relation members, so a limit of 20k will not do.

You can read more about the analysis / queries in the linked cgimap issue.

drolbr commented 2 years ago

I stand corrected. I had ignored multipolygons and relations without tags. Yet still there are only 9 multipolygons with more than 8k members and this is the only relation with more than 20k members.

Note that the link http://osm.org/relation/6535292 does not work and is far from working, because even if one could get the rails port fast enough, it is still severely strenuous for the client to render.

I would tend to review this and the other few relations with over 10k members with the aim of breaking them down. The relation in question consists to more than 99% of coastline.

mmd-osm commented 2 years ago

Right, the Rails port link doesn't work and times out. This one should work, though: https://api.openstreetmap.org/api/0.6/relation/6535292.json

mmd-osm commented 2 years ago

I created #3440 which covers the following topics:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap server" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
  <api>
    <version minimum="0.6" maximum="0.6"/>
    <area maximum="0.25"/>
    <note_area maximum="25"/>
    <tracepoints per_page="5000"/>
    <waynodes maximum="2000"/>
    <relationmembers maximum="32000"/>
    <changesets maximum_elements="10000"/>
    <timeout seconds="300"/>
    <status database="online" api="online" gpx="online"/>
  </api>
  <policy>
    <imagery>
      <blacklist regex=".*\.google(apis)?\..*/(vt|kh)[\?/].*([xyz]=.*){3}.*"/>
      <blacklist regex="http://xdworld\.vworld\.kr:8080/.*"/>
      <blacklist regex=".*\.here\.com[/:].*"/>
    </imagery>
  </policy>
</osm>
HolgerJeromin commented 2 years ago

I would have expected

    <relations maximum_members="32000"/>

to align to existing:

    <changesets maximum_elements="10000"/>

But waynodes has the same logic as your suggestion.

mmd-osm commented 2 years ago

Yes, most of the code comes from waynodes, so I'm following the same naming convention here for consistency.

pnorman commented 2 years ago

It would be nice if we had been consistent in naming with other stuff, but I'm in favour of aligning relation member limits with way node limits.

simonpoole commented 2 years ago

It would be nice if we had been consistent in naming with other stuff, but I'm in favour of aligning relation member limits with way node limits.

Of the things that are going to annoy people utilizing the API I think this is way down there, adding yet another brittle regexp now that ....

simonpoole commented 2 years ago
* [ ]  Communicate new limits to consumers of the API

Any chance of this actually happening (before it is active)?

tomhughes commented 2 years ago

Do you have a reason to think it won't be? or a concrete suggestion to make on how we should do so? or are just going to make snide remarks?

simonpoole commented 2 years ago

Do you have a reason to think it won't be? or a concrete suggestion to make on how we should do so? or are just going to make snide remarks?

It is simply previous experience with API changes, I don't think there is even a need to debate that cut over / deprecation handling of any kind has been rare.

tomhughes commented 2 years ago

Well I was serious about suggestions for how we communicate them given we don't really have any meaningful way to do so...

Sure we can post to the dev list and tweet, and approximately 0.1% of API users will notice if we're lucky.

mmd-osm commented 2 years ago

I already added a comment on the OSM API 0.6 wiki page the other day, referring to the relevant Github issues: https://wiki.openstreetmap.org/w/index.php?title=API_v0.6&type=revision&diff=2256803&oldid=2253203

Not sure what else we're supposed to do here. We could post something to osm dev, maybe. Given that this is such a niche topic, I doubt this is super relevant for the majority of users.

simonpoole commented 2 years ago

Diary post, OSM blog post, OSM weekly, OSM US-slack, dev mailing list, dev and editor forums, HOT has something technical too. These can all be the same text.

The OSM weekly will at least provide a non-EN heads up.

simonpoole commented 2 years ago

Not sure what else we're supposed to do here.

Determine a date when the feature/change should go live allowing a suitable amount of time in advance for adaptations. Then announce the date and what is happening.

Assuming the limit would be 32k, then the change will have little to no practical impact, so a weeks notice could be considered enough (obviously for something with impact you would want half a year or more). In this case the limiting channel will be Weekly OSM, so assuming it would get covered in this weeks edition (which it won't without explicitly asking the publishers), then the 14th would be a good choice.

gravitystorm commented 2 years ago

Diary post, OSM blog post, OSM weekly, OSM US-slack, dev mailing list, dev and editor forums, HOT has something technical too. These can all be the same text.

I think this is completely over the top for this change. It's likely to affect approximately zero mappers and zero authors of editing software (since they can just ignore the new limit and in reality approximately zero of their users are likely to ever hit it).

My counter-proposal would be a) an email to dev and b) immediate rollout.

Of course, if @simonpoole feels like all those announcements are genuinely worthwhile, then I'm happy for him to volunteer to do so, but I think it's an unreasonable burden to be putting on anyone else.

simonpoole commented 2 years ago

Getting some practice in announcing things properly might just be a good idea (contrary to well practiced not being bothered) and an uncritical change would seem to be ideal for that, The limit is nearly guaranteed to be reduced going forward so making everybody aware of it has the added benefit of an early heads up.

mmd-osm commented 2 years ago

Tickets raised so far in various editing apps:

mmd-osm commented 2 years ago

@joto again brings up the seemingly unlimited number of tags as a potential security issue in his new data model study. While this is technically speaking not quite correct (we have implicit limits based on message sizes), it would still be good to have an explicit max number somewhere below 10k. Yes, we should also include this limit in /capabilities and give editing apps ample time to adjust.

Luckily, CGImap already has a config option to enable a tag limit right away, bit we don't have anything similar for the Rails port yet.

Any thoughts on this?