Set bounding box size limits based on surface dimensions

imagico commented 1 week ago

Problem

Since #4908 introduces bounding box size limits not only for reading data but also for writing data, it becomes even more pertinent to implement these size limits in a meaningful way.

Right now these are determined in equirectangular projection (i.e. geographic coordinates treated as Cartesian), which is not adequate. This has led to mapping using the OSM API for reading data being practically impossible at high latitudes in most cases. The workaround has been to use Overpass instead. With changeset submission becoming subject to a size limit as well, however, no similar workaround is available. As it stands, this would make many edits in polar regions impossible for mapping newcomers (plenty of normal existing geometries there, such as coastline ways and landcover polygons, will exceed the limits).

Since the purpose of these limits is to keep requests reasonable (in terms of potential data volume for reading, in terms of spatial extent of a map edit for writing), basing them on surface dimensions is the more appropriate method.

Description

The area and linear size of a bounding box should be calculated/approximated using spherical trigonometry rather than equirectangular projection. For the bounding box area that is:

r*r*abs(sin(lat1) - sin(lat2)) * (lon2 - lon1)

with longitude and latitude in radians, r being the Earth radius. See also https://gis.stackexchange.com/questions/59087/calculating-bounding-box-size.
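As a minimal sketch of the formula above (not code from the openstreetmap-website repository), the area could be computed as follows. The function name `bbox_area_km2`, the constant `EARTH_RADIUS_KM` and its value of 6371 km are assumptions made for illustration, and the box is assumed not to cross the antimeridian.

```python
import math

# Mean Earth radius in kilometers; an assumed value, not taken from the issue.
EARTH_RADIUS_KM = 6371.0


def bbox_area_km2(min_lon, min_lat, max_lon, max_lat, radius_km=EARTH_RADIUS_KM):
    """Area of a lon/lat bounding box on a sphere, in square kilometers.

    Implements r*r*abs(sin(lat1) - sin(lat2)) * (lon2 - lon1), converting
    the inputs from degrees to radians first. Assumes min_lon <= max_lon,
    i.e. the box does not cross the antimeridian.
    """
    lat1 = math.radians(min_lat)
    lat2 = math.radians(max_lat)
    dlon = math.radians(max_lon - min_lon)
    return radius_km ** 2 * abs(math.sin(lat1) - math.sin(lat2)) * dlon


# The same 0.5 x 0.5 degree box, once centered on the equator and once at 83 degrees north.
equator = bbox_area_km2(0.0, -0.25, 0.5, 0.25)
polar = bbox_area_km2(0.0, 82.75, 0.5, 83.25)
print(f"equator: {equator:.0f} km^2, 83 degrees north: {polar:.0f} km^2")
```

With these assumptions the equatorial box comes out at roughly 3000 square kilometers and the polar one at a few hundred, so a limit expressed in square kilometers would treat both regions alike.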

For the linear size it depends a bit on what exactly you want to measure. A straightforward solution would be to calculate the diagonal using the haversine formula (https://en.wikipedia.org/wiki/Haversine_formula).
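A possible sketch of the diagonal measurement, assuming "diagonal" means the great-circle distance between two opposite corners of the box. The names `haversine_km` and `bbox_diagonal_km` and the 6371 km radius are illustrative assumptions, not part of any existing API.

```python
import math

# Mean Earth radius in kilometers; an assumed value, not taken from the issue.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 so rounding errors cannot push asin out of its domain.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def bbox_diagonal_km(min_lon, min_lat, max_lon, max_lat):
    """Distance from the south-west to the north-east corner of the box."""
    return haversine_km(min_lon, min_lat, max_lon, max_lat)


print(f"{bbox_diagonal_km(0.0, -0.25, 0.5, 0.25):.1f} km at the equator")
print(f"{bbox_diagonal_km(0.0, 82.75, 0.5, 83.25):.1f} km at 83 degrees north")
```

For boxes spanning a large longitude range at high latitudes, the great-circle path between the corners can differ noticeably from the box edges, which is part of why the choice of linear measure depends on what is supposed to be limited.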

The limits would of course need to be re-specified in terms of kilometers/square kilometers rather than degrees. 0.25 square degrees at the equator amounts to about 3000 square kilometers.

tomhughes commented 1 week ago

To be clear the new write limits are extremely large so are highly unlikely to be a problem - you're not supposed to get anywhere near triggering them in anything like normal editing.

I'll have to check my numbers at home but the initial limit is such that only something like 0.01% of changesets have ever been larger than that and most of those are abuse.

As such I really don't think it matters to try and do what you suggest for those limits.

Even the download area limit is fairly large (though the initial upload limit for new users is nine times larger for a square area) and I suspect that in most cases the 50k node limit is hit long before any area limit.

imagico commented 1 week ago

> I'll have to check my numbers at home but the initial limit is such that only something like 0.01% of changesets have ever been larger than that and most of those are abuse.

I am well aware of that. Mapping in polar regions is very rare, so if you go by absolute numbers you can ignore it. If, however, the OSM community wants to be serious about creating the best map of the world, then it cannot.

My estimate is that probably around 90 percent of all mappers trying to edit physical geography at very high latitudes (Northern Greenland, Antarctic interior) will run into the read API 0.25 square degree limit. It will be less for the changeset size limit obviously - but as said, in contrast to data reading this is a hard limit.

What I am suggesting is not a big change; it would just replace a grossly inadequate concept of bounding box size with a decent approximation. We are not talking about values being off by ten percent or so here; we are talking about a factor of 5-10 or more. I mentioned that at the equator 0.5*0.5 degrees is about 3000 square kilometers. At 83 degrees latitude (Northern Greenland) it is about 360 square kilometers.

Incidentally, what I suggest would also allow communicating the changeset size limit in a more meaningful way (as a new mapper you must not edit things more than X kilometers apart in a single changeset).

Woazboat commented 1 week ago

I agree, using raw lat/lon degrees without considering the projection disproportionately limits changes in the polar regions by a huge factor. Using the real projected distances would be better and would still provide the desired effect.

tomhughes commented 1 week ago

Yes sure it would be lovely - patches welcome.

One small thing is that I did consider using the diagonal size rather than width+length but my brief tests suggested it caused more variation as the shape of the box changed from square to elongated.

gravitystorm commented 1 week ago

> This has led to mapping using the OSM API for reading data being practically impossible at high latitudes in most cases.

But editing software can make multiple read requests, can't it? Like requesting a large area in 0.25 degree chunks?
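As a rough sketch of what such chunking might look like on the client side, the helper below splits a large box into tiles no larger than 0.5 x 0.5 degrees (0.25 square degrees). The name `split_bbox` and the constant `MAX_TILE_SIDE_DEG` are hypothetical and not taken from any editor's code; the tuple order follows the left, bottom, right, top order of the API's `bbox` parameter.

```python
import math

# 0.5 x 0.5 degrees gives the 0.25 square degree read limit discussed in this thread.
MAX_TILE_SIDE_DEG = 0.5


def split_bbox(min_lon, min_lat, max_lon, max_lat, max_side=MAX_TILE_SIDE_DEG):
    """Yield (min_lon, min_lat, max_lon, max_lat) tiles that cover the box.

    The box is cut into an even grid so that no tile side exceeds max_side
    degrees. Assumes the box does not cross the antimeridian.
    """
    cols = max(1, math.ceil((max_lon - min_lon) / max_side))
    rows = max(1, math.ceil((max_lat - min_lat) / max_side))
    lon_step = (max_lon - min_lon) / cols
    lat_step = (max_lat - min_lat) / rows
    for row in range(rows):
        for col in range(cols):
            yield (
                min_lon + col * lon_step,
                min_lat + row * lat_step,
                min_lon + (col + 1) * lon_step,
                min_lat + (row + 1) * lat_step,
            )


# A 10 x 1 degree strip at high latitude, expressed as one request per tile.
tiles = list(split_bbox(-40.0, 82.0, -30.0, 83.0))
print(f"{len(tiles)} requests needed")
```

For this example strip the grid is 20 columns by 2 rows, so the editor would have to issue 40 separate read requests for an area that is comparatively small on the ground at that latitude.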

> in contrast to data reading this is a hard limit.

I want to emphasise here that it will only affect new user accounts, and the limit rapidly increases to be a global extent over the first few weeks of mapping. It is not a hard limit. In fact, it's the read limit that's the hard limit, since it doesn't adjust or become global-scale over time.

I appreciate that at extreme latitudes this does have an outsized effect, in theory, on brand-new mappers. But I really don't think there will be many mappers who are making widespread changes in the high polar regions as their first bit of mapping.

I'm happy to be proved wrong, so if anyone can link to some changesets that would have hit the new limits, please let us know.

Otherwise, as @tomhughes says, I think working on this is a nice-to-have, not a must-have.

imagico commented 1 week ago

> One small thing is that I did consider using the diagonal size rather than width+length but my brief tests suggested it caused more variation as the shape of the box changed from square to elongated.

This goes a bit beyond this issue - but mathematically what you currently do (width+length) is the L1 norm. The diagonal size would be the L2 norm. At the other end of the scale (see https://en.wikipedia.org/wiki/Lp_space) you have the L^inf norm, which would be max(width, length).

I don't think which of these you choose ultimately matters since:

- you do this in a highly distorted equirectangular projection anyway.
- the bounding box is a highly suboptimal measure for spatial extent of the impact an edit has on the map, even if it was in a low distortion projection.
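To make the three norms mentioned above concrete, here is a small sketch that evaluates them for boxes of equal area but different aspect ratios. The helper names `size_l1`, `size_l2` and `size_linf` are made up for this illustration, and holding the area constant is only one possible way to compare shapes; it is not necessarily the test that was run for the existing limits.

```python
import math


def size_l1(width, height):
    """L1 norm: width + height, the measure currently used."""
    return width + height


def size_l2(width, height):
    """L2 norm: length of the diagonal."""
    return math.hypot(width, height)


def size_linf(width, height):
    """L-infinity norm: the longer of the two sides."""
    return max(width, height)


AREA = 1.0  # arbitrary units; every box below has this area

for aspect in (1, 4, 16):
    width = math.sqrt(AREA * aspect)
    height = AREA / width
    print(
        f"aspect {aspect:>2}:1  "
        f"L1={size_l1(width, height):.3f}  "
        f"L2={size_l2(width, height):.3f}  "
        f"Linf={size_linf(width, height):.3f}"
    )
```

For any box the three values are ordered Linf <= L2 <= L1, and they converge as the box becomes more elongated, so the choice mostly affects how near-square boxes are treated.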

openstreetmap / openstreetmap-website