zerebubuth / openstreetmap-cgimap

A C++ implementation of the OpenStreetMap API map call.
http://wiki.openstreetmap.org/wiki/Cgimap
GNU General Public License v2.0
73 stars 38 forks source link

Allow changeset bounding boxes to be limited in size #409

Closed systemed closed 5 months ago

systemed commented 5 months ago

This PR adds the ability for a maximum changeset bounding box to be enforced.

The maximum can be set with --max-bbox-1d and --max-bbox-2d. The former sets the maximum, in degrees, in any one dimension: for example, if it's set to 2, then a changeset with a node at lon 35.0 and another at lon 38.0 will fail. The latter sets the maximum, in square degrees, in both dimensions (i.e. dx * dy).

I'm not under any illusions that this will prove more than a speedbump in the road for our long-wayed friend, but it's another tool for the box.

The PR passes existing tests and new ones have been added but more testing is always good!

Woazboat commented 5 months ago

This will also block changes by moderators in its current form, which is probably not a good idea.

On 16 June 2024 17:54:45 CEST, Richard Fairhurst @.***> wrote:

This PR adds the ability for a maximum changeset bounding box to be enforced.

The maximum can be set with --max-bbox-1d and --max-bbox-2d. The former sets the maximum, in degrees, in any one dimension: for example, if it's set to 2, then a changeset with a node at lon 35.0 and another at lon 38.0 will fail. The latter sets the maximum, in square degrees, in both dimensions (i.e. dx * dy).

I'm not under any illusions that this will prove more than a speedbump in the road for our long-wayed friend, but it's another tool for the box.

The PR passes existing tests and new ones have been added but more testing is always good! You can view, comment on, or merge this pull request online at:

https://github.com/zerebubuth/openstreetmap-cgimap/pull/409

-- Commit Summary --

  • Allow changeset bounding boxes to be limited in size

-- File Changes --

M include/cgimap/options.hpp (28) M src/backend/apidb/changeset_upload/changeset_updater.cpp (7) M src/main.cpp (2) M src/options.cpp (21) M test/test_apidb_backend_changeset_uploads.cpp (39) M test/test_parse_options.cpp (4)

-- Patch Links --

https://github.com/zerebubuth/openstreetmap-cgimap/pull/409.patch https://github.com/zerebubuth/openstreetmap-cgimap/pull/409.diff

-- Reply to this email directly or view it on GitHub: https://github.com/zerebubuth/openstreetmap-cgimap/pull/409 You are receiving this because you are subscribed to this thread.

Message ID: @.***>

mmd-osm commented 5 months ago

Thank you for looking into this issue. I'd like to share a few random thoughts on the overall approach you've chosen, also comparing it to our previous work on limiting changeset sizes. In the end, I'll leave it up to OWG to decide which approach they feel most comfortable with (@tomhughes).

When we introduced changeset limits last year November, we had a few design goals in mind:

In the end, we've chosen to have a new database function in place, which is managed in the Rails repo, and called by CGImap with an agreed upon set of parameters. You can see this approach in https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/src/api06/changeset_upload_handler.cpp#L58-L69 as call to get_rate_limit(*user_id).

In a way, these database functions are to Rails/CGImap what Lua is to osm2pgsql. The core remains stable, while additional scripting offers the flexibility needed in this case.

Now on the idea of having a global bounding box limit for everyone, I could think of scenario where this could in fact backfire on our efforts to combat vandalism:

Let's imagine we have a vandal who has created two users Moron1 and Moron2. Each user uploads one changeset each with silly things that would still fit our bounding box limits. However, both changesets have an overlap area, in which both morons change some objects multiple times (interleaved updates).

DWG needs to revert both changesets at the same time to get rid of the nonsense. However, the resulting upload would be blocked because its bbox is too large. Without further changes to our revert tools, we would essentially make the revert impossible. -> not a good idea.

Initially, I thought about a new database function, this time evaluating user id and new bounding box. Depending on the type of account, etc., we could have a bit more flexibility on the permitted bbox.

systemed commented 5 months ago

This will also block changes by moderators in its current form, which is probably not a good idea.

Yep. I looked into that but AFAICT there isn't currently an elegant way of persisting the is_moderator value through the call stack without adding an extra bool to a whole bunch of method signatures. It would probably be neater to introduce a new struct which comprises the existing std::optional<osm_user_id_t> and the is_moderator bool. That's certainly doable but I didn't want to embark on that without some certainty that this was all worth doing.

DWG needs to revert both changesets at the same time to get rid of the nonsense. However, the resulting upload would be blocked because its bbox is too large

This should be fine if the block is disapplied with is_moderator is set as per above comment, I think?

mmd-osm commented 5 months ago

This depends a bit on the overall approach. At least for the db function, we already have a concept of "moderator" and "importer" in place, so there would be no need to do all this work on CGImap:

https://github.com/openstreetmap/openstreetmap-website/blob/84aa7f455ae58b4b051368e35dbc43f780b3aa0f/lib/database_functions.rb#L21-L28

And yes, we need to have a solution that also works for Rails (or depending on your POV, a solution in Rails, that also works for CGImap). Otherwise you could trivially bypass all bbox limits.


Previous upstream discussion: https://github.com/openstreetmap/openstreetmap-website/issues/4805

mmd-osm commented 5 months ago

For some extra context, the changeset size limit involved the following changes: https://github.com/zerebubuth/openstreetmap-cgimap/pull/296/files

As a side note, we also need to backport any changes to 0.9.x, since the current master branch doesn't support Basic Auth and OAuth 1.0a anymore, and might cause some other issues. I'll look into the backport topic, once we have some code for master in place and agree on the overall approach.

Woazboat commented 5 months ago

Yep. I looked into that but AFAICT there isn't currently an elegant way of persisting the is_moderator value through the call stack without adding an extra bool to a whole bunch of method signatures. It would probably be neater to introduce a new struct which comprises the existing std::optional<osm_user_id_t> and the is_moderator bool. That's certainly doable but I didn't want to embark on that without some certainty that this was all worth doing.

I'm also currently running into this exact same issue on a related project. I'm working on/testing adding a framework to cgimap for dynamically loadable plugins + event hooks that could be used to add add-on features such as e.g. upload filters without having to modify, recompile and deploy a completely new cgimap version. https://github.com/Woazboat/openstreetmap-cgimap/tree/plugin-infrastructure

(WIP and not yet fully ready unfortunately, although the main features are already there. A bunch of the added code in that branch is just for testing at the moment and needs to be cleaned up.)

Here's an example of a hook function for changeset upload requests that is able to block uploads with bounding boxes over a certain size: https://github.com/Woazboat/openstreetmap-cgimap/blob/plugin-infrastructure/plugins/foo/foo.cpp#L100-L139

There's also a very rudimentary tag filter there that I added as a proof of concept that can block requests that contain elements with certain tags.

The whole thing needs a bit more work though, especially configuration for plugins and refactoring some of the code to get it properly production ready.

Woazboat commented 5 months ago

I'd maybe also recommend using the proper dimensions of the bounding box rectangle for the map projection instead of the latitude/longitude arc angles. Otherwise changes near the poles will appear to have a massively inflated bounding box even when the actual area is quite small. https://github.com/Woazboat/openstreetmap-cgimap/blob/plugin-infrastructure/plugins/foo/foo.cpp#L112-L125

Woazboat commented 5 months ago

Thank you for looking into this issue. I'd like to share a few random thoughts on the overall approach you've chosen, also comparing it to our previous work on limiting changeset sizes. In the end, I'll leave it up to OWG to decide which approach they feel most comfortable with (@tomhughes).

When we introduced changeset limits last year November, we had a few design goals in mind:

* Maximum reuse across Rails and C++: Since all single object operations still exclusively reside on Rails, we wanted to have a single implementation that can be used in both worlds.

* Quick turnaround times: no hardcoded logic in CGImap, in order to avoid releasing new CGImap versions for any sort of policy changes

In the end, we've chosen to have a new database function in place, which is managed in the Rails repo, and called by CGImap with an agreed upon set of parameters.

I think we need a solution fast and can't afford to wait too long until everything is perfect in this instance. Get it working first and then it can still be tidied up and improved afterwards. That could very well mean initially separate implementations for cgimap/rails or only partially complete ones.

AntonKhorev commented 5 months ago

Good luck editing things like https://www.openstreetmap.org/way/1113252388

Woazboat commented 5 months ago

Yes, that is absolutely a problem here https://community.openstreetmap.org/t/cyber-attacks-in-the-osm-space/113618/43

mmd-osm commented 5 months ago

@Woazboat I’m not clear how you would ship a c++ plugin to production. You still need to go through the same ppa build steps, since you cannot build directly on the frontend servers. But then you could skip all the plugin overhead and directly add the logic to CGImap (which I want to avoid).

i think before we spend too much time on building solutions, we should collect proper requirements from both DWG/OWG. I do realize that the forum is in panic mode right now, but that doesn’t really help anyone. We still don’t have consensus on what we want to do, and a lot of proposals are not feasible at all, like everything that’s asking for some kind of approval mechanism / queues.

Looking back at the previous rounds on adding new limits, I’m confident that the actual implementation can be done rather quickly. As I mentioned elsewhere, the most time consuming part is to work out what we want to do.

i also believe that at the time of a single upload, we’ll never see the full picture, and whatever we build using this approach can be gamed in one way or another (end of mandatory expectation management section).

systemed commented 5 months ago

I do realize that the forum is in panic mode right now

I think that's an extraordinarily complacent way of characterising the issue, but this discussion isn't going anywhere and the suggested scope enhancements are outside my skillset and available time, so I'll close this PR. Good luck.

mmd-osm commented 5 months ago

I think we all agree that we care a lot about having a perfect map and stopping the nonsense as quickly as possible. DWG and others are doing a terrific job in tirelessly cleaning up after the vandals. For a lot of people, vandalism is causing a lot of stress, and it tends a bit to get in the way of finding workable solutions that most people could agree with.

With my statement, I want to encourage people to take a step back, and work on finding possibe options with a more non-emotional way of looking at the issue and thinking about good solutions. I apologise, if this came across as complacent. That was really not the intention.

Woazboat commented 5 months ago

I’m not clear how you would ship a c++ plugin to production. You still need to go through the same ppa build steps, since you cannot build directly on the frontend servers. But then you could skip all the plugin overhead and directly add the logic to CGImap (which I want to avoid).

You can distribute the plugins as separate packages via the same methods (PPA). I don't really see the problem there. The point is that they can be updated and enabled/disabled separately from the main application.

i think before we spend too much time on building solutions, we should collect proper requirements from both DWG/OWG. I do realize that the forum is in panic mode right now, but that doesn’t really help anyone. We still don’t have consensus on what we want to do (...)

I think the rough requirements are fairly clear and any step in that direction helps. Of course the DWG/OWG needs to be involved, but let's not stall the whole process here until everything is triple signed and stamped. The least we can do is prepare the groundwork that allows those things to be implemented. The specifics can be ironed out later and details such as what should be let through and what should be blocked isn't even our task here. That needs to be configured later, we just need to provide the means to do it.

(...) and a lot of proposals are not feasible at all, like everything that’s asking for some kind of approval mechanism / queues.

Absolutely, we here know that that's not a workable option. Other people who are not familiar with the backend are not aware of that though. That's why we need to set expectations about what is doable and what isn't and provide solutions that are actually feasible.

i also believe that at the time of a single upload, we’ll never see the full picture, and whatever we build using this approach can be gamed in one way or another

Sure, there will probably always be ways around it but the idea is to make that as hard as possible and to increase the required effort and knowledge. And again, things can be improved incrementally.

DWG and others are doing a terriffic job in tirelessly cleaning up after the vandals.

I agree, but tirelessly is the key word here. How long should they be expected to deal with this until they get tired of it, which nobody could blame them for? The (human) resources and voluntary effort put in by the DWG are one of the cornerstones of OSM that we cannot do without and we need to do our best to keep that from being wasted or going away.

mmd-osm commented 5 months ago

You can distribute the plugins as separate packages via the same methods (PPA). I don't really see the problem there. The point is that they can be updated and enabled/disabled separately from the main application.

That's a topic where we really need some feedback from operations. Building PPAs is not so much a technical problem (although the process itself tends to be slow and cumbersome), but also an organizational one. We have designed the current changeset rate limiting in a way, that Tom can change all settings at any time simply by deploying the website repo to production. It doesn't require any support from CGImap side at all, and we're not part of the critical path.

Moving more of these settings and code to the CGImap repo means that Tom would have to commit changes to this repo, create new releases, go through the whole PPA publishing process (creating and signing tar files, upload them to Ubuntu launchpad, wait for the build to complete, deploy the new PPA to production). It's not exactly a very lean process.

As an outlook: at some point, we could have some sort of rule engine where members from DWG (or maybe admins, or both) can quickly configure rules on a web frontend without doing any deployment. CGImap then simply provides a list of impacted object ids to the engine and receives some green or red light. Think of what Wikipedia is doing, maybe.

See https://community.openstreetmap.org/t/the-osm-standard-tile-layer-looks-wrong-white-lines-abusive-comments-etc/111583/401

simonpoole commented 5 months ago

As an outlook:

I think it needs to be pointed out that there is no "outlook" if this continues as is, without at least a modicum of urgency in resolving . At best this will all be rendered moot and Kamils project will be deployed, yes you are playing directly in to his hands.

mmd-osm commented 5 months ago

@simonpoole: Yes, agree. The purpose of stating a (long term) outlook was to raise a bit of an awareness, that whatever we build would benefit from a certain degree of flexibility. It doesn't mean that we won't do anything in the short term, or that we need to follow this approach from day one.

If you want to help out, you could support testing activities on the dev instance as soon as we're ready. At least this time, it would be good to have more than a single tester.

@tomhughes : I don't see any comments from your side on the topic. Are you planning to implement something on Rails, like we did last time?

tomhughes commented 5 months ago

I haven't had a chance to look at it yet, or consider how it might work given the obvious potential problems.

I guess we would have to do something similar to the rate limit so that new accounts are limited in the bbox size they can create.

AntonKhorev commented 5 months ago

Things that are wrong with hard limits on bounding boxes of changesets:

mmd-osm commented 5 months ago

@AntonKhorev @SomeoneElseOSM I don't have much visibility into the discussions DWG is having internally (obviously). If you happen to have a consolidated lists of things you'd like to check during upload, it would be much appreciated if you could share that feedback with us (also non public by email).

mvl22 commented 5 months ago

This will also block changes by moderators in its current form, which is probably not a good idea.

Presumably they would no longer be needing to apply reverts to a large bbox change though, if such bbox changes were prevented in the first place by the same restriction?

mmd-osm commented 5 months ago

@mvl22 Please read my explanation in https://github.com/zerebubuth/openstreetmap-cgimap/pull/409#issuecomment-2171783738 on that topic. If it’s not clear after reading please ask again.

mvl22 commented 5 months ago

@mvl22 Please read my explanation in https://github.com/zerebubuth/openstreetmap-cgimap/pull/409#issuecomment-2171783738 on that topic. If it’s not clear after reading please ask again.

@mmd-osm Thanks. I did read that. I assumed that reverts are atomic and fully reversible, i.e. reverts could be done in reverse order and end up with the original result, in which case the bbox issue would not arise as far as I can see. However, perhaps I am showing lack of familiarity here.

Initially, I thought about a new database function, this time evaluating user id and new bounding box. Depending on the type of account, etc., we could have a bit more flexibility on the permitted bbox.

I agree.

mmd-osm commented 5 months ago

Currently, changesets are not atomic. As long as a changeset is open, you can upload further changes (assuming the 10k changes limit or some maximum open time periods are not exceeded).

Vandals can operate with multiple users at the same time, and create some interleaved changes:

Version Node 1 Node 2
Version 1 Clean Clean
Version 2 Vandal 1, Changeset 1 Vandal 2, Changeset 2
Version 3 Vandal 2, Changeset 2 Vandal 1, Changeset 1

Reverting either changeset 1 or changeset 2 will result in some crap being left over. So you need to revert both changesets at once. Combining that with partially overlapping bounding boxes of changeset 1 and 2 will break the cleanup process with today's tools, because it would exceed a static max. bounding box. That's why exceptions for moderators / admins are mandatory.

(for the purpose of keeping this explanation simple, I'm leaving out additional account age related max. bbox considerations, or anything more sophisticated)

HTH

mmd-osm commented 5 months ago

@tomhughes : Thank you for working on the Rails part in https://github.com/openstreetmap/openstreetmap-website/pull/4908

We can now start working on CGImap based on this implementation. As a reminder, the code should roughly follow our previous changeset size limit approach, as explained in https://github.com/zerebubuth/openstreetmap-cgimap/pull/409#issuecomment-2171805476

New code should be implemented against the master branch and will be backported later to 0.9.x. Backporting the new code means: we will create a new branch "release/0.9.3" with the added bounding box check. This branch is based on https://github.com/zerebubuth/openstreetmap-cgimap/tree/release/0.9.2

I have created a new tracking issue for the Rails logic integration in CGImap: https://github.com/zerebubuth/openstreetmap-cgimap/issues/412

mvl22 commented 5 months ago

Currently, changesets are not atomic. As long as a changeset is open, you can upload further changes (assuming the 10k changes limit or some maximum open time periods are not exceeded).

Ah, yes, good point - I always forget about the 'Potlatch 1 mode support' as I call it.

Can someone remind me why this is really needed in an age when almost every other API adopts a formal atomic transaction system? I really cannot imagine that being created freshly if that were being designed now in 2024. Would it perhaps be time to put aside the vagueness arising from multi-change commits with silent auto-close functionality, and have a clear model that will do away with exactly the kind of situation above?

mmd-osm commented 5 months ago

The API itself is atomic and transactional, every API method either fails or succeeds, there's no "partial application" of changes. Somehow you're trying to extend that concept to changesets as well. Think of adding some items to a shopping basket maybe. You don't want to force customers to immediately check out after adding an item. (I can't think of anything that is more closely resembling of how changesets behave at the moment).

Auto-closing changesets would have an immediate impact on apps like StreetComplete, that want to group changes of a certain type per changeset. Due to nature of the app, it needs to keep multiple changesets open for a longer time, and update them as needed.