openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0
2.17k stars 911 forks source link

Idempotency for API 0.6 #2201

Closed mmd-osm closed 4 years ago

mmd-osm commented 5 years ago

API 0.6 has no concept to prevent duplicate POST operations, which could happen as a client retries a failed network operation. As per RFC 7231, POST operations are not idempotent, which is quite unfortunate in particular for diff uploads: assuming the changeset consists of "create" operations only, those objects might be created multiple times on the database.

I've created this issue to discuss different options (such as adding a downward-compatible HTTP Header "Idempotency-Key"), and find ways on how to implement it in the API.

Some pointers:

Link proposes as part of their RESTful API guidlines:

A client specific idempotency key provided via Idempotency-Key header in the request. The key is not part of the resource but stored temporarily pointing to the original response to ensure idempotent behavior when retrying a request (see May: Consider to Support Idempotency-Key Header).

mvl22 commented 5 years ago

Rather than require clients to send a token and have responsibility for checking this, would one approach, which I suspect would not require substantial changes to the server infrastructure be:

  1. Whenever a POST comes in (for any API call), create some kind of hash of the incoming data, first omitting anything which is known to be different (e.g. the request time).
  2. Store the hash in a global table of requests(hash, response) which is a table of incoming hashes and responses where each entry has a lifetime of say one day or a week (as users/clients are unlikely to retry after that).
  3. When the server has processed the request and sends back the response set the response field to the response XML and continue the response to the client as normal.
  4. If the server receives a repeated request that has the same hash, i.e. the client has just resent the same request, immediately send back the same response, without any further processing. For good practice, send an additional header stating that this is a cached response.

This would seem to me to require no changes to any client, and eliminate almost all duplications on the server, with the code change essentially just a global interception to any POST request by implementing that cache at that level.

This approach admittedly does not solve Andy's point:

But that still doesn't work, since if the server receives a new trace, and then the owner changes the description 2 seconds later on the website, and the server receives the same creation request 2 seconds after that, it doesn't have enough information to know if it's an intentional recreation or just a resent request.

but I don't think that is actually solvable, because a changed description is by definition a change in the request and could constitute a deliberate and separate change to the data.

simonpoole commented 5 years ago

@mvl22 see https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-485143934 I would think returning an error in all cases in which no token has been supplied is in order, there are obviously a couple of edge cases in which this might not be able to catch all repeated uploads outside of actual edits, but seriously, I don't think we really care about those.

androideveloper commented 5 years ago

We are experiencing the same issue with okhttp. We are going to use the concept of X-REQUEST-ID in the header to solve the duplication issue https://stackoverflow.com/a/54356305/1635488. Server should keep list of ids and check if request was procedeed earlier.

mmd-osm commented 5 years ago

@simonpoole : I was sort of expecting your feedback on my proposal to include an auto close (-> close_changeset=true) attribute in the osmChange message header (https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-485230581).

Several people in this issue seem to suggest that they want atomic changesets, which this attribute would enable. If the upload fails or times out, it would be safe to re-upload: either the osmChange mesasge gets processed, or rejected in case the changeset is already closed.

I don't want to introduce one upload per changeset as a default behavior, as it would be an incompatible, breaking change. Adding that attribute is the next best thing to do. By the way, an additional URL parameter "?close_changeset=true" could serve a similar purpose, if there are concerns about extending the XML structure.

Should we go for this as one option, and continue with the diffResult topic?

simonpoole commented 5 years ago

@mmd-osm IMHO the people wanting atomic changesets actually want something else: the ability to upload a essentially unbounded number of changes over an unbounded number of uploads atomically. That and a couple of unicorns :-).

Auto-closing the changeset would mostly resolve the "has the upload succeed or not" question for the non-chunked uploads at the expense of having to re-download, which in this case I would think is acceptable. In general I would support this if no more general solution is on the horizon.

mvl22 commented 5 years ago

@mvl22 see #2201 (comment) I would think returning an error in all cases in which no token has been supplied is in order, there are obviously a couple of edge cases in which this might not be able to catch all repeated uploads outside of actual edits, but seriously, I don't think we really care about those.

I am suggesting though that clients should not have to change - instead that the server itself should do the token generation.

Would my suggested approach noted above work?

mmd-osm commented 5 years ago

@simonpoole : right, this inevitably includes ideas like throwing away the whole changeset unless the changeset is properly closed (link). Clearly, that's something well beyond the scope of this issue. We would immediately inherit all of the issues of the asynch queues (https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-484484302). I'd say this is actually quite similar to having some kind of staging area to stash all changes. Say Hello to the world of branches, commits and merge (conflicts), this time with OSM data.

@mvl22 : there are some similarities to https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-480499265 in the "let's handle all POST requests" approach, only that you're creating the key based on a payload hash - which we discussed in https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-485143934 as "calculating a checksum". I noted in https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-485145410 this may or may not be possible, depending on the endpoint semantics and needs to be clarified. Keeping the full blown XML response around in a table was already questioned in https://github.com/openstreetmap/openstreetmap-website/issues/2201#issuecomment-484396809.

mmd-osm commented 4 years ago

@gravitystorm has published some thoughts on this topic in his blog: https://blog.gravitystorm.co.uk/2019/10/17/smoother-api-upgrades-for-openstreetmap/ :

My proposal is to include an idempotency key in the diff upload. The server will store these, and if it sees the same upload key for a second (or third) time, it will realise the response has been lost somewhere, and no harm is done.

mmd-osm commented 4 years ago

This issue has become stale after more than a year without much progress. I’m closing it now as I don’t see any way to move this topic forward.