openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0

Global element versioning #4660

Open AntonKhorev opened 7 months ago

AntonKhorev commented 7 months ago

Problem

Sometimes the code tries to reproduce the sequence of element modifications, but the data needed to do this is not directly available.

Changeset downloads rely primarily on timestamps, and then additionally sort by version, type etc., and this is done differently in osm-website and cgimap. The additional ordering may not be correct: for deletions, for example, it makes sense to reverse the ordering by type. Relying on the additional ordering becomes necessary when timestamps are truncated, which can happen if the data is restored from a dump.
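To illustrate why a fixed secondary ordering by type can be wrong, here is a toy Ruby sketch (the `Element` struct, `TYPE_ORDER` and `sort_key` are hypothetical stand-ins, not the actual osm-website or cgimap code): for creations and modifications, nodes should come before the ways and relations that reference them, but for deletions the referencing relation has to go first, so the type rank is negated when the element is not visible.

```ruby
# Hypothetical stand-in for an old-element row; illustrative only.
Element = Struct.new(:timestamp, :type, :version, :visible, keyword_init: true)

# Creations/modifications sort nodes -> ways -> relations ...
TYPE_ORDER = { "node" => 0, "way" => 1, "relation" => 2 }.freeze

def sort_key(el)
  type_rank = TYPE_ORDER.fetch(el.type)
  # ... but deletions must go the other way: a relation has to be
  # deleted before the elements it references become deletable.
  type_rank = -type_rank unless el.visible
  [el.timestamp, type_rank, el.version]
end

# Two deletions sharing a truncated timestamp: only the secondary
# ordering decides, and the relation deletion should sort first.
elements = [
  Element.new(timestamp: 100, type: "node", version: 2, visible: false),
  Element.new(timestamp: 100, type: "relation", version: 3, visible: false)
]
sorted = elements.sort_by { |el| sort_key(el) }
# sorted.first.type => "relation"
```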

Changeset webpages show lists of elements. These also reproduce the modification sequence in most cases, but only as a side effect of not being sorted. This lack of sorting causes problems elsewhere, and so does assuming that the elements are not sorted, see https://github.com/openstreetmap/openstreetmap-website/pull/4571.

Description

Is it feasible to do this thing? https://github.com/openstreetmap/openstreetmap-website/blob/0c4cbda662502f05646a4e82f5d8f639b9183059/app/controllers/api/changesets_controller.rb#L154-L157

  1. Create one sequence to serve three old element tables.
  2. Start it at a large enough number.
  3. Add a column to each old element table with a default value taken from this sequence. This should keep working for db clients that insert elements without knowing about the new column.
  4. Backfill the column according to the current ordering. It's not necessarily correct; maybe update it later.
  5. Use this global version column for ordering (old) elements inside changesets on osm website and in other places.
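The steps above can be sketched with a toy Ruby model (all names here are hypothetical; the real change would be PostgreSQL DDL, i.e. a sequence plus a `DEFAULT nextval(...)` column): one shared counter hands out globally increasing numbers to all three old-element tables, so merging the tables and sorting by that column reconstructs the insertion order.

```ruby
# Toy model of one global sequence serving three old-element tables.
# A plain counter stands in for the PostgreSQL sequence.
class GlobalVersionSequence
  def initialize(start)
    @next = start
  end

  def nextval
    v = @next
    @next += 1
    v
  end
end

# Step 2: start well above any number already in use.
seq = GlobalVersionSequence.new(1_000_000)

# Step 3: three "tables"; inserting clients need not know about the
# column -- the default (here: the insert lambda) fills it in.
old_nodes, old_ways, old_relations = [], [], []
insert = ->(table, row) { table << row.merge(global_version: seq.nextval) }

insert.call(old_nodes,     { id: 1, version: 1 })
insert.call(old_ways,      { id: 7, version: 2 })
insert.call(old_nodes,     { id: 1, version: 2 })
insert.call(old_relations, { id: 3, version: 1 })

# Step 5: merge the tables and order by the global column.
merged = (old_nodes + old_ways + old_relations)
         .sort_by { |r| r[:global_version] }
# merged.map { |r| r[:id] } => [1, 7, 1, 3]
```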

Screenshots

No response

mmd-osm commented 7 months ago

> Backfill the column according to this order. It's not necessarily correct, maybe update it later.

I'm not sure how feasible this is from a total runtime point of view. In any case, mass updating old element tables is probably bad news for our minutely diff replication, since it would end up generating billions of entries in our replication log files.

I haven't tested it, but I believe it would bring minutely diff replication to a standstill for quite some time. The original use case for old element updates was to support redaction, which is rather low data volume. Also, we're not using this information at the moment.

By the way, I'm well aware that the additional sorting in both implementations would not necessarily match the original upload sequence. However, I never really had the expectation that it should.