openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0

Change the way changesets are displayed to use achavi #1376

Open rmikke opened 7 years ago

rmikke commented 7 years ago

When I get a link to a changeset, all I see on the map is a rectangle containing all the changes, like this: [image: Current changeset view]

Why don't we use the Achavi changeset view? It shows much more: [image: Achavi changeset view]

If Achavi is considered not stable enough to replace the current view, there should at least be a link to Achavi at the end of the change list: [image: Placement of link]

tomhughes commented 7 years ago

Because nobody's written it - patches welcome.

SomeoneElseOSM commented 7 years ago

Any usage of Achavi would have to handle the "changeset too big for Achavi" problem which happens with e.g. http://www.openstreetmap.org/changeset/43855051 .

tomhughes commented 7 years ago

Well, I wasn't taking "use achavi" literally. Aside from anything else, I have absolutely no idea how it works, but presumably it has some parallel database that we wouldn't want/need to replicate.

I was just taking it as a request for an "achavi like" view.

Performance would certainly be an issue. The example above was, I think, too slow to load to be usable on the main site, which suggests that it might be even more problematic doing it from the main database, which won't be optimised for this use.

simonpoole commented 7 years ago

It uses Overpass augmented diffs afaik. Adapting it for use on OSM would certainly be possible, but clearly we would need to address the issue of being dependent on Roland's Overpass instances (which as we know are overloaded), as we already are with the explore function.

rmikke commented 7 years ago

So, there are issues that mean we don't want to use achavi as the main way to display changesets. OK, I understand. How about the minimum plan then? Let's add a link to achavi in the standard changeset display. Everyone uses it at their own risk. And it should be very easy to implement this way.

tomhughes commented 7 years ago

As a matter of policy we don't normally link to third party services like that - there are dozens of changeset analyzers that people have built and if we link to one then they will all feel entitled to have one.

pnorman commented 7 years ago

To a certain extent this seems like a duplicate of https://trac.openstreetmap.org/ticket/1775. See also OWL.

Zverik commented 7 years ago

The Achavi is not scalable. Now it queries a changeset for a minute or so. If we are to direct hundreds of mappers to it, overpass api might choke.

iandees commented 7 years ago

Could we not cache the output of the Overpass query that Achavi does so that the query only happens once per changeset? Clearly this would only work when the changeset is closed, but it would mean a much faster load for subsequent users and less load on Overpass.
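A minimal sketch of that caching idea, assuming the expensive step is a function that takes a changeset id and returns the rendered diff. The class name `ChangesetDiffCache` and the stand-in `fake_overpass_fetch` are hypothetical; no real Overpass call is made here.

```python
from typing import Callable, Dict


class ChangesetDiffCache:
    """Keeps one result per closed changeset (hypothetical helper)."""

    def __init__(self, fetch: Callable[[int], str]) -> None:
        self._fetch = fetch                # the expensive call, e.g. an Overpass query
        self._store: Dict[int, str] = {}
        self.fetch_count = 0               # how often the expensive call actually ran

    def get(self, changeset_id: int, is_closed: bool) -> str:
        # Only closed changesets are immutable, so only those are served from cache.
        if is_closed and changeset_id in self._store:
            return self._store[changeset_id]
        self.fetch_count += 1
        result = self._fetch(changeset_id)
        if is_closed:
            self._store[changeset_id] = result
        return result


def fake_overpass_fetch(changeset_id: int) -> str:
    """Stand-in for the real query; an assumption for illustration only."""
    return f"diff-for-{changeset_id}"


cache = ChangesetDiffCache(fake_overpass_fetch)
cache.get(43855051, is_closed=True)   # first view: runs the query
cache.get(43855051, is_closed=True)   # second view: served from the cache
```

An open changeset bypasses the cache on every call, since its contents can still change.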

I was attempting to build something like this that would automatically generate "before and after" changeset files based on minutely diffs but ran out of time to set up the full history database. It would be interesting to have the main OSM database construct this sort of thing as an output alongside the minutely diffs.

tomhughes commented 7 years ago

Well that only helps if a small number of changesets are viewed repeatedly, which doesn't seem likely.

It sounds like all of this is irrelevant until we have our own overpass server anyway, as we don't want to put any more load on the current ones which are already struggling.

planemad commented 7 years ago

It sounds like all of this is irrelevant until we have our own overpass server anyway

Is there an OSMF overpass server in the works that would unblock this 🚀?

tomhughes commented 7 years ago

It's on the wish list.

pnorman commented 7 years ago

From those I've talked to, a single overpass instance isn't enough for the load achavi on osm.org would impose, and certain types of changesets are too much for the method achavi uses. It doesn't make efficient use of resources, probably since it's not what overpass was originally designed for.

mmd-osm commented 7 years ago

The Achavi is not scalable. Now it queries a changeset for a minute or so. If we are to direct hundreds of mappers to it, overpass api might choke.

It would be good to have some examples of such changesets, and maybe the time of day when the query is being executed. Keeping an eye on the current CPU usage might also be useful to see if this is a general issue or "just" CPU load related.

Edit: Adding link to known achavi issue on changesets spanning several hours and/or having a large bbox: https://github.com/nrenner/achavi/issues/9, updating https://github.com/drolbr/Overpass-API/issues/322

Are there any other issues not yet covered?


FYI: I have also put up a demo achavi service, which queries a dev Overpass instance with performance improvements not yet available on the main instance.

http://dev.overpass-api.de/tmp/psv/achavi/index.html?changeset=38125857

For testing you can simply adjust the changeset number as needed, and play around a bit. I'd be very interested to get some feedback on response times. The demo database is running on hard disks and was last updated on November 19th, 2016. It includes all changesets since the ODbL license change in Sep 2012.

Copying Roland, @drolbr

mmd-osm commented 7 years ago

We're making some (albeit slow) progress on large changesets, like www.openstreetmap.org/changeset/45018920 (btw: kudos to @nrenner for supporting this effort!).

As you can see in the performance measurement results, an (id: ...) based approach brings the response time for the respective Overpass query down to 4s; a massive 50k-element changeset spread all over the world is at 11s, and many other changesets with a large number of items and/or a large extent are at about 1s. Besides, this approach has a much lower memory footprint.

Now, the main drawback at this time is that we'd need a list of relevant {node/way/rel} ids, which we retrieve from the main OSM API and later pass to the Overpass query. Of course, it would make more sense to get those ids per changeset directly from Overpass API. That's still a todo for @drolbr, I would assume, and my expectation is that we won't need the main OSM API further down the road.
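The two steps described here can be sketched roughly as follows: collect element ids from a changeset's osmChange document, then emit Overpass QL id filters. The sample XML is made up, the function names are hypothetical, and the real achavi query is not shown in this thread (it presumably also carries a time range), so this only illustrates the id-filter part.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up osmChange document; the real one would come from the main OSM API.
SAMPLE_OSMCHANGE = """<osmChange version="0.6">
  <create><node id="101" version="1" changeset="1" lat="0" lon="0"/></create>
  <modify>
    <node id="102" version="2" changeset="1" lat="0" lon="0"/>
    <way id="201" version="3" changeset="1"><nd ref="101"/><nd ref="102"/></way>
  </modify>
  <delete><relation id="301" version="4" changeset="1"/></delete>
</osmChange>"""


def collect_ids(osmchange_xml: str) -> Dict[str, List[int]]:
    """Return the sorted ids of all touched nodes, ways and relations."""
    ids = {"node": set(), "way": set(), "relation": set()}
    root = ET.fromstring(osmchange_xml)
    for action in root:              # <create>, <modify>, <delete>
        for element in action:       # direct children only, so <nd ref> is skipped
            if element.tag in ids:
                ids[element.tag].add(int(element.get("id")))
    return {kind: sorted(values) for kind, values in ids.items()}


def build_id_query(ids: Dict[str, List[int]]) -> str:
    """Build an Overpass QL union that selects elements by id."""
    statements = [
        f"  {kind}(id:{','.join(map(str, values))});"
        for kind, values in ids.items()
        if values
    ]
    return "(\n" + "\n".join(statements) + "\n);\nout meta;"


query = build_id_query(collect_ids(SAMPLE_OSMCHANGE))
print(query)
```

Selecting by id avoids the bounding-box scan, which is presumably why large-extent changesets benefit most.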

This complements efforts by @geohacker to get decent history visualization, where caching is not yet available for older changesets, like for the initial example of cs 45018920: https://osmcha.mapbox.com/changesets/45018920

Zverik commented 7 years ago

As an alternative, we could move the changeset caching by @geohacker to OSMF-supported hardware and use it for displaying changeset geometries. Having these span back only until 2012 seems good enough to me, and there won't be any requests to a third-party service. For now, I often get temporarily blocked from Overpass just by using the query tool on the website 3-4 times in a row, who knows what happens if it is used at 20 changesets at once, with infinite scrolling enabled.

mmd-osm commented 7 years ago

For now, I often get temporarily blocked from Overpass just by using the query tool on the website 3-4 times in a row, who knows what happens if it is used at 20 changesets at once, with infinite scrolling enabled.

Well, that reflects the situation on the current production server only. That has a different db compression, not all performance and scalability fixes in place, etc., leading to a rather high overall CPU utilization. For those reasons I don't recommend using it as a reference point for performance testing. The tests I mentioned earlier all ran on an otherwise more or less idle development server, where only minutely updates are being applied.

Also, you wouldn't display details for 20 changesets at once. Like in the case of OSMCha you start with a list of changeset metadata (id, description), and only once you drill down to a particular changeset, further details down to the node/way/rel level are shown.

Storing all changesets as GeoJSON needs a lot of disk space and is probably not the most efficient approach to take. Overpass API has given up on a similar idea a few years back due to space constraints. I don't know what it means to store all cs back to 2012 using that approach, but disk requirements might be in the TB range. By contrast, the Overpass DB is at 200-240GB (depending on compression) including full history back to Sep 2012.

Besides, @geohacker still uses Overpass to populate the cache, and every effort to make the original Overpass queries run smoother (or at all), would also immediately help his caching approach - win - win.

Zverik commented 7 years ago

You are definitely doing an important job. Speeding up Overpass API and finding faster ways of working with it is obviously useful. I am trying to map your numbers on the current history page of OSM, which displays 20 changesets at once. And that gives me 3-5 seconds for a user.

I don't know the average size for a geojson-ed changeset, but if it is under 100k (which means under 10k gzipped: changeset jsons are very verbose), that would require under 500 GB for all the changesets, which is smaller than a rendering database.
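As a rough check on that estimate, the arithmetic can be written out. The changeset count is an assumption: about 45 million, read off the highest changeset id mentioned in this thread (45018920), which overstates the number created since 2012. The per-changeset sizes are the 100k and 10k figures above.

```python
def total_storage_gb(changeset_count: int, bytes_per_changeset: int) -> float:
    """Total size in decimal gigabytes for a given average changeset size."""
    return changeset_count * bytes_per_changeset / 1e9


# Assumption: roughly 45 million changesets in total (upper bound for "since 2012").
CHANGESET_COUNT = 45_000_000

uncompressed_gb = total_storage_gb(CHANGESET_COUNT, 100_000)  # 100 kB each
gzipped_gb = total_storage_gb(CHANGESET_COUNT, 10_000)        # 10 kB each

print(uncompressed_gb, gzipped_gb)  # prints 4500.0 450.0
```

Under these assumptions the "under 500 GB" figure holds only for gzipped storage; uncompressed GeoJSON would land in the TB range.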

mmd-osm commented 7 years ago

I am trying to map your numbers on the current history page of OSM, which displays 20 changesets at once.

Ah, ok. I thought we were discussing how a single changeset should be rendered, as initially mentioned by @rmikke in this issue. I don't know what it means from a UX perspective to load data for 20 changesets at once, which may be quite large. Before tackling an even more difficult problem, I would propose to focus on a single changeset for the time being...

I don't know the average size for a geojson-ed changeset

I'd love to see those figures based on @geohacker's experience with the current cache.

andrewharvey commented 6 years ago

As a matter of policy we don't normally link to third party services like that - there are dozens of changeset analyzers that people have built and if we link to one then they will all feel entitled to have one.

This could work similarly to how that giant edit button gives you the option to choose from iD, Potlatch 2, JOSM or Merkaartor. It would be very useful for mappers to have a dropdown select on the changeset page to open a changeset in a variety of third-party changeset analyzers, with the default being whichever the user last chose or set in their preferences.
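A minimal sketch of how such a dropdown could be fed, assuming one URL template per analyzer. The two templates are the URLs that appear earlier in this thread (the achavi one is the demo instance); the function name and the `preferred` parameter are hypothetical and not part of the website code.

```python
from typing import Dict, List, Optional, Tuple

# URL templates taken from links in this thread; {id} is the changeset number.
ANALYZER_URL_TEMPLATES: Dict[str, str] = {
    "achavi (demo)": "http://dev.overpass-api.de/tmp/psv/achavi/index.html?changeset={id}",
    "OSMCha": "https://osmcha.mapbox.com/changesets/{id}",
}


def analyzer_links(changeset_id: int, preferred: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (name, url) pairs, with the user's preferred analyzer first."""
    names = sorted(ANALYZER_URL_TEMPLATES)
    if preferred in ANALYZER_URL_TEMPLATES:
        names.remove(preferred)
        names.insert(0, preferred)
    return [(name, ANALYZER_URL_TEMPLATES[name].format(id=changeset_id)) for name in names]


links = analyzer_links(45018920, preferred="achavi (demo)")
```

The first entry of `links` would be the dropdown's default, so remembering the last choice only requires storing one name per user.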

mmd-osm commented 5 years ago

I don't know the average size for a geojson-ed changeset, but if it is under 100k (which means under 10k gzipped: changeset jsons are very verbose), that would require under 500 GB for all the changesets, which is smaller than a rendering database.

100k per GeoJSON file is spot on for the 10,000 changesets I checked. OSMCha doesn't use compression, which means that around 1.5 - 2TB of uncompressed GeoJSON data might have accumulated on AWS S3 in the meantime. It seems that @geohacker is no longer with Mapbox, so we will probably never find out the exact numbers.

By the way, is everyone using OSMCha these days, and is this issue no longer relevant?

geohacker commented 5 years ago

Hi @mmd-osm, your estimate sounds about right. I'll copy @jinalfoflia since she's at Mapbox and can probably help us look at this, if needed.

I think everyone has adopted OSMCha for inspecting changesets so this may not be relevant for the short-term. But I do think having this view on osm.org is valuable in the long run.

mmd-osm commented 1 week ago

An alternative approach to Achavi is being proposed here: https://www.openstreetmap.org/user/TrickyFoxy/diary/405188