openstreetmap / gpx-updater

Retrieve new OSM GPS tracks as they are uploaded, and invalidate cached tiles

Deleted GPS traces do not disappear #4

Open ghost opened 6 years ago

ghost commented 6 years ago

Recently (some weeks or months ago), I've removed some inaccurate GPS traces from http://www.openstreetmap.org/traces, but they still show up in iD. Are they going to disappear at some time?

(First reported at https://github.com/openstreetmap/iD/issues/4543.)

e-n-f commented 6 years ago

Unfortunately the OSM GPS database doesn't provide any way to be notified when traces are deleted, so the map server doesn't know anything about the deletions.

At some point I should try wiping the local data and reimporting everything, but that takes a long time.

johantiden commented 3 weeks ago

I'm very concerned about this. You can clearly see where I live from some accidentally uploaded tracks, which I now can't delete. This would probably be against GDPR for anyone hosting this data. It's sad that this issue hasn't had any work done in over six years.

What can I do to get some progress on this?

tomhughes commented 3 weeks ago

Well the GPS tiles aren't associated with any individual so I don't see how GDPR is involved - the original data (which can potentially be linked to a user depending on the trace settings) does get deleted and without that there's no way to know where a particular dot or line on the GPS tiles came from.

That said if you want to "get some progress on it" then write some code, or persuade somebody to write some for you.

All the code that runs the GPS tiles was written years ago by a contributor who has long since left, and it hasn't been maintained at all since.

The main code is https://github.com/e-n-f/datamaps, which produces a program called encode that processes a trace and creates some sort of binary representation of it, and another called merge that merges that representation into the existing master database from which tiles are rendered.

I don't think encode stores any record of which trace the data came from, which means there is no obvious way to remove that data. I'm also not sure what it does when multiple traces contain the same point, but most likely merge just merges those.

The simplest solution might be an unmerge, so you could re-encode the trace being deleted and unmerge it from the main data, but that would still need to deal with any issues around overlaps.
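A minimal sketch of how an unmerge could cope with overlaps, assuming the master data kept a per-cell count of contributing traces. This is not how datamaps stores its data (the thread says it probably keeps no such record); `encode`, `MasterStore`, and the rounding precision here are hypothetical stand-ins for illustration only.

```python
from collections import Counter


def encode(trace, precision=5):
    """Quantize a trace (iterable of (lat, lon)) into a set of grid cells.

    Hypothetical stand-in for datamaps' encode. Using a set means a trace
    contributes at most once to each cell.
    """
    return {(round(lat, precision), round(lon, precision)) for lat, lon in trace}


class MasterStore:
    """Hypothetical master database: cell -> number of traces covering it."""

    def __init__(self):
        self.counts = Counter()

    def merge(self, cells):
        for cell in cells:
            self.counts[cell] += 1

    def unmerge(self, cells):
        for cell in cells:
            if self.counts[cell] <= 1:
                # Last trace through this cell: drop it entirely.
                self.counts.pop(cell, None)
            else:
                # Another trace still covers this cell: keep it visible.
                self.counts[cell] -= 1

    def visible_cells(self):
        return set(self.counts)
```

With counts in place, re-encoding a deleted trace and unmerging it removes only the cells that no other trace still covers. Without them, an unmerge cannot tell a shared point from a unique one.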

johantiden commented 3 weeks ago

Thanks for the intro. I'll see if I get the time and the energy to check it out!

As for the GDPR part, that's not how it works. Personal data, even if anonymized, is still personal data. In this case it would also be trivial for anyone who wanted to find me. I have a lot of edits around my neighborhood and there is one single house with lots of GPS traces around it. I'd call that data very personal.

To give another context: How about traces on personal property? A famous person living in the woods may not want traces leading up to their house, showing how to get there.

I consider this a fatal design flaw of the trace renderer. If it can't keep up with the changing opinion about privacy, maybe we have to choose to fix it or scrap it. The people hosting this data should really consider it. Until the software can be "fixed" maybe it's time to pull the brakes.

tomhughes commented 3 weeks ago

I'll happily shut that site down tomorrow if the powers that be tell me to.

The whole concept of right to delete for public data is an entirely nonsensical part of GDPR though, I'm afraid. It's perfectly reasonable for private data, but once you've released something to the world at large, trying to put the genie back in the bottle is never going to work.

johantiden commented 3 weeks ago

That's not how private vs public data is defined. Data can only be made public by aggregating it into a form that is not identifiable, i.e. you can't single out any single data point anymore. This usually ends up being a coarse enough heatmap in which each point contains at least two individuals, making it impossible to single one individual out.

GDPR is a blunt tool meant to stop mass surveillance by mega corporations, not punish small actors. It just happens that GDPR currently aligns with the concerns I have about my data. I agree that the traces all over my backyard are still my data and it's a shame I can't clean them up.

I'm not going to invoke any more laws now. I just wanted to raise awareness of this, so that future replacements of this can design removal of data as one of the core features, as it should be. We call it CRUD for a reason.

johantiden commented 3 weeks ago

I tried briefly to compile and run this, to see where it's at. I could compile datamaps fine but now I'm stuck at ./import/src/gpx-parse at update:100. Where is this dependency? I doubt it's this one: https://github.com/elliotstokes/gpx-parse

Firefishy commented 3 weeks ago

> I tried briefly to compile and run this, to see where it's at. I could compile datamaps fine but now I'm stuck at ./import/src/gpx-parse at update:100. Where is this dependency? I doubt it's this one: https://github.com/elliotstokes/gpx-parse

gpx-parse is from: https://github.com/e-n-f/gpx-import/tree/master/src

These are the other repos which are used:

https://github.com/openstreetmap/gpx-updater.git
https://github.com/e-n-f/datamaps.git

NeatNit commented 3 weeks ago

I'm rooting for you @johantiden! I personally don't have quite as much at stake here as you, and I don't believe it poses as much of a privacy risk as you seem to, but it's a bad situation nonetheless and absolutely should be fixed. Good luck!

If you happen to also find easy fixes for whatever causes osm.org to show only a fraction of the traces that JOSM can show, that would also be fantastic. But I know that's not what you set out to do, no pressure.

johantiden commented 3 weeks ago

> I'm rooting for you @johantiden! I personally don't have quite as much at stake here as you, and I don't believe it poses as much of a privacy risk as you seem to, but it's a bad situation nonetheless and absolutely should be fixed. Good luck!

Thank you! Don't get your hopes up though. I don't know Perl or its ecosystem.

> If you happen to also find easy fixes for whatever causes osm.org to show only a fraction of the traces that JOSM can show, that would also be fantastic. But I know that's not what you set out to do, no pressure.

I suppose the other open issues explain some of that, at least. This repo only fetches data from the public RSS feed of traces, and traces don't show up there when their permissions are changed.

I hope to be able to build a "resync" feature that instead uses the other APIs to fetch all traces for a given region of the map. This may not be feasible hardware-wise though: starting from scratch might be a very heavy operation. We'll see.
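A rough sketch of what fetching all trackpoints for a region could look like, assuming the OSM API v0.6 `trackpoints` call (`bbox=left,bottom,right,top` plus a `page` parameter, up to 5000 points per page). The helper names and the User-Agent string are made up for illustration, and the API limits how large a bounding box may be, so a real resync would have to walk many small boxes.

```python
import urllib.request
import xml.etree.ElementTree as ET

API = "https://api.openstreetmap.org/api/0.6/trackpoints"
PAGE_SIZE = 5000  # maximum number of points the API returns per page


def trackpoints_url(left, bottom, right, top, page=0):
    """Build the request URL for one page of trackpoints in a bbox."""
    return f"{API}?bbox={left},{bottom},{right},{top}&page={page}"


def parse_trackpoints(gpx_text):
    """Extract (lat, lon) pairs from a GPX document, ignoring namespaces."""
    root = ET.fromstring(gpx_text)
    return [(float(el.get("lat")), float(el.get("lon")))
            for el in root.iter() if el.tag.endswith("trkpt")]


def http_get(url):
    # Hypothetical User-Agent; a real client should identify itself properly.
    req = urllib.request.Request(url, headers={"User-Agent": "resync-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def fetch_all(left, bottom, right, top, fetch=http_get):
    """Page through the bbox until a page comes back short."""
    points, page = [], 0
    while True:
        batch = parse_trackpoints(fetch(trackpoints_url(left, bottom, right, top, page)))
        points.extend(batch)
        if len(batch) < PAGE_SIZE:
            return points
        page += 1
```

The `fetch` parameter exists only so the paging logic can be exercised without hitting the live API.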

mmd-osm commented 3 weeks ago

> instead uses the other APIs to fetch all traces, given a region of map

Yes, this is already known to be fairly slow: https://prometheus.openstreetmap.org/d/5rTT87FMk/web-site?orgId=1&refresh=1m&from=now-90d&to=now&viewPanel=16 --> select "trkpts" for trackpoints.

It doesn't really help that people (and some bots in particular) have uploaded around 29 billion trace points in total -> https://planet.openstreetmap.org/statistics/data_stats.html

johantiden commented 3 weeks ago

> > instead uses the other APIs to fetch all traces, given a region of map

> Yes, this is already known to be fairly slow: https://prometheus.openstreetmap.org/d/5rTT87FMk/web-site?orgId=1&refresh=1m&from=now-90d&to=now&viewPanel=16

> It doesn't really help that people (and some bots in particular) have uploaded around 29 billion trace points in total -> https://planet.openstreetmap.org/statistics/data_stats.html

Interesting! Maybe it's time to start purging some of that data! There is a lot of useless data around my neighborhood that I would like to be able to remove - including my own. We'd open up community moderation of traces. We should either see the data as private (and not publish it in this form), or public, which to me means that anyone can edit it - just like the map.

Many users who uploaded traces are no longer active, but their traces still clutter the map.

This is all visionary of course. I know OSM is a very political domain. I don't have high hopes of large changes like this. :)

HolgerJeromin commented 3 weeks ago

> Maybe it's time to start purging some of that data! There is a lot of useless data around my neighborhood that I would like to be able to remove - including my own.

You cleaning up a few hundred tracks would make no real difference.

The main problem is apps that store extremely over-noded data. See here: https://github.com/openstreetmap/operations/issues/931#issuecomment-1681688115

johantiden commented 3 weeks ago

That was just an example. As a moderator, though, I would get some work done if the data could be moderated! See my 10000 edits on OSM ;)

mfornasa commented 1 week ago

@Firefishy mentions here that a possible workaround to remove deleted traces would be re-rendering the entire dataset on a regular interval. Am I getting this right?

If so, is it feasible to schedule a re-rendering? What are the drawbacks and advantages, even beyond this specific issue?

tomhughes commented 1 week ago

No, he says that is the only way to do it, not that it is something that is practical to do on a regular basis.

IIRC we have only done this once in ten years and it took something like a month to complete.

mfornasa commented 1 week ago

> @johantiden: I'm very concerned about this. You can clearly see where I live from some accidentally uploaded tracks, which I now can't delete.

Same here. Law aside, this seems to me a privacy (and sometimes a security) risk in case of innocent user mistakes.

> @tomhughes: The whole concept of right to delete for public data is an entirely nonsensical part of GDPR though I'm afraid. It's perfectly reasonable for private data but once you've released something to the world at large trying to put the genie back in the bottle is never going to work.

I agree in principle, but the main issue lies with the user interfaces within the OSM ecosystem for uploading GPX traces, which can be misleading in terms of how data is shared. For example, the mobile editor Go Map!! allows users to publish public GPX traces with just a single button press. No choice of private/public is even offered, nor a proper disclaimer.

This contrasts sharply with the process for submitting regular changesets, which typically involves multiple steps and a clear disclaimer about the publication of data.

Firefishy commented 1 week ago

> IIRC we have only done this once in ten years and it took something like a month to complete.

I recall it taking much longer to generate. I think it was at least 2 months. It would be longer now.

johantiden commented 1 week ago

I'm building a new renderer from scratch, just to learn the APIs and use a language I'm comfortable with :)

A major blocker (in addition to it being slow) is that the GPS traces API seems to only return traces starting in a given bounding box, not intersecting it.

johantiden commented 1 week ago

One would have to zoom out quite a bit to ensure that all traces for a given tile are visible. Shorter traces will have a higher chance to be visible.

So the idea is to make a renderer that renders "on demand", caching all tiles to the file system. For a given tile z/x/y, zoom out to 15(?) and get all the traces there, then cache the traces to file for future tile renderings. To flush the cache, one would simply need to delete the wanted tiles (and traces) from the file system.
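The tile arithmetic behind that idea can be sketched as follows, assuming standard slippy-map (Web Mercator) tile numbering. The cache layout, file extensions, and function names are invented for illustration and are not taken from any existing code.

```python
import math
from pathlib import Path

FETCH_ZOOM = 15  # zoom level at which traces are fetched and cached (the "15(?)" above)


def tile_bbox(z, x, y):
    """Return (left, bottom, right, top) in degrees for slippy-map tile z/x/y."""
    n = 2 ** z

    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    left = x / n * 360.0 - 180.0
    right = (x + 1) / n * 360.0 - 180.0
    return left, lat(y + 1), right, lat(y)


def fetch_tiles(z, x, y, fetch_zoom=FETCH_ZOOM):
    """Tiles at fetch_zoom that cover tile z/x/y."""
    if z >= fetch_zoom:
        shift = z - fetch_zoom
        return [(fetch_zoom, x >> shift, y >> shift)]  # single ancestor tile
    scale = 1 << (fetch_zoom - z)
    return [(fetch_zoom, fx, fy)
            for fx in range(x * scale, (x + 1) * scale)
            for fy in range(y * scale, (y + 1) * scale)]


def cache_path(root, kind, z, x, y, ext):
    """Hypothetical on-disk layout: <root>/<kind>/<z>/<x>/<y>.<ext>."""
    return Path(root) / kind / str(z) / str(x) / f"{y}.{ext}"


def flush(root, z, x, y):
    """Delete the rendered tile and the cached traces it was built from."""
    targets = [cache_path(root, "tiles", z, x, y, "png")]
    targets += [cache_path(root, "traces", *t, "gpx") for t in fetch_tiles(z, x, y)]
    for path in targets:
        path.unlink(missing_ok=True)
```

Note that other rendered tiles drawn from the same cached traces would stay stale until they are flushed too, so a real implementation would need to track which tiles depend on which trace files.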

The rendering would be really slow at first so my idea is to pre-populate the lowest zoom level.

NeatNit commented 1 week ago

> A major blocker (in addition to it being slow) is that the GPS traces API seems to only return traces starting in a given bounding box, not intersecting it.

This doesn't seem to be true. In JOSM I can select a very small bounding box and still get a lot of traces in it that definitely started outside the bounding box.

johantiden commented 1 week ago

> This doesn't seem to be true. In JOSM I can select a very small bounding box and still get a lot of traces in it that definitely started outside the bounding box.

Thanks for checking. I double-checked my code and realized I was way too eager to round off the coordinates, clipping the whole tile out of the query :dancing_men:
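The code with the bug is not shown in the thread, so the following is only an illustration of how over-eager rounding can clip a tile out of a query, together with one way to avoid it: round the box outward (floor the minimums, ceil the maximums) so it never shrinks. The function name and the sample coordinates are made up.

```python
import math


def snap_bbox_outward(left, bottom, right, top, decimals=2):
    """Round a bbox to `decimals` places without ever shrinking it."""
    f = 10 ** decimals
    return (math.floor(left * f) / f, math.floor(bottom * f) / f,
            math.ceil(right * f) / f, math.ceil(top * f) / f)


# A small tile-sized box (hypothetical coordinates).
bbox = (18.0011, 59.3011, 18.0044, 59.3032)

# Plain rounding collapses the box to zero area, so the tile drops out of the query.
naive = tuple(round(v, 2) for v in bbox)

# Outward rounding still contains the original box.
safe = snap_bbox_outward(*bbox)
```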

johantiden commented 1 week ago

I will continue my work at https://github.com/johantiden/osmheat if anyone is interested.

I got a naive heatmap going, which obviously needs a lot more work.

[Screenshot: Screenshot_20240917-140220]

By using a heatmap, every single point of data is less revealed and one could just flush the cache in order to re-render any tiles with sensitive data.
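As a rough illustration of the binning step behind such a heatmap (not the osmheat implementation, which is not shown here), points can be counted per pixel of a tile-sized grid. The function name is hypothetical, and latitude is interpolated linearly, which is only a reasonable approximation for small tiles.

```python
from collections import Counter


def heatmap(points, bbox, size=256):
    """Count points per pixel of a size x size grid covering bbox.

    points: iterable of (lat, lon); bbox: (left, bottom, right, top).
    Returns a Counter mapping (row, col) -> number of points, row 0 at the top.
    """
    left, bottom, right, top = bbox
    grid = Counter()
    for lat, lon in points:
        # Skip points outside the tile; bounds chosen so indexes stay in range.
        if not (left <= lon < right and bottom < lat <= top):
            continue
        col = int((lon - left) / (right - left) * size)
        row = int((top - lat) / (top - bottom) * size)
        grid[(row, col)] += 1
    return grid
```

Rendering then maps each count to a color intensity. Because only counts are stored, an individual trace is no longer drawn as a distinct line, though a lone trace in a quiet area would still stand out unless low-count cells are suppressed.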