zerebubuth / planet-dump-ng

Converts an OpenStreetMap database dump into planet files.
BSD 2-Clause "Simplified" License

Add option to drop user data from output #18

Closed by woodpeck 6 years ago

woodpeck commented 6 years ago

This pull request adds new outputs (e.g. --xml-clean in addition to --xml) which will create files that omit user IDs and user names, but are otherwise identical. The motivation is preparedness for future, more stringent data protection regulation which might necessitate removing this information from public data.

An alternative to creating non-userdata versions right away is using software that strips user data from generated files. This is a viable but more resource-intensive option, as it requires parsing the already-generated files.
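To make the post-processing alternative concrete, here is a minimal sketch of such a scrubber for OSM XML, working one line at a time. The function name strip_user_attrs is hypothetical and not part of planet-dump-ng or any existing tool. It assumes attributes are written with double quotes and that quotes inside values are escaped as &quot;, so it is an illustration rather than a robust XML filter.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of planet-dump-ng.
// Removes uid="..." and user="..." attributes from one line of OSM XML.
// Assumes double-quoted attributes; single-quoted output is not handled.
std::string strip_user_attrs(const std::string &line) {
  // Match a whitespace character followed by uid="..." or user="...".
  static const std::regex user_attr(R"rx(\s(?:uid|user)="[^"]*")rx");
  return std::regex_replace(line, user_attr, "");
}
```

Every line of the already-written file has to be read and matched again, which is the extra cost mentioned above; a real tool would more likely use a proper OSM parser.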

The test data in this branch depends on the previous pull request for ae3e4b7 (add historic version to database for tests); applying this to a setup without historic test data would necessitate re-generating the "expected result" files here.

woodpeck commented 6 years ago

Your requests make sense and I tried to cater to them. I had to introduce a dependency on C++11 ("warning: scoped enums only available with -std=c++11"). I was unable to handle default parameters nicely, so I got rid of them altogether, but I guess they could be brought back if deemed necessary.
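For readers unfamiliar with the construct, the following is a rough sketch of how a scoped enum can select between full and user-free output. All names here (UserInfo, Node, write_node) are hypothetical and are not taken from the actual patch; the value of user is written without XML escaping to keep the sketch short.

```cpp
#include <sstream>
#include <string>

// Hypothetical names; none of these come from planet-dump-ng.
// A scoped enum (C++11) keeps Keep/Drop out of the enclosing namespace
// and prevents implicit conversion to int or bool.
enum class UserInfo { Keep, Drop };

struct Node {
  long id;
  int version;
  std::string timestamp;
  long uid;
  std::string user;
};

// The mode is a required argument with no default, so every call site
// states explicitly whether user data is written.
std::string write_node(const Node &n, UserInfo mode) {
  std::ostringstream out;
  out << "<node id=\"" << n.id << "\" version=\"" << n.version
      << "\" timestamp=\"" << n.timestamp << "\"";
  if (mode == UserInfo::Keep) {
    // Sketch only: a real writer would XML-escape the user name.
    out << " uid=\"" << n.uid << "\" user=\"" << n.user << "\"";
  }
  out << "/>";
  return out.str();
}
```

A call such as write_node(n, UserInfo::Drop) reads unambiguously at the call site, which a bare true/false argument would not.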

I agree that all this is a bit speculative with a view towards a possible future requirement of publishing userdata-free planets; even if such a requirement arose, it is unclear whether it would perhaps better be satisfied by first creating a normal planet file and then scrubbing it. I wouldn't mind if this pull request got rejected and I'd then simply keep my fork around in case it is needed some day.

zerebubuth commented 6 years ago

> it is unclear whether it would perhaps better be satisfied by first creating a normal planet file and then scrubbing it

I think it's probably more efficient to generate them all from the same source. planet-dump-ng has already parsed all the input, so to dump it and parse it again would introduce at least some delay.

We might want to have such a program anyway, since presumably we'll need to scrub previous planet files and diffs?

> if such a requirement arose

I hope that we're going to get the chance to discuss whether that would be a requirement.

Nakaner commented 6 years ago

@zerebubuth wrote:

> We might want to have such a program anyway, since presumably we'll need to scrub previous planet files and diffs?

You could use the Osmium Tool to remove all metadata except version and timestamp: osmium cat -o cleaned-planet.osm.pbf --output-format pbf,add_metadata=version+timestamp input.osm.pbf (the latest nightly is required because the ability to write only some metadata was added about two weeks ago by Jochen and me). Note that this solution does not work in place, but if you clean the dumps one after another, an additional 40 GB for a temporary file shouldn't be a problem.

zerebubuth commented 6 years ago

Thanks, that's good to know.