mapbox / tippecanoe

Build vector tilesets from large collections of GeoJSON features.
BSD 2-Clause "Simplified" License

Determinism of result #884

Open bjnsn opened 3 years ago

bjnsn commented 3 years ago

I have a large set of data sources spread across hundreds of GeoJSON files. Some of the resulting mbtiles files are combined ones, but many are not. We store the output in a git repository, which makes local development and deployment easy.

Unfortunately, it is difficult to tell which files have actually changed (git can't), because the output does not appear to be deterministic. That generates huge, unnecessary changes in the git repository.

Generally, we're using a command like this to generate the result:

tippecanoe -o output.mbtiles -f --detect-shared-borders --base-zoom=6 --maximum-zoom=10 --simplification=8 -n="file description" input.json

Is there anything that can be changed to make this generate the same result, given identical inputs, each time?

Thanks!

bjnsn commented 3 years ago

For anyone running into the same issue, I've found a workaround: keep the source files under version control and regenerate only the mbtiles whose sources git reports as changed within the last n commits.
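
A minimal Python sketch of that workaround, under stated assumptions: the `sources` and `tiles` directory names, the one-source-per-tileset mapping, and both function names are hypothetical, and "the last n commits" is read as `HEAD~n..HEAD`. The tippecanoe flags are the ones from the command earlier in this thread, with the `-n` name option left out.

```python
import subprocess
from pathlib import Path


def changed_sources(n, src_dir="sources", run=subprocess.run):
    """List GeoJSON files under src_dir that changed in the last n commits.

    --diff-filter=d excludes deleted files, which cannot be rebuilt.
    `run` is injectable so the function can be exercised without a repository.
    """
    result = run(
        ["git", "diff", "--name-only", "--diff-filter=d",
         f"HEAD~{n}", "HEAD", "--", src_dir],
        check=True, capture_output=True, text=True,
    )
    return [Path(line) for line in result.stdout.splitlines()
            if line.endswith((".json", ".geojson"))]


def tippecanoe_command(source, out_dir="tiles"):
    """Build the tippecanoe invocation for one source file (hypothetical layout)."""
    output = Path(out_dir) / (Path(source).stem + ".mbtiles")
    return ["tippecanoe", "-o", str(output), "-f",
            "--detect-shared-borders", "--base-zoom=6", "--maximum-zoom=10",
            "--simplification=8", str(source)]
```

A build script could then loop over `changed_sources(n)` and pass each `tippecanoe_command(path)` to `subprocess.run(..., check=True)`. Combined tilesets would need an extra mapping from each output to all of its inputs, which this sketch does not cover.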

e-n-f commented 3 years ago

The thing that makes the mbtiles nondeterministic from one run to the next is that different threads can complete in different orders, so tiles can be added to the tiles table in a different sequence. I don't know of a way to guarantee that two SQLite files will be identical even when rows are added in the same order. But if you run tippecanoe-decode on the two versions of the file and compare the output, you should be able to safely revert the new mbtiles file if it decodes exactly the same as the previous one.
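
The decode-and-compare check could be sketched in Python roughly as follows. This is an illustration, not part of tippecanoe: the function names are hypothetical, it assumes `tippecanoe-decode` is on the PATH and writes its GeoJSON to standard output, and it holds both decoded outputs in memory, which may not suit very large tilesets.

```python
import subprocess


def decode(mbtiles_path):
    """Return the text that tippecanoe-decode prints for one mbtiles file."""
    result = subprocess.run(
        ["tippecanoe-decode", str(mbtiles_path)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def same_tiles(old_path, new_path, decoder=decode):
    """True when both files decode to exactly the same output.

    `decoder` is injectable so the comparison can be tested without the binary.
    """
    return decoder(old_path) == decoder(new_path)
```

One way to use it: extract the committed version with `git show HEAD:output.mbtiles > old.mbtiles`, call `same_tiles("old.mbtiles", "output.mbtiles")`, and run `git checkout -- output.mbtiles` to revert the rebuilt file when the result is true.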