koordinates / kart

Distributed version-control for geospatial and tabular data
https://kartproject.org

pointcloud/raster: de-dupe tiles from alternate refs #1017

Open craigds opened 2 weeks ago

craigds commented 2 weeks ago

Is your feature request related to a problem? Please describe.

kart import converts tiles to cloud-optimised format during the import. Because the conversion output is nondeterministic, it attempts to de-dupe these files based on the sourceOid from pointer files found at HEAD. The effect is that you can import the same dataset multiple times, and only changed files will actually be converted to cloud-optimised format (identical tiles from the previous commit will be reused).

However, if the pointer files for existing tiles live on another ref rather than HEAD, it would be ideal if we could re-use those too.
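To make the idea concrete, here is a rough sketch of de-duping by sourceOid across more than one ref. It is not kart's actual implementation: the in-memory `pointers_by_ref` mapping, the `oid` key and both function names are hypothetical, and only `sourceOid` comes from the description above.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical model: a pointer file is a dict holding the hash of the original
# tile ("sourceOid") and the hash of the converted tile ("oid", assumed name).
PointerFile = Dict[str, str]


def build_source_oid_index(
    pointers_by_ref: Dict[str, List[PointerFile]], refs: Iterable[str]
) -> Dict[str, PointerFile]:
    """Map each sourceOid to the first pointer file found at the given refs."""
    index: Dict[str, PointerFile] = {}
    for ref in refs:
        for pointer in pointers_by_ref.get(ref, []):
            source_oid = pointer.get("sourceOid")
            if source_oid is not None:
                # Keep the first match so earlier refs (e.g. HEAD) take priority.
                index.setdefault(source_oid, pointer)
    return index


def find_reusable_tile(
    index: Dict[str, PointerFile], source_oid: str
) -> Optional[PointerFile]:
    """Return an existing converted tile for this source tile, if there is one."""
    return index.get(source_oid)


# Example data (made up): one tile at HEAD, another only on a separate ref.
pointers_by_ref = {
    "HEAD": [{"sourceOid": "sha256:aaa", "oid": "sha256:111"}],
    "refs/my-imports/a": [{"sourceOid": "sha256:bbb", "oid": "sha256:222"}],
}
head_only = build_source_oid_index(pointers_by_ref, ["HEAD"])
all_refs = build_source_oid_index(pointers_by_ref, pointers_by_ref.keys())
```

With only HEAD indexed, the tile with sourceOid `sha256:bbb` would be converted again; with all refs indexed, it is found and could be reused.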

Describe the solution you'd like

Add a kart import --find-tiles-at-all-refs flag to de-dupe tiles based on all refs.

Perhaps this could be the default behaviour, depending on how slow it turns out to be. Where there are already relevant tiles to de-dupe, this will be a considerable speedup, so it seems reasonable to accept at least a small performance loss in the general case (?)

Describe alternatives you've considered

The flag could take a set of ref patterns, but this might be overkill, e.g.

--find-existing-tiles-at='HEAD,refs/my-imports/*'
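For illustration only, a comma-separated pattern value like the one above could be resolved against a list of ref names roughly as follows. The function names are hypothetical, and using Python's `fnmatch` is an assumption: its `*` also matches across `/`, which may not be the matching behaviour kart would want.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_ref_patterns(value: str) -> List[str]:
    """Split a comma-separated option value into non-empty patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def select_refs(all_refs: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the refs that match at least one pattern, in their original order."""
    pattern_list = list(patterns)
    return [
        ref
        for ref in all_refs
        if any(fnmatchcase(ref, pattern) for pattern in pattern_list)
    ]


# Example ref names (made up) to show which ones the patterns would pick up.
patterns = parse_ref_patterns("HEAD,refs/my-imports/*")
candidate_refs = ["HEAD", "refs/heads/main", "refs/my-imports/2024-01"]
selected = select_refs(candidate_refs, patterns)
```

Here `selected` contains `HEAD` and `refs/my-imports/2024-01`, but not `refs/heads/main`.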