koordinates / kart

Distributed version-control for geospatial and tabular data
https://kartproject.org

Faster GPKG checkout: Use spatial index bulk loading #946

Open craigds opened 10 months ago

craigds commented 10 months ago

Checking out a full GPKG working copy is slow: almost 5 minutes for the ~2.4 million features of nz_primary_land_parcels on my MacBook Pro (M1, 2021):

```
$ time kart checkout
Creating GPKG working copy at nz-primary-land-parcels.gpkg ...
Writing features for dataset 1 of 1: nz_primary_land_parcels
nz_primary_land_parcels: 100%|███████████████████████████████████████████████████████████████████| 2375572/2375572 [04:40<00:00, 8476.60F/s]
kart checkout  0.01s user 0.02s system 0% cpu 4:51.30 total
```
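As a rough check on how much any single speed-up can help, Amdahl's law bounds the overall gain from making one phase of this 291.3 s (4:51.30) checkout faster. The share of time spent building the spatial index is not measured above, so the fractions below are assumptions, and the helper names are hypothetical.

```python
def projected_total(total_s: float, phase_fraction: float, phase_reduction: float) -> float:
    """Projected wall time after cutting one phase's time by `phase_reduction`.

    total_s: measured total wall time in seconds.
    phase_fraction: assumed share of total_s spent in that phase (0..1).
    phase_reduction: fraction of the phase's time removed (0.5 means "50% faster").
    """
    if not (0.0 <= phase_fraction <= 1.0 and 0.0 <= phase_reduction <= 1.0):
        raise ValueError("fractions must be in [0, 1]")
    phase = total_s * phase_fraction
    # Untouched part of the run, plus whatever remains of the optimised phase.
    return (total_s - phase) + phase * (1.0 - phase_reduction)


def required_fraction(total_s: float, target_s: float, phase_reduction: float) -> float:
    """Share of total_s the phase must account for to reach target_s."""
    return (total_s - target_s) / (total_s * phase_reduction)


TOTAL = 4 * 60 + 51.30  # 291.3 s, the `time` total from the checkout above

# Even if the phase were 100% of the run, halving it only gets to ~146 s.
print(round(projected_total(TOTAL, 1.0, 0.5), 2))      # 145.65
# Reaching 120 s with a 50% cut would need a share above 100%, i.e. impossible...
print(round(required_fraction(TOTAL, 120.0, 0.5), 3))  # 1.176
# ...whereas an 80% cut gets there if the phase is ~74% of the run.
print(round(required_fraction(TOTAL, 120.0, 0.8), 3))  # 0.735
```

In other words, a "2 minute" checkout needs both a large cut in index-building time and index building dominating the current run; profiling the checkout would show which of these holds.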

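https://github.com/rouault/sqlite_rtree_bulk_load looks useful here: it bulk-loads the R-tree instead of inserting entries one at a time, which should cut R-tree creation time for bulk loads by more than 50%. If R-tree creation dominates the checkout time, that 5 minutes could drop to roughly 2 to 2.5 minutes.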
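Why bulk loading helps: inserting entries one at a time (as the GeoPackage RTree extension's per-row insert triggers do) makes the R-tree choose a subtree and split nodes as it goes, whereas a bulk loader sees all bounding boxes up front, sorts them, and packs full nodes bottom-up. The sketch below uses Sort-Tile-Recursive (STR) packing, a classic bulk-loading algorithm, on an in-memory tree. It is an illustration only: it is not necessarily the algorithm sqlite_rtree_bulk_load uses, it does not write SQLite's R*Tree shadow tables, and all names in it are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


@dataclass
class Node:
    bbox: BBox
    children: List["Node"] = field(default_factory=list)
    fid: int = -1  # feature id for leaf entries; -1 for internal nodes


def union(boxes: List[BBox]) -> BBox:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def center(b: BBox) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def str_pack_level(nodes: List[Node], capacity: int) -> List[Node]:
    """Group one level of nodes into parents of up to `capacity` children (STR)."""
    n_parents = math.ceil(len(nodes) / capacity)
    n_slices = math.ceil(math.sqrt(n_parents))
    slice_size = n_slices * capacity
    # Sort by x-centre and cut into vertical slices, then sort each slice by y-centre
    # so that each run of `capacity` nodes is spatially compact.
    by_x = sorted(nodes, key=lambda n: center(n.bbox)[0])
    parents = []
    for i in range(0, len(by_x), slice_size):
        vslice = sorted(by_x[i:i + slice_size], key=lambda n: center(n.bbox)[1])
        for j in range(0, len(vslice), capacity):
            group = vslice[j:j + capacity]
            parents.append(Node(union([c.bbox for c in group]), group))
    return parents


def str_bulk_load(items: List[Tuple[int, BBox]], capacity: int = 8) -> Node:
    """Build a packed R-tree bottom-up, one level at a time, from (fid, bbox) pairs."""
    if not items:
        raise ValueError("need at least one item")
    level = str_pack_level([Node(bbox, fid=fid) for fid, bbox in items], capacity)
    while len(level) > 1:
        level = str_pack_level(level, capacity)
    return level[0]


def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: Node, q: BBox, out: List[int]) -> List[int]:
    """Collect fids of leaf entries whose bbox intersects q."""
    if not intersects(node.bbox, q):
        return out
    if not node.children:  # leaf entry
        out.append(node.fid)
    for child in node.children:
        search(child, q, out)
    return out


if __name__ == "__main__":
    # 10,000 hypothetical unit squares on a 100 x 100 grid.
    items = [(i, (i % 100, i // 100, i % 100 + 1, i // 100 + 1)) for i in range(10_000)]
    root = str_bulk_load(items, capacity=16)
    print(sorted(search(root, (10.5, 20.5, 11.5, 21.5), [])))  # [2010, 2011, 2110, 2111]
```

Each level is built with a sort plus one linear grouping pass, so no node is ever split or rebalanced; that avoided per-insert work is the kind of saving a bulk loader offers over row-by-row index maintenance.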

Another idea: could we start writing the working copy (WC) before the clone has finished? I'm not sure whether that's a good idea; it sounds messy.
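One way to picture the "write while cloning" idea is a producer/consumer pipeline: one thread yields features as they arrive, while the main thread writes them into the working copy. This is a hypothetical sketch, not how Kart works: `fetch` and `write` are placeholders for "objects arriving from the clone" and "insert one feature into the WC", and the function name is made up.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the fetched stream


def pipelined_checkout(fetch: Iterable[bytes], write: Callable[[bytes], None],
                       maxsize: int = 1000) -> None:
    """Write features while they are still being fetched (hypothetical sketch).

    A bounded queue applies back-pressure, so a slow writer cannot make
    memory grow without limit while the fetch runs ahead.
    """
    q: "queue.Queue[object]" = queue.Queue(maxsize=maxsize)
    errors: List[BaseException] = []

    def producer() -> None:
        try:
            for item in fetch:
                q.put(item)  # blocks while the queue is full
        except BaseException as exc:  # hand fetch failures to the writer side
            errors.append(exc)
        finally:
            q.put(_DONE)

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    # Limitation: if `write` raises here, the producer may stay blocked on a
    # full queue; a real implementation would need cancellation.
    while True:
        item = q.get()
        if item is _DONE:
            break
        write(item)
    t.join()
    if errors:
        raise errors[0]
```

The sketch deliberately ignores the hard parts the "messy" concern points at, such as what state the working copy is left in if the clone fails partway through.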