linz / nz-buildings

:house: NZ Buildings | QGIS plugin and PostGIS database schema for building dataset management
https://nz-buildings.rtfd.io
Other
6 stars 2 forks source link

feat: improve performance of functions used when bulk loading dataset #557

Open ajacombs opened 2 years ago

ajacombs commented 2 years ago

Improves performance of the buildings_bulk_load.bulk_load_outlines_insert_supplied function, by optimising three other functions in buildings_reference which this calls: suburb_locality_intersect_polygon, town_city_intersect_polygon, and territorial_authority_grid_intersect_polygon.

These three functions perform equivalent tasks: finding the suburb, town, and TA which each building outline overlaps the most. It is possible for an outline to overlap none, one, or multiple of each of these. If an outline overlaps more than one, we want to choose the one with the most overlap. This requires calculating the area of the intersection, which takes a non-trivial amount of processing time. Most outlines however will only intersect with a single suburb, town, or TA.

By first finding which and how many areas an outline intersects, we can then avoid having to take the time to calculate the area of the intersection in the case where there is only a single candidate. This avoids significant amounts of computation.

A test bulk load into a database running in docker saw the time taken to execute buildings_bulk_load.bulk_load_outlines_insert_supplied when loading the Ecopia Hawkes Bay dataset with 134,209 outlines drop from 1h 44m 23s to 17m 59s.

Notes on Sqitch

I am unsure if the way I have added the changes through Sqitch is correct, and have pushed this branch up so others can check what I have done, though I am not confident it is fully correct.

This pull request should be seen more as a "here is what I did, is this the right way to do it?" than fully finished changes ready to be merged.

Steps I followed using Sqitch were:

Things I'm not sure about with Sqitch:

dwsilk commented 2 years ago

This pull request should be seen more as a "here is what I did, is this the right way to do it?" than fully finished changes ready to be merged.

You can mark a PR as draft and then request a review, which is I guess the built-in GitHub functionality to indicate this.

ajacombs commented 2 years ago

This pull request should be seen more as a "here is what I did, is this the right way to do it?" than fully finished changes ready to be merged.

You can mark a PR as draft and then request a review, which is I guess the built-in GitHub functionality to indicate this.

Aha, didn't know you could do that, thank you. Changed to draft now.

Douglas-K commented 2 years ago

This is looking great overall, as far as I can see. For the Sqitch things you aren't sure about: