Open ajacombs opened 2 years ago
This pull request should be seen more as a "here is what I did, is this the right way to do it?" than fully finished changes ready to be merged.
You can mark a PR as draft and then request a review, which is I guess the built-in GitHub functionality to indicate this.
This pull request should be seen more as a "here is what I did, is this the right way to do it?" than fully finished changes ready to be merged.
You can mark a PR as draft and then request a review, which is I guess the built-in GitHub functionality to indicate this.
Aha, didn't know you could do that, thank you. Changed to draft now.
This is looking great overall, as far as I can see. For the Sqitch things you aren't sure about:
suburb_locality
looks fine. Master only has one function in it. So it is correct that this branch has one function for both @4.0.0-dev1.sql
and your updated version.suburb_locality
only has one function in it.
Improves performance of the
buildings_bulk_load.bulk_load_outlines_insert_supplied
function, by optimising three other functions inbuildings_reference
which this calls:suburb_locality_intersect_polygon
,town_city_intersect_polygon
, andterritorial_authority_grid_intersect_polygon
.These three functions perform equivalent tasks: finding the suburb, town, and TA which each building outline overlaps the most. It is possible for an outline to overlap none, one, or multiple of each of these. If an outline overlaps more than one, we want to choose the one with the most overlap. This requires calculating the area of the intersection, which takes a non-trivial amount of processing time. Most outlines however will only intersect with a single suburb, town, or TA.
By first finding which and how many areas an outline intersects, we can then avoid having to take the time to calculate the area of the intersection in the case where there is only a single candidate. This avoids significant amounts of computation.
A test bulk load into a database running in docker saw the time taken to execute
buildings_bulk_load.bulk_load_outlines_insert_supplied
when loading the Ecopia Hawkes Bay dataset with 134,209 outlines drop from 1h 44m 23s to 17m 59s.Notes on Sqitch
I am unsure if the way I have added the changes through Sqitch is correct, and have pushed this branch up so others can check what I have done, though I am not confident it is fully correct.
This pull request should be seen more as a "here is what I did, is this the right way to do it?" than fully finished changes ready to be merged.
Steps I followed using Sqitch were:
db/sql
directory had no pending changes.sqitch rework <file-identifier> -n '<Note text>'
.suburb_locality.sql
,territorial_authority.sql
andtown_city.sql
.suburb_locality.sql
,territorial_authority.sql
andtown_city.sql
to add a block to check for text added in the new version of each function.Things I'm not sure about with Sqitch:
@4.0.0-dev1.sql
files fortown_city
andterritorial_authority
seem to have the entirity of what the non-suffixed file was before these changes, butsuburb_locality
only has the previous version of the one function that was changed. That doesn't seem quite right, given the changes were equivalent across the three files.suburb_locality
only had a singleSELECT has_function_privilege
generated, when the others had one generated for each function. I didn't add the others, but it seems inconsistent.