pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License
27 stars 43 forks source link

query performance tweaks #425

Closed missinglink closed 5 years ago

missinglink commented 5 years ago

Hi @Joxit

I did a deep-dive today into the performance issues noted by @orangejulius.

I used the docker project portland-metro to test it out, focussing on the locality layer as it has the most records. I'm using the pip-service repo to test things out, because that's our greatest concern regarding performance (start times)

As a 'control' here is the timing using bundles on my development machine:

info: [wof-pip-service:master] locality worker loaded 1006 features in 0.848 seconds

With the commits you added yesterday in https://github.com/pelias/wof-admin-lookup/pull/249, I can now confirm that the importPlaces property is being honoured correctly and so it's not trying to load the whole planet anymore ;)

Here's a benchmark when using the sqlite stream, before this PR:

info: [wof-pip-service:master] locality worker loaded 1006 features in 5.845 seconds

...and again after:

info: [wof-pip-service:master] locality worker loaded 1006 features in 0.697 seconds

keep in mind that this isn't the most scientific testing method but I ran it many times and the result was always near-to or below the old bundle loading time 🎉

I've made a few changes to the query, these are mostly cosmetic (although I was confused with the null island check, was this triggered for a geometry had only one of it's axis set to 0?).

The major change here is the addition of a new covering index, this alone greatly speeds up query execution.

I would prefer not to modify the sqlite databases, so I've opened https://github.com/whosonfirst/go-whosonfirst-sqlite-features/pull/4 upstream, hopefully, they will merge this so we don't have to start monkey-patching the WOF distributions.

Let me know what you think.

I'd probably like to add some tests for cases such as null island and the new clause which removes Point geometries within the query by comparing min_latitude and max_latitude etc. This option was removed from the PR, see issue comments

Joxit commented 5 years ago

The index was a great idea ! I hope they will merge your changes.

I glad that importPlaces is working fine now :+1:

Should we really remove all points here ? This is not currently the case with the bundle version. Points are removed only for wof-admin-lookup.

Joxit commented 5 years ago

I just checked, and we are importing WOF points. So it would be better if the filter on these records is not done in the query. (My random example was whosonfirst:locality:421205547 for Subayrah, Yémen)

missinglink commented 5 years ago

Yep, totally agree, I'll remove the points-filter code.

missinglink commented 5 years ago

Query clause to filter out points removed via rebase.

missinglink commented 5 years ago

Performance seems to be the same with/without the points filtering for my portland-metro test:

info: [wof-pip-service:master] locality worker loaded 1006 features in 0.768 seconds
missinglink commented 5 years ago

Good to go?

Joxit commented 5 years ago

Yes ! LGTM :rocket: