billgeo opened this issue 1 year ago (status: Open)
Just want to point out that it would be unfair to compare timing with a query run locally, since fdw needs to fetch data from a remote server prior to execution. That said, the response should be within an acceptable time frame (which it currently isn't). We need to figure out what the bottleneck is and adjust accordingly. One thing to try is to increase the rds instance size. Currently the bde processor (and even the bde replica) is sized at multiples of the bde_analytics db.
On a `db.r6g.xlarge` rds instance with `use_remote_estimate` set to `false`, the query above (provided by Chris) took about 38 minutes to run. When `use_remote_estimate` is set to `true`, the query takes much, much longer...
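For reference, `use_remote_estimate` is a postgres_fdw option that can be toggled at the foreign-server level or per foreign table; a sketch (the server and table names below are placeholders, not the actual deployment objects):

```sql
-- Placeholder names: "bde_server" and "some_schema.some_table" are
-- assumptions, not the actual deployment objects.
-- Server-wide: every foreign table on this server asks the remote for
-- row estimates via EXPLAIN, adding round trips at planning time.
ALTER SERVER bde_server OPTIONS (SET use_remote_estimate 'true');

-- Per-table: limits the extra remote EXPLAINs to tables that benefit.
-- Use ADD instead of SET if the option has not been defined yet.
ALTER FOREIGN TABLE some_schema.some_table
    OPTIONS (SET use_remote_estimate 'true');
```

Setting it per table may be a useful middle ground if the server-wide setting makes planning too slow.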
Edit: Prod bde runs on a 4xlarge (though a different instance family). To get a like-for-like comparison, we should test it on a similar instance size (before weighing up response time vs cost).
IMHO we should instead look at why the query takes such a long time to complete.
The 4 mins comes from prdassgeo02. I can get the specs of that if you like, but from memory I'd say it'd be closer to an `r6g.large`.
How long does it take on an `r6g.large`? An `xlarge` is quite expensive, ~US$800 per month, so probably not justified to save 10 minutes on the odd query.
The query is doing quite a lot of geospatial processing, so 4 minutes doesn't seem unreasonable from what I can see.
I certainly wouldn't want to add a large expense for the sake of one particularly complex query run quite infrequently. Certainly I'd like to be able to run a query like this to completion, but it isn't a regular requirement (at least not this large), so it's definitely not worth adding hundreds of taxpayer dollars per month indefinitely. I'd be fine with it taking one hour. It is deliberately large to test capability. Note that this query is testing geometries in parcels updated in the last 20 BDE updates.
```sql
delete from tmp_pargeom_status
where revision_created > (select max(revision_checked) from tmp_pargeom_status) - 20;
```
This can be reduced by changing 20 to a smaller number. A likely scenario would be a weekly run each weekend. (Also, this is not a fixed-size task, as the data is continually changing.)
Is the data available on prdassgeo02 identical to the production bde processor (i.e. do both databases contain the same datasets)? Just checking that there isn't another process removing data that isn't required for analytics.
I think the data is the same - though the bde_analytics table_version tables may be much smaller depending on when they were populated from.
It seems this has been discussed before: https://github.com/linz/bde-processor-deployment/issues/243
FYI - prdassgeo02 has 2 cores and 12 GB memory, so probably closer to an `r6g.large` than an `r6g.xlarge`.
> It seems this has been discussed before: https://github.com/linz/bde-processor-deployment/issues/243
That was for the BDE Processor itself though, which has much higher workloads and I'm sure uses more memory, so I don't think it's directly relevant to this.
So that complex queries complete in a reasonable amount of time, we want to test a complex query provided by Chris Crook.
Tasks
His query typically takes ~4 minutes on prdassgeo02 (a non-FDW database with BDE data in it). It would be useful to investigate whether we can tune the FDW database to run the same query in under 10 minutes; 20-30 minutes may still be acceptable.
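Two things may be worth checking during that tuning pass (a sketch only; `bde_server` is a placeholder for the actual foreign server name). `EXPLAIN (ANALYZE, VERBOSE)` prints the `Remote SQL` postgres_fdw actually ships, which shows whether filters are pushed down to the remote or whole tables are being fetched, and the `fetch_size` option controls how many rows are pulled per round trip (default 100):

```sql
-- Inspect the "Remote SQL" postgres_fdw sends to the remote server;
-- predicates that are not pushed down force large transfers before
-- filtering happens locally.
EXPLAIN (ANALYZE, VERBOSE)
SELECT ...;  -- substitute the test query here

-- A larger fetch_size reduces round trips when most rows have to be
-- transferred anyway (default 100; the 10000 here is illustrative only).
ALTER SERVER bde_server OPTIONS (ADD fetch_size '10000');
```

Since geospatial functions are generally not pushed down by postgres_fdw, the `Remote SQL` output should make it clear how much data is crossing the wire for this query.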
- Test on Wednesday March 1st
- Tested again by @ccrook on Friday 3rd