billgeo opened this issue 1 year ago (status: Open)
Just want to point out that it would be unfair to compare timing with a query run locally, since fdw needs to fetch data from a remote server prior to execution. That said, the response should be within an acceptable time frame (which it currently isn't). We need to figure out what the bottleneck is and adjust accordingly. One thing to try is to increase the rds instance size. Currently the bde processor (and even the bde replica) is sized at multiples of the bde_analytics db.
On a `db.r6g.xlarge` rds instance with `use_remote_estimate` set to `false`, the query above (provided by Chris) took about 38 minutes to run. When `use_remote_estimate` is set to `true`, the query takes much, much longer...
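For reference, `use_remote_estimate` is a postgres_fdw option that can be toggled at the foreign-server level or per foreign table; a sketch (the server and table names below are placeholders, not the actual deployment objects):

```sql
-- Placeholder names: "bde_server" and "some_schema.some_table" are
-- assumptions, not the actual deployment objects.
-- Server-wide: every foreign table on this server asks the remote for
-- row estimates via EXPLAIN, adding round trips at planning time.
ALTER SERVER bde_server OPTIONS (SET use_remote_estimate 'true');

-- Per-table: limits the extra remote EXPLAINs to tables that benefit.
-- Use ADD instead of SET if the option has not been defined yet.
ALTER FOREIGN TABLE some_schema.some_table
    OPTIONS (SET use_remote_estimate 'true');
```

Setting it per table may be a useful middle ground if the server-wide setting makes planning too slow.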
Edit: Prod bde runs on a 4xlarge (though a different instance family). To get a like-for-like comparison, we should test it on a similar instance size (before weighing up response time vs cost).
IMHO we should instead look at why the query takes such a long time to complete.
The 4 mins comes from prdassgeo02. I can get the specs of that if you like, but from memory I'd say it'd be closer to an `r6g.large`.
How long does it take on an `r6g.large`? An `xlarge` is quite expensive, ~US$800 per month, so probably not justified to save 10 minutes on the odd query.
The query is doing quite a lot of geospatial processing, so 4 minutes doesn't seem unreasonable from what I can see.
I certainly wouldn't want to add a large expense for the sake of one particularly complex query run quite infrequently. Certainly I'd like to be able to run a query like this to completion, but it isn't a regular requirement (at least not this large), so it's definitely not worth adding hundreds of taxpayer dollars per month indefinitely. I'd be fine with it taking one hour. It is deliberately large to test capability. Note that this query is testing geometries in parcels updated in the last 20 BDE updates.
```sql
delete from tmp_pargeom_status
where revision_created > (select max(revision_checked) from tmp_pargeom_status) - 20;
```
This can be reduced by changing 20 to a smaller number. A likely scenario would be a weekly run each weekend. (Also, this is not a fixed-size task, as the data is continually changing.)
Is the data available on prdassgeo02 identical to the production bde processor (i.e. do both databases contain the same datasets)? Just checking that there isn't another process removing data that isn't required for analytics.
I think the data is the same - though the bde_analytics table_version tables may be much smaller depending on when they were populated from.
It seems this has been discussed before: https://github.com/linz/bde-processor-deployment/issues/243
FYI - prdassgeo02 has 2 cores and 12 GB memory, so probably closer to an `r6g.large` than an `r6g.xlarge`.
> It seems this has been discussed before: https://github.com/linz/bde-processor-deployment/issues/243
That was for the BDE Processor itself though, which has much higher workloads and I'm sure uses more memory, so I don't think it's directly relevant to this.
So that complex queries complete in a reasonable amount of time, we want to test a complex query provided by Chris Crook.
Tasks
His query typically takes ~4 minutes on prdassgeo02 (a non-FDW database with BDE data in it). It would be useful to investigate whether we can tune the FDW database to run the same query in under 10 minutes; 20-30 minutes may still be acceptable.
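Two things may be worth checking during that tuning pass (a sketch only; `bde_server` is a placeholder for the actual foreign server name). `EXPLAIN (ANALYZE, VERBOSE)` prints the `Remote SQL` postgres_fdw actually ships, which shows whether filters are pushed down to the remote or whole tables are being fetched, and the `fetch_size` option controls how many rows are pulled per round trip (default 100):

```sql
-- Inspect the "Remote SQL" postgres_fdw sends to the remote server;
-- predicates that are not pushed down force large transfers before
-- filtering happens locally.
EXPLAIN (ANALYZE, VERBOSE)
SELECT ...;  -- substitute the test query here

-- A larger fetch_size reduces round trips when most rows have to be
-- transferred anyway (default 100; the 10000 here is illustrative only).
ALTER SERVER bde_server OPTIONS (ADD fetch_size '10000');
```

Since geospatial functions are generally not pushed down by postgres_fdw, the `Remote SQL` output should make it clear how much data is crossing the wire for this query.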
- Test on Wednesday March 1st
- Tested again by @ccrook on Friday 3rd