What format of the geometry is faster for T-Rex?

dmitrykinakh commented 2 years ago

Hi there,

in the project we are building at the moment the biggest problem is tiles generation speed.

So, these are input parameters:

T-Rex version 0.14.1

Config in toml.

# t-rex configuration

[service.mvt]
viewer = true

[[datasource]]
dbconn = "postgresql://{{USER}}:{{PASS}}@{{DBHOST}}.c4epkxdwfidz.us-east-1.rds.amazonaws.com/{{DBNAME}}"
name = "shared"

[grid]
# Predefined grids: web_mercator, wgs84
predefined = "web_mercator"

# ************** Global Settings **************

[[tileset]]
name = "municipal_zoning_code"
extent = [-198.808594,-16.499992,-61.699219,70.958956]
[[tileset.layer]]
datasource = "shared"
name = "municipal_zoning_code"
table_name = "land_use_city"
geometry_field = "geom"
geometry_type = "MULTIPOLYGON"
srid = 3857
#fid_field = "id"
#tile_size = "4096"
buffer_size = 3
simplify = false
minzoom = 10
maxzoom = 18
#query_limit = 1000
[[tileset.layer.query]]
sql = """SELECT Sieve(geom, ZRes(!zoom!::int - 1) * 3) as geom,id,title,land_use_type,place,jurisdiction,uniqueid,color FROM land_use_city WHERE geom && !bbox! and land_use_type='Municipal Zoning Code'"""

[cache.s3]
endpoint = "https://s3.us-east-1.amazonaws.com"
bucket = "{{BUCKET}}"
access_key = "{{AKEY}}"
secret_key = "{{SKEY}}"
region = "us-east-1"
baseurl = "https://st1-tiles.{{DOMAIN}}.com"

[webserver]
# Bind address. Use 0.0.0.0 to listen on all adresses.
bind = "0.0.0.0"
port = 6767
threads = 4
cache_control_max_age = 2592000

Number of records in DB - 13231 Number of records that match query in selected bbox - 60

Generation logs for:

time docker-compose run --name TL_01 --rm trex-master sh -c 't_rex generate --tileset municipal_zoning_code --overwrite true --minzoom 10 --maxzoom 18 --extent -81.658619,28.346742,-80.863155,28.786278 --config /storage/trex/config.toml;'

Timing

Z10 - 0m 20.023s
Z11 - 0m 14.679s
Z12 - 0m 31.691s
Z13 - 1m 55.904s
Z14 - 6m 17.114s
Z15 - 19m 56.263s
Z16 - 73m 25.188s
Z17 - 284m 10.441s
Z18 - 1130m 6.336s

Generation was performed on ec2 r5.4xlarge (16 vCPU 128 GB RAM) and with DB db.r6g.4xlarge (16 vCPU 128 GB RAM)

With bigger zoom levels, like 17 or 18 this takes hours. We are wondering if the issue is in data that we store as multipolygons. Those multipolygons contain from 1 to 2-3K individual polygons.

From the performance perspective, would it be better to have these data represented in DB as polygons, so instead of 60 records, in our case, we will have several thousand? Looking forward to recommendations.

Note: this is a part of the query that takes the most of the time:

SELECT ST_Multi(ST_Buffer(ST_Intersection(geom,ST_MakeEnvelope($1-?*$6::FLOAT8,$2-?*$6::FLOAT8,$3+?*$6::FLOAT8,$4+?*$6::FLOAT8,?)), ?)) AS geom,"id","title","land_use_type","place","jurisdiction","uniqueid","color" FROM (SELECT Sieve(geom, ZRes($5::int - ?) * ?) as geom,id,title,land_use_type,place,jurisdiction,uniqueid,color FROM land_use_city WHERE geom && ST_MakeEnvelope($1-?*$6::FLOAT8,$2-?*$6::FLOAT8,$3+?*$6::FLOAT8,$4+?*$6::FLOAT8,?) and land_use_type=?) AS _q

pka commented 2 years ago

This is really extremely slow. Is there an index on the geometry field (CREATE INDEX ON land_use_city USING GIST(geom);) or could the Sieve function on the geometry be slow? I would recommend using EXPLAIN to find out which part of the query is slow.

dmitrykinakh commented 2 years ago

Is there an index on the geometry field (CREATE INDEX ON land_use_city USING GIST(geom);)

There is an index for the geometry column. I've performed tests with it and without it for zoom levels 12 and 13 using the smaller machine. Generation time was the same. The blue color and first execution is zoom level 12 and no index. The red color and first execution is zoom level 13 and no index. The second execution for both zoom levels was with the index field added.

or could the Sieve function on the geometry be slow?

I don't think that this is the case since the same performance was before we started using that function.

I would recommend using EXPLAIN to find out which part of the query is slow

I'll try to get this information for you and post it here.

dmitrykinakh commented 2 years ago

Here is a log from DB for all of the queries that are longer than 4 seconds postgresql.log.2021-11-25-11(more than 4 seconds).txt .

pka commented 2 years ago

Geometry index usage looks fine. How's the timing without using the Sieve function?

dmitrykinakh commented 2 years ago

This is a log without using the Sieve function postgresql.log.2021-11-26-13.txt

Also, this is a generation timing using ec2 r5.4xlarge (16 vCPU 128 GB RAM) and with DB db.r6g.4xlarge (16 vCPU 128 GB RAM) without the Sieve function.

Zoom	With Sieve	Without Sieve
Z10	0m 20.023s	0m 22.549s
Z11	0m 14.679s	0m 8.360s
Z12	0m 31.691s	0m 15.165s
Z13	1m 55.904s	0m 20.980s
Z14	6m 17.114s	0m 37.817s
Z15	19m 56.263s	2m 13.246s
Z16	73m 25.188s	5m 19.229s
Z17	284m 10.441s	11m 45.406s
Z18	1130m 6.336s	34m 49.984s

Looks like I know the answer to why generation is so slow. However, still, wondering what it's better - polygon or multipolygon?

t-rex-tileserver / t-rex

What format of the geometry is faster for T-Rex? #272