t-rex-tileserver / t-rex

t-rex is a vector tile server specialized on publishing MVT tiles from your own data
https://t-rex.tileserver.ch/
MIT License
556 stars 69 forks source link

What format of the geometry is faster for T-Rex? #272

Open dmitrykinakh opened 2 years ago

dmitrykinakh commented 2 years ago

Hi there,

in the project we are building at the moment the biggest problem is tiles generation speed.

So, these are input parameters:

T-Rex version 0.14.1

Config in toml.

# t-rex configuration

[service.mvt]
viewer = true

[[datasource]]
dbconn = "postgresql://{{USER}}:{{PASS}}@{{DBHOST}}.c4epkxdwfidz.us-east-1.rds.amazonaws.com/{{DBNAME}}"
name = "shared"

[grid]
# Predefined grids: web_mercator, wgs84
predefined = "web_mercator"

# ************** Global Settings **************

[[tileset]]
name = "municipal_zoning_code"
extent = [-198.808594,-16.499992,-61.699219,70.958956]
[[tileset.layer]]
datasource = "shared"
name = "municipal_zoning_code"
table_name = "land_use_city"
geometry_field = "geom"
geometry_type = "MULTIPOLYGON"
srid = 3857
#fid_field = "id"
#tile_size = "4096"
buffer_size = 3
simplify = false
minzoom = 10
maxzoom = 18
#query_limit = 1000
[[tileset.layer.query]]
sql = """SELECT Sieve(geom, ZRes(!zoom!::int - 1) * 3) as geom,id,title,land_use_type,place,jurisdiction,uniqueid,color FROM land_use_city WHERE geom && !bbox! and land_use_type='Municipal Zoning Code'"""

[cache.s3]
endpoint = "https://s3.us-east-1.amazonaws.com"
bucket = "{{BUCKET}}"
access_key = "{{AKEY}}"
secret_key = "{{SKEY}}"
region = "us-east-1"
baseurl = "https://st1-tiles.{{DOMAIN}}.com"

[webserver]
# Bind address. Use 0.0.0.0 to listen on all adresses.
bind = "0.0.0.0"
port = 6767
threads = 4
cache_control_max_age = 2592000

Number of records in DB - 13231 Number of records that match query in selected bbox - 60

Generation logs for:

time docker-compose run --name TL_01 --rm trex-master sh -c 't_rex generate --tileset municipal_zoning_code --overwrite true --minzoom 10 --maxzoom 18 --extent -81.658619,28.346742,-80.863155,28.786278 --config /storage/trex/config.toml;'

Timing

Z10 - 0m 20.023s
Z11 - 0m 14.679s
Z12 - 0m 31.691s
Z13 - 1m 55.904s
Z14 - 6m 17.114s
Z15 - 19m 56.263s
Z16 - 73m 25.188s
Z17 - 284m 10.441s
Z18 - 1130m 6.336s

Generation was performed on ec2 r5.4xlarge (16 vCPU 128 GB RAM) and with DB db.r6g.4xlarge (16 vCPU 128 GB RAM)

With bigger zoom levels, like 17 or 18 this takes hours. We are wondering if the issue is in data that we store as multipolygons. Those multipolygons contain from 1 to 2-3K individual polygons. image

From the performance perspective, would it be better to have these data represented in DB as polygons, so instead of 60 records, in our case, we will have several thousand? Looking forward to recommendations.

Note: this is a part of the query that takes the most of the time:

SELECT ST_Multi(ST_Buffer(ST_Intersection(geom,ST_MakeEnvelope($1-?*$6::FLOAT8,$2-?*$6::FLOAT8,$3+?*$6::FLOAT8,$4+?*$6::FLOAT8,?)), ?)) AS geom,"id","title","land_use_type","place","jurisdiction","uniqueid","color" FROM (SELECT Sieve(geom, ZRes($5::int - ?) * ?) as geom,id,title,land_use_type,place,jurisdiction,uniqueid,color FROM land_use_city WHERE geom && ST_MakeEnvelope($1-?*$6::FLOAT8,$2-?*$6::FLOAT8,$3+?*$6::FLOAT8,$4+?*$6::FLOAT8,?) and land_use_type=?) AS _q

image

pka commented 2 years ago

This is really extremely slow. Is there an index on the geometry field (CREATE INDEX ON land_use_city USING GIST(geom);) or could the Sieve function on the geometry be slow? I would recommend using EXPLAIN to find out which part of the query is slow.

dmitrykinakh commented 2 years ago

Is there an index on the geometry field (CREATE INDEX ON land_use_city USING GIST(geom);)

There is an index for the geometry column. I've performed tests with it and without it for zoom levels 12 and 13 using the smaller machine. Generation time was the same. The blue color and first execution is zoom level 12 and no index. The red color and first execution is zoom level 13 and no index. The second execution for both zoom levels was with the index field added. image

or could the Sieve function on the geometry be slow?

I don't think that this is the case since the same performance was before we started using that function.

I would recommend using EXPLAIN to find out which part of the query is slow

I'll try to get this information for you and post it here.

dmitrykinakh commented 2 years ago

Here is a log from DB for all of the queries that are longer than 4 seconds postgresql.log.2021-11-25-11(more than 4 seconds).txt .

pka commented 2 years ago

Geometry index usage looks fine. How's the timing without using the Sieve function?

dmitrykinakh commented 2 years ago

This is a log without using the Sieve function postgresql.log.2021-11-26-13.txt

Also, this is a generation timing using ec2 r5.4xlarge (16 vCPU 128 GB RAM) and with DB db.r6g.4xlarge (16 vCPU 128 GB RAM) without the Sieve function.

Zoom With Sieve Without Sieve
Z10 0m 20.023s 0m 22.549s
Z11 0m 14.679s 0m 8.360s
Z12 0m 31.691s 0m 15.165s
Z13 1m 55.904s 0m 20.980s
Z14 6m 17.114s 0m 37.817s
Z15 19m 56.263s 2m 13.246s
Z16 73m 25.188s 5m 19.229s
Z17 284m 10.441s 11m 45.406s
Z18 1130m 6.336s 34m 49.984s

Looks like I know the answer to why generation is so slow. However, still, wondering what it's better - polygon or multipolygon?