Dirt Simple PostGIS HTTP API

Suggestions for improving performance #54

Closed gagecarto closed 2 years ago

gagecarto commented 2 years ago

I have an AWS Postgres instance, db.m6g.large (2 vCPU and 8 GiB memory), that I am working with. I am doing some tests with a pretty big dataset: it has global coverage and 400 million polylines. The lines have a unique ID which is indexed, and I can easily query on that and make my selections in just a second or so. The bottleneck comes when waiting for Postgres to build and send those tiles back.

I add a filter on the indexed unique ID to my tile calls, e.g. https://tiles.myserver.xyz/v1/mvt/mylinestable/5/28/16?&filter=uid = '25790'

This sometimes takes at least a minute or two. Some of these selections can contain 15,000 lines, so I am guessing that is part of the problem.

Any quick suggestions for improving performance? The tables have already been spatially indexed. I am thinking I may need to increase the instance class of my database on AWS, but I'm hoping there are some cheaper things to try first.

Thanks!

tobinbradley commented 2 years ago

I don't think your CPU/memory config is the bottleneck. Are you super sure you have a spatial index on the geometry column? Any chance the DB doesn't auto vacuum? Does running vacuum analyze <table> make a difference?
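Something like this covers both checks (assuming the table and geometry column are named mylinestable and geom, per your tile call; adjust as needed):

-- List indexes on the table; look for one created USING gist on geom
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'mylinestable';

-- Create the spatial index if it turns out to be missing
CREATE INDEX IF NOT EXISTS mylinestable_geom_idx
  ON mylinestable USING gist (geom);

-- Clean up dead tuples and refresh planner statistics
VACUUM ANALYZE mylinestable;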

When you say "the tables" have been spatially indexed, are you doing a query on a view with multiple tables with geometry columns? You could be hitting a bottleneck there, as there isn't a single geometry index to deal with. In that case you could make a materialized view so you could create a spatial index on the joined geometry column of the view.
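Sketching that out (myview here is a hypothetical view joining your tables):

-- Materialize the view so the joined geometry is stored physically
CREATE MATERIALIZED VIEW mylinestable_mat AS
SELECT uid, geom FROM myview;

-- Now a spatial index can be built on the materialized geometry
CREATE INDEX mylinestable_mat_geom_idx
  ON mylinestable_mat USING gist (geom);

-- Re-run whenever the underlying tables change
REFRESH MATERIALIZED VIEW mylinestable_mat;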

I can get a tile with 54k lines from my roads layer in 1.375 seconds with a similarly configured VM. The number of lines isn't the whole story - the complexity/vertices per line makes a big difference - but the performance you're seeing is weird.

I would imagine you can EXPLAIN ANALYZE a query from the tooling AWS gives you if you can't run PGAdmin, so I'd try that to pinpoint where the bottleneck is. The SQL query being run from Dirt based on your tile call should look like this:

WITH mvtgeom AS (
  SELECT
    ST_AsMVTGeom(
      ST_Transform(geom, 3857),
      ST_TileEnvelope(5, 28, 16)
    ) AS geom
  FROM
    mylinestable,
    (SELECT ST_SRID(geom) AS srid FROM mylinestable LIMIT 1) a
  WHERE
    ST_Intersects(
      geom,
      ST_Transform(
        ST_TileEnvelope(5, 28, 16),
        srid
      )
    )
    AND uid = '25790'
)
SELECT ST_AsMVT(mvtgeom.*, 'mylinestable', 4096, 'geom') AS mvt
FROM mvtgeom;
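For a quick index check you can also EXPLAIN a stripped-down version of that WHERE clause; if the plan shows a Seq Scan on mylinestable instead of index scans, an index is missing or not being used. The 4326 below assumes your layer's SRID, so substitute the real one:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM mylinestable
WHERE ST_Intersects(
        geom,
        ST_Transform(ST_TileEnvelope(5, 28, 16), 4326)
      )
  AND uid = '25790';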

Neither here nor there, but 15k is a lot of lines to fetch for a single 256x256 tile. Unless there's a business reason not to, I'd limit the minzoom on the client to something more reasonable for the geometry. It shouldn't be eating 60+ seconds, but I'd consider taking more than a second to generate a vector tile a bridge too far, and I'd be looking at pre-building tiles or putting a Varnish cache between the client and Dirt if I couldn't get that time down to 500 ms or less.

gagecarto commented 2 years ago

This is all very helpful! I'll do some more digging and vacuuming and see if that speeds things up.

gagecarto commented 2 years ago

@tobinbradley - also on this note: as I am noodling around with performance improvements, what effect does projection have on the speed of tile generation?

I currently have my table and geom column in decimal degrees (4326). These tiles are being added to a Mapbox GL JS map. Is 4326 with decimal degrees the ideal geometry for this use case?

tobinbradley commented 2 years ago

The ideal scenario would be 3857, as that would entail no ST_Transform operations, but the performance impact is fairly negligible. Most of the data I deal with is in US State Plane projections and I haven't run into any bottlenecks there.

The mvt service takes the tile bounding box, projects it to the layer's native SRID, gets the subset of layer geometry via intersection, then projects just that portion of the layer geometry to 3857 for tile rendering. That way the whole layer doesn't have to be transformed to 3857 to see what falls in the tile's bounding box.

It might be worth storing your data in 3857 and giving it a shot. Even if it doesn't give a noticeable performance bump you'll get distance units in meters instead of degrees :).
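If you try it, the in-place conversion is a one-liner (assuming single-part lines; use MultiLineString if your data is multi-part, and back up first since this rewrites the column):

-- Reproject the geometry column from 4326 to 3857 in place
ALTER TABLE mylinestable
  ALTER COLUMN geom TYPE geometry(LineString, 3857)
  USING ST_Transform(geom, 3857);

-- Refresh planner statistics after the rewrite
ANALYZE mylinestable;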