koopjs / koop-pgcache

PostGIS cache for Koop.
Other
4 stars 4 forks source link

Full table scans while using spatial index? #34

Open PeaceNlove opened 8 years ago

PeaceNlove commented 8 years ago

I've a table created with Koop which contains around 65000 items. My performance was very low, even when I zoomed in to an area with a low amount of features.

I had a close look at the pgcache provider and after a second check on my database I found out the geom column is not used at all? Can you explain why you chose to do spatial part in the feature column as well, because I cannot understand why at this moment.

I analyzed one of the queries with the postgres explain command and got the following result: EXPLAIN select id, feature->>'properties' as props, feature->>'geometry' as geom from "WFS:rce:rce:NationalListedMonuments:0" WHERE ST_GeomFromGeoJSON(feature->>'geometry') && ST_SetSRID('BOX3D(5.4217529140341885 53.17641168037742,5.427246078096215 53.1797038995227)'::box3d,4326) ORDER BY id;

"Sort (cost=11127.26..11159.04 rows=12711 width=36)" " Sort Key: id" " -> Seq Scan on "WFS:rce:rce:NationalListedMonuments:0" (cost=0.00..10260.77 rows=12711 width=36)" " Filter: (st_geomfromgeojson((feature ->> 'geometry'::text)) && '0103000020E610000001000000050000000214F3FEDFAF15403DED6EA894964A400214F3FEDFAF1540F3B4918900974A40ED11F3FE7FB51540F3B4918900974A40ED11F3FE7FB515403DED6EA894964A400214F3FEDFAF15403DED6E (...)"

Then I refactored the whole pgcache provider in order to use the geometry column for my geometries and I got this result: EXPLAIN select id, feature->>'properties' as props, feature->>'geometry' as geom from "WFS:rce:rce:NationalListedMonuments:0" WHERE geom && ST_SetSRID('BOX3D(5.4217529140341885 53.17641168037742,5.427246078096215 53.1797038995227)'::box3d,4326) ORDER BY id; "Sort (cost=59.95..59.98 rows=14 width=36)" " Sort Key: id" " -> Bitmap Heap Scan on "WFS:rce:rce:NationalListedMonuments:0" (cost=4.39..59.68 rows=14 width=36)" " Recheck Cond: (geom && '0103000020E610000001000000050000000214F3FEDFAF15403DED6EA894964A400214F3FEDFAF1540F3B4918900974A40ED11F3FE7FB51540F3B4918900974A40ED11F3FE7FB515403DED6EA894964A400214F3FEDFAF15403DED6EA894964A40'::geometry)" " -> Bitmap Index Scan on wfsrcercenationallistedmonuments0_gix (cost=0.00..4.39 rows=14 width=0)" " Index Cond: (geom && '0103000020E610000001000000050000000214F3FEDFAF15403DED6EA894964A400214F3FEDFAF1540F3B4918900974A40ED11F3FE7FB51540F3B4918900974A40ED11F3FE7FB515403DED6EA894964A400214F3FEDFAF15403DED6EA894964A40'::geometry)"

I'm still using the geometry from the feature field, but doing all the smart spatial stuff on the geometry column

The calculations were done on different systems, but on the same data, but I think the differences are clear.

The first result was on a Windows Server 2012R2 with 8GB RAM and 4 CPU cores the second result was on my development laptop running Windows 10 with 8GB Ram and 2 CPU cores with hyperthreading (4 total).

I can make a pull request from my pgcache using the geometry column, but this will be a huge change for all existing systems and databases. Therefore I think it's better to have a discussion about this first before I make this PR.

chelm commented 8 years ago

@PeaceNlove Thanks for bringing this up. I think the original reason why the feature.geometry was used was because we sometimes have issues creating correct geometry types for "multi-geoms". So we could not guarantee that the geom column is populated (at least this is how it was, we should revisit this).

Using the geometry column will indeed be much faster and should be preferrer, but we need to be sure that it get populated correctly in for every possible geometry type.

(Also note that for Esri FeatureServices there is no concept of multigeom types, just arrays of geoms. this is where the conversion to postgis got messy...)

dmfenton commented 8 years ago

@PeaceNlove can you put your refactor in a branch so we can look at it?

PeaceNlove commented 8 years ago

https://github.com/PeaceNlove/koop-pgcache/tree/koop-pgcache-geometry

It's still work in progress and I can have a look at that it gets populated correctly in for every possible geometry type. I probably will have time to do this tomorrow.

PeaceNlove commented 8 years ago

I made a small adjustment in the branch and dropped the geometry type constraint (and the spatial reference constraint as well) This makes it possible to put Polygons, MultiPolygons, Lines etc in one table. I did some small tests with my own WFS provider and with the AGOL provider and both work fine and the spatial index is still used

dmfenton commented 8 years ago

@PeaceNlove did you do this just so you could store geojson directly? I'm not sure I understand the reasoning here. It's going to make it a lot more difficult to generate featureservices if we have multiple geometry types in the db.

chelm commented 8 years ago

@dmfenton I believe that the point of this has nothing to do with geojson (except making the selection queries faster by not using it) and more about storing both the geojson feature geoms (no change from how we do it now) as well as storing the same geometry but within an appropriate geometry column.

In the actual selection of data from the DB the query would now use the geometry column and avoid having to parse out the geojson based geom. Make sense? Was that your question?

PeaceNlove commented 8 years ago

The reason to do this is because when I tried to store features from AGOL, I got both Polygon and MultiPolygon GeoJSON features. The only way to store this is to drop the constraint. It is by the way perfectly possible to still use the geometry data from the featurecolumn as it was before and revert lines 350 and 352 in pg-cache. This way, you only use the geom column for fast spatial queries and nothing else.

But I think you should not be afraid of multiple geometry type in the db. You can put multiple geometry types in the database with the current cache as well, just load some messy geojson from github and you're done :-)

PeaceNlove commented 8 years ago

Well, I just went live with this experimental pg-cache provider.... http://geo-koop.rug.nl/wfs/rce/rce:NationalListedMonuments/FeatureServer/0 <-has 64000 features and performs almost like the real thing (Arcgis for Server)

Please do not ruin my Koop, I still need to block some things like register, drop etc, but the conference is next week so i need something and security is therefore not high on my list.

dmfenton commented 8 years ago

Awesome! I promise to dig into this more soon. Got a few fires on my hands with koop-agol right now.