koopjs / koop

Transform, query, and download geospatial data on the web.
http://koopjs.github.io

recommended provider for large polygon dataset #464

Closed gis-pl closed 1 year ago

gis-pl commented 1 year ago

Hi, I have a vector layer with 300k polygons (WGS 84) and tried to connect it to the ArcGIS Online Viewer as a feature service using Koop. I tried the GeoJSON and PostGIS providers alternately, but the response from Koop is either nothing (with a short TTL) or very slow (with TTL=1000). Which provider do you recommend for a dataset like this? The GeoJSON file with these polygons is about 250 MB. Thanks!

rgwozdz commented 1 year ago

Hello @gis-pl. When you say the "response from Koop is none", do you mean you are getting a timeout error on the client? Can you copy/paste the failing service request here so we can take a look?

Koop providers load their data into memory before transforming it into the output format. Some of the less sophisticated providers load the entire dataset into memory even if the request only asks for a portion of it. I don't know the details of the PostGIS provider or the request your client is making, but picture your 250 MB file going into memory and then being looped over to convert it to Esri JSON; I imagine it would be on the slow side.

If your request only needs a portion of the 300k polygons (for example, the contents of a small bounding box), you may be able to use a provider that fetches only the data needed to fulfill the request, as opposed to loading the entire dataset into memory and having Koop filter it. The GeoJSON providers don't do this; they are just file stores and too primitive. The PostGIS provider should be able to, though I haven't looked at that one. But if your request is for the entire dataset, you'll run into the same latency and timeout problems. Even large, complex ArcGIS Online polygon datasets can have long delivery times when the format is JSON; it's a huge amount of coordinate data to serialize and send.

Two other things you can try: simplify and/or reduce the coordinate precision of your polygon geometries, which might help a bit, and beef up the server running Koop.
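To make the filter push-down concrete, here is a minimal sketch of a Koop provider model that translates an envelope filter into a PostGIS query, so only the requested polygons ever leave the database. The table and column names (`my_polygons`, `geom`, `id`) are placeholders and the envelope parsing is simplified; treat it as a starting point, not a drop-in provider:

```js
// Hypothetical Koop provider model: push the client's bounding-box
// filter down to PostGIS instead of loading the whole table.
// Assumes the `pg` package and a table my_polygons(id, geom) in
// EPSG:4326 -- both are placeholders.
const { Pool } = require('pg');

const pool = new Pool(); // connection settings come from PG* env vars

// Feature-service clients may send the envelope as a JSON object or
// as "xmin,ymin,xmax,ymax"; handle both, crudely.
function parseEnvelope (geometry) {
  if (!geometry) return null;
  if (typeof geometry === 'string' && !geometry.trim().startsWith('{')) {
    const [xmin, ymin, xmax, ymax] = geometry.split(',').map(Number);
    return { xmin, ymin, xmax, ymax };
  }
  const g = typeof geometry === 'string' ? JSON.parse(geometry) : geometry;
  return g.xmin !== undefined ? g : null;
}

class Model {
  async getData (req, callback) {
    try {
      const env = parseEnvelope(req.query.geometry);
      const where = env
        ? 'geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)'
        : 'TRUE';
      const params = env ? [env.xmin, env.ymin, env.xmax, env.ymax] : [];

      // Only the rows inside the envelope are serialized to GeoJSON.
      const { rows } = await pool.query(
        `SELECT id, ST_AsGeoJSON(geom)::json AS geometry
         FROM my_polygons WHERE ${where}`,
        params
      );

      callback(null, {
        type: 'FeatureCollection',
        features: rows.map(r => ({
          type: 'Feature',
          geometry: r.geometry,
          properties: { id: r.id }
        }))
      });
    } catch (err) {
      callback(err);
    }
  }
}

module.exports = Model;
```

Koop will still transform the GeoJSON to Esri JSON in memory, but the working set is then bounded by the envelope rather than by the whole table.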

Generally speaking, delivering and visually rendering this much vector data, especially in a single request, can be problematic, even if you are not using an in-memory transformer like Koop. For a use case like this, you probably want to use vector tiles.
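If you go that route, PostGIS can generate Mapbox Vector Tiles directly (`ST_AsMVT` / `ST_TileEnvelope`, PostGIS 3.0+). A rough sketch of a standalone tile endpoint, again with placeholder table/column names and outside the Koop API:

```js
// Rough sketch: serve vector tiles straight from PostGIS.
// Assumes PostGIS >= 3.0 and a table my_polygons(id, geom) in
// EPSG:4326 -- placeholders.
const express = require('express');
const { Pool } = require('pg');

const pool = new Pool();
const app = express();

app.get('/tiles/:z/:x/:y.pbf', async (req, res) => {
  const { z, x, y } = req.params;
  const sql = `
    WITH bounds AS (
      SELECT ST_TileEnvelope($1, $2, $3) AS geom
    ), mvtgeom AS (
      SELECT ST_AsMVTGeom(ST_Transform(p.geom, 3857), bounds.geom) AS geom,
             p.id
      FROM my_polygons p, bounds
      WHERE ST_Transform(p.geom, 3857) && bounds.geom
    )
    SELECT ST_AsMVT(mvtgeom.*, 'polygons') AS tile FROM mvtgeom`;
  const { rows } = await pool.query(sql, [z, x, y]);
  res.set('Content-Type', 'application/x-protobuf');
  res.send(rows[0].tile);
});

app.listen(8080);
```

Each request then only pays for the geometry inside one tile, and a functional index on `ST_Transform(geom, 3857)` lets the database answer the bounding-box test from the index.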

gis-pl commented 1 year ago

Hi, what I mean is that when I try to connect the data to the ArcGIS Online Viewer as a feature service (for example with TTL=100), no features appear in the app and the Node.js server stops.

I only need to display a portion of the data, not the whole dataset, so I tested PostGIS, but it looks like the GeoJSON is still built on the database side, which takes a long time and stalls it. Maybe a better solution is to write my own provider that builds the GeoJSON after the database has returned the answer for a smaller dataset?

rgwozdz commented 1 year ago

@gis-pl - from the sounds of it, the PostGIS provider is possibly not selecting just the portion of the dataset you need; instead, it might be fetching the entire dataset and having Koop do the filtering in memory. As noted above, this isn't performant for large datasets. We don't maintain the PostGIS provider here. Can you link me to the repository for the version of the PostGIS provider you are using? I can take a look and confirm my suspicion.

gis-pl commented 1 year ago

@rgwozdz - thanks for the reply, but I've abandoned this way of loading data. I used https://github.com/doneill/koop-provider-pg with this query: https://github.com/doneill/koop-provider-pg/blob/main/src/db/sql/createGeoJson.sql, but it caused the database to freeze and it had to be restarted.

rgwozdz commented 1 year ago

@gis-pl - thank you for the information. I looked at the PG provider. The query there is unlikely to work well with complex or large data: it selects all records from the source table and also performs a series of json_build_object operations per row. As you can imagine, for a large dataset with complex geometry this will likely create bottlenecks in the database as well as downstream.
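For comparison, here's a hedged sketch of what a more constrained version of that query could look like: the spatial filter and pagination run in SQL, so json_build_object only touches the rows a single request actually needs. Table/column names are placeholders, and this is not a patch for koop-provider-pg:

```js
// Sketch only: constrain the query instead of selecting every row.
// The bbox filter and pagination happen in the database, so the
// per-row json_build_object cost is bounded by the page size.
const sql = `
  SELECT json_build_object(
           'type', 'Feature',
           'geometry', ST_AsGeoJSON(geom)::json,
           'properties', json_build_object('id', id)
         ) AS feature
  FROM my_polygons
  WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)  -- bbox filter
  ORDER BY id
  LIMIT $5 OFFSET $6`;                                  -- pagination

// e.g. pool.query(sql, [xmin, ymin, xmax, ymax, pageSize, offset])
```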