koopjs / koop-provider-elasticsearch

A provider for koop that can connect to one or more elastic search instances and turn indices/aliases into individual feature services.
Apache License 2.0

Leveraging ES built-in aggregation functionality #13

Closed keithfraley closed 3 weeks ago

keithfraley commented 6 years ago

Ok, I will post some of my thoughts here on this, starting with the config file. I think the config file would need the following changes in order to trigger results as an aggregation. Basically, the aggregation feature would return a hash polygon, leveraging the GeoJSON that is returned from ES.

Getting the spatial results back should be very straightforward. The hard part will be ensuring that all the attribute fields still come back, because they are needed for filtering and time series interactions.

{
  "appInfo": {
    "protocol": "http",
    "listenPort": 80
  },
  "esConnections": {
    "firstESCluster": {
      "id": "clusterID",
      "protocol": "http://",
      "port": 9200,
      "hosts": [
        "escluster.mynetwork.com"
      ],
      "indices": [
        {
          "index": "indexOrAliasName",
          "maxResults": 6000,
          "geometryField": "geometry",
          "geometryType": "geo_point/Point/Polyline/Polygon",
          "returnFields": [
            "fieldFromIndex", "SeenInFeatureService", "SomeDateField"
          ],
          "dateFields": [
            "pickup_date", "SomeDateField"
          ],
          "aggregation": {
            "type": "count",        // sum/average/max/min/count
            "output": "geo_point",  // hash centroid geo_point, hash polygon, turf hexagon
            "field": "GEOMETRY",    // GEOMETRY for count, field for all other types
            "resolution": 9         // this might not be needed because it can be calculated on the fly
          },
          "timeInfo": {
            "startTimeField": "start_date",
            "endTimeField": "end_date",
            "timeExtent": [1438401615000, 1439833466000],
            "timeInterval": 1,
            "timeIntervalUnits": "esriTimeUnitsDays"
          }
        }
      ]
    }
  }
}
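
As a rough illustration of how the proposed "aggregation" block could be checked when the config is loaded, here is a minimal sketch. The helper name `normalizeAggregationConfig` is hypothetical and not part of the provider; the allowed types come from the comment in the config above, and the 1 to 12 range for "resolution" assumes it maps directly to an Elasticsearch geohash precision.

```javascript
// Hypothetical helper, not part of koop-provider-elasticsearch.
// Aggregation types as listed in the proposed config comment.
const AGG_TYPES = ['count', 'sum', 'average', 'max', 'min'];

function normalizeAggregationConfig(indexConfig) {
  const agg = indexConfig.aggregation;
  if (agg === undefined) {
    return null; // index is served without aggregation
  }
  if (!AGG_TYPES.includes(agg.type)) {
    throw new Error('Unsupported aggregation type: ' + agg.type);
  }
  // "GEOMETRY for count, field for all other types"
  if (agg.type !== 'count' && (typeof agg.field !== 'string' || agg.field === 'GEOMETRY')) {
    throw new Error('Aggregation type "' + agg.type + '" needs an attribute field');
  }
  // Assumption: resolution is a geohash precision, which ES limits to 1..12.
  if (agg.resolution !== undefined &&
      (!Number.isInteger(agg.resolution) || agg.resolution < 1 || agg.resolution > 12)) {
    throw new Error('resolution must be an integer between 1 and 12');
  }
  return {
    type: agg.type,
    output: agg.output || 'geo_point',
    field: agg.type === 'count' ? 'GEOMETRY' : agg.field,
    resolution: agg.resolution === undefined ? null : agg.resolution
  };
}
```

Returning `null` when no "aggregation" block is present keeps existing index configs working unchanged.
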
dhatcher commented 3 years ago

@keithfraley Check out the branch https://github.com/koopjs/koop-provider-elasticsearch/tree/koop-4.x-update to see upcoming changes that take advantage of built-in ES geohash aggregation.

keithfraley commented 3 years ago

Exciting

keithfraley commented 3 years ago

I noticed that the geohash aggregation seems to have some issues when aggregating large areas. It would appear that the precision is decided by the width of the bounding box? If so, the precision factor is skewed as we get closer to the equator (see screenshots).

[Screenshot 2020-11-09 at 11 58 28 AM]

I also noticed that we do on occasion get overlapping aggregations. I think this can be handled, potentially by reducing the number of records brought back.

[Screenshot 2020-11-09 at 11 58 58 AM]

keithfraley commented 3 years ago

In addition, a couple of enhancements as we move forward. The aggregation component of ES also provides a centroid of type geo_point. The really cool part about this is that it is actually the centroid of the records within the geohash, not of the geohash itself. It gives a more accurate location than the polygons, and it's really useful when building historical movements.

The other enhancement is the ability to do more than just counts for aggregations. For example, if we want to sum a field for each geohash, that is a pretty easy enhancement: add that param to the config.
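
A minimal sketch of the centroid idea, assuming the geometry field is mapped as `geo_point`: it nests Elasticsearch's `geo_centroid` aggregation under a `geohash_grid` aggregation and turns the resulting buckets into GeoJSON point features. The function names and the aggregation names `grid` and `centroid` are made up for illustration and are not what the provider uses.

```javascript
// Hypothetical helpers, not part of koop-provider-elasticsearch.

// Build a search body with one geohash bucket per cell and, inside each
// bucket, the centroid of the documents that fell into that cell.
function buildCentroidAggregation(geometryField, precision) {
  return {
    size: 0, // only the aggregation buckets are needed, not the hits
    aggs: {
      grid: {
        geohash_grid: { field: geometryField, precision: precision },
        aggs: {
          centroid: { geo_centroid: { field: geometryField } }
        }
      }
    }
  };
}

// Convert the buckets of the "grid" aggregation into GeoJSON point features.
function bucketsToPointFeatures(buckets) {
  return buckets.map(function (bucket) {
    const location = bucket.centroid.location; // { lat, lon } from geo_centroid
    return {
      type: 'Feature',
      geometry: { type: 'Point', coordinates: [location.lon, location.lat] },
      properties: { geohash: bucket.key, count: bucket.doc_count }
    };
  });
}
```

With a search response `res`, the buckets would be read from `res.aggregations.grid.buckets` (or `res.body.aggregations.grid.buckets`, depending on the client version).
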

something like

if (indexConfig.aggregation.type !== 'count') {
    esQuery.body.aggs["2"].aggs["1"] = {
        [indexConfig.aggregation.type]: {
            "field": indexConfig.aggregation.field
        }
    };
}
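
One detail to watch with the snippet above: the config lists "average" as a type, but the Elasticsearch metric aggregation is named `avg`, so the config value cannot always be used directly as the aggregation key. A possible variant, with a hypothetical helper name and the sub-aggregation key "1" kept from the snippet:

```javascript
// Hypothetical helper, not part of koop-provider-elasticsearch.
// Maps the config's aggregation types to Elasticsearch metric aggregation names.
const ES_METRIC_BY_TYPE = new Map([
  ['sum', 'sum'],
  ['average', 'avg'],
  ['max', 'max'],
  ['min', 'min']
]);

// Returns a copy of the grid aggregation with a metric sub-aggregation added.
function addMetricAggregation(gridAgg, aggregationConfig) {
  if (aggregationConfig.type === 'count') {
    return gridAgg; // every bucket already carries doc_count
  }
  const metric = ES_METRIC_BY_TYPE.get(aggregationConfig.type);
  if (metric === undefined) {
    throw new Error('Unsupported aggregation type: ' + aggregationConfig.type);
  }
  const subAggs = Object.assign({}, gridAgg.aggs, {
    '1': { [metric]: { field: aggregationConfig.field } }
  });
  return Object.assign({}, gridAgg, { aggs: subAggs });
}
```

Each bucket in the response would then expose the metric result under the "1" key as a `value` property.
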
dhatcher commented 3 years ago

That's right, the aggregation scale is determined from the height of the bounding box, and since we treat every query separately, issues like the ones above can show up sometimes. If you want to post those enhancements as new issues, we can mark them as enhancements and track them.
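
To make the height-based scaling concrete, here is a small sketch of how a geohash precision could be picked from the bounding box height. This is not the provider's actual formula: the function names and the `maxRows` budget are assumptions. The cell sizes follow from the geohash encoding itself, where each character adds 5 bits split between longitude and latitude.

```javascript
// Hypothetical helpers, not the provider's implementation.

// Size of one geohash cell in degrees at a given precision (1..12).
function geohashCellSize(precision) {
  const bits = 5 * precision;
  const lonBits = Math.ceil(bits / 2);  // longitude gets the extra bit
  const latBits = Math.floor(bits / 2);
  return {
    width: 360 / Math.pow(2, lonBits),
    height: 180 / Math.pow(2, latBits)
  };
}

// Pick the finest precision whose cells fit at most maxRows times
// into the height of the bounding box.
function precisionForBBoxHeight(bbox, maxRows) {
  const height = bbox.ymax - bbox.ymin;
  let chosen = 1;
  for (let p = 1; p <= 12; p++) {
    if (height / geohashCellSize(p).height <= maxRows) {
      chosen = p;
    } else {
      break;
    }
  }
  return chosen;
}
```

Under a rule like this, two requests covering bounding boxes of different heights can end up with different precisions, which would be one way for cells from separately handled queries to fail to line up.
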