opensearch-project / geospatial

Future home of Geospatial features for OpenSearch
Apache License 2.0

[RFC] - Support New XY Cartesian Shape Field #71

Closed VijayanB closed 2 years ago

VijayanB commented 2 years ago

The purpose of this RFC (request for comments) is to gather community feedback on a proposal that lets OpenSearch users index and search documents that contain X, Y Cartesian planar shapes.

XYShape Field Type

The xy_shape field type supports indexing and searching geometries whose vertices are unitless x, y values, such as rectangles, line strings, and polygons. It is intended for users who want to index and query geometries whose coordinates come from a 2D planar system. It is similar to the geo_shape field, except that it represents a Cartesian plane that is not based on an earth-fixed terrestrial reference system. The new field will be based on the Lucene field type XYShape.

Mapping Options

To index documents that contain xy_shape fields, users must explicitly map those fields to the xy_shape type, as with the geo_shape field.

Example

The below mapping definition maps the geometry field to the xy_shape type using defaults.

PUT /xy_shape_index_example
{
  "mappings": {
    "properties": {
      "geometry": {
        "type": "xy_shape"
      }
    }
  }
}

The xy_shape field accepts the following mapping options.

| Option | Description | Default | Other values |
| --- | --- | --- | --- |
| orientation | Defines how to interpret vertex order for polygons and multipolygons. | ccw / right / counterclockwise | cw / left / clockwise |
| ignore_malformed | If true, invalid geometries are ignored. If false, invalid geometries are rejected and an exception is thrown. | false | true |
| ignore_z_value | If true, the Z dimension of a point is ignored and only the X, Y values are indexed. If false, any point that contains a Z value is rejected and an exception is thrown. | true | false |
| coerce | If true, unclosed shapes are closed automatically. If false, shapes are expected to have a closed ring. | false | true |
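The coerce and ignore_z_value options can be pictured with a small Python sketch. This only illustrates the semantics described in the table. It is not the OpenSearch or Lucene implementation, and the function name normalize_ring is hypothetical.

```python
def normalize_ring(ring, coerce=False, ignore_z_value=True):
    """Apply coerce / ignore_z_value semantics to one polygon ring.

    Hypothetical helper for illustration only; not part of OpenSearch.
    """
    if not ring:
        raise ValueError("ring has no points")
    cleaned = []
    for point in ring:
        if len(point) == 3:
            if not ignore_z_value:
                # ignore_z_value=false: a point with a Z value is rejected
                raise ValueError(f"point {point} contains a Z value")
            point = point[:2]  # ignore_z_value=true: keep only X and Y
        elif len(point) != 2:
            raise ValueError(f"point {point} must have 2 or 3 values")
        cleaned.append(tuple(point))
    if cleaned[0] != cleaned[-1]:
        if not coerce:
            # coerce=false: the ring is expected to be closed already
            raise ValueError("ring is not closed")
        cleaned.append(cleaned[0])  # coerce=true: close the ring
    return cleaned
```

For example, normalize_ring([[0, 0, 5], [1, 0, 5], [1, 1, 5]], coerce=True) drops the Z values and appends the first point again to close the ring.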

Overriding a default value in the mapping

PUT /cw_xy_shape_index_example
{
  "mappings": {
    "properties": {
      "geometry": {
        "type": "xy_shape",
        "orientation" : "cw"
      }
    }
  }
}
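Whether a ring's vertices run clockwise or counterclockwise can be determined from the sign of its area under the shoelace formula. The sketch below shows that check in Python. The function names are hypothetical, and it does not claim to reproduce how OpenSearch applies the orientation option internally.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring (first point repeated at the end)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def winding(ring):
    """Return 'ccw' for counterclockwise vertex order, 'cw' for clockwise."""
    area = signed_area(ring)
    if area == 0:
        raise ValueError("degenerate ring: area is zero")
    return "ccw" if area > 0 else "cw"
```

A ring such as [(-10, -10), (10, -10), (10, 10), (-10, -10)] has a positive signed area, so its vertex order is counterclockwise. Reversing the list gives a clockwise ring.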

Indexing structure

Shapes can be represented in either GeoJSON or Well-Known Text (WKT) format. The following table maps GeoJSON and WKT types to OpenSearch types:

| GeoJSON Type | WKT Type | OpenSearch Type | Description |
| --- | --- | --- | --- |
| Point | POINT | point | A single (X, Y) point in the Cartesian plane. |
| LineString | LINESTRING | linestring | A list of points. |
| Polygon | POLYGON | polygon | A closed boundary ring, defined by a list of lists of points. |
| MultiPoint | MULTIPOINT | multipoint | A list of disconnected points that are closely related. |
| MultiLineString | MULTILINESTRING | multilinestring | A list of separate line strings. |
| MultiPolygon | MULTIPOLYGON | multipolygon | A list of separate polygons. |
| GeometryCollection | GEOMETRYCOLLECTION | geometrycollection | A collection of disconnected shapes of different types (for example linestring, point, and polygon) inside a single field. |
| N/A | BBOX | envelope | A bounding rectangle, or envelope, specified by only its top-left and bottom-right points in the format [[minX, maxY], [maxX, minY]]. |

Note: The precision of a shape depends on its number of vertices and points.
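The envelope layout puts the top-left corner first and the bottom-right corner second, which is easy to get backwards. The following Python sketch builds an envelope in that layout from arbitrary points and checks that an existing one follows it. Both helper names are hypothetical.

```python
def to_envelope(points):
    """Bounding envelope of (x, y) points as [[minX, maxY], [maxX, minY]]."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return [[min(xs), max(ys)], [max(xs), min(ys)]]


def follows_envelope_layout(envelope):
    """True when the first corner is top-left and the second bottom-right."""
    (min_x, max_y), (max_x, min_y) = envelope
    return min_x <= max_x and min_y <= max_y
```

For instance, to_envelope([(13.0, 52.0), (14.0, 53.0)]) gives [[13.0, 53.0], [14.0, 52.0]].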

The field value must contain inner type and coordinates fields that specify the shape type listed above and its coordinate values.

Example

Using point type

POST /xy_shape_index_example/_doc?pretty
{
  "geometry" : {
    "type" : "point",
    "coordinates" : [-200.123, 110.432]
  }
}

Using polygon type

POST /xy_shape_index_example/_doc?pretty
{
  "geometry" : {
    "type" : "polygon",
    "coordinates" : [
      [
        [-10.0, -10.0],
        [10.0, -10.0],
        [10.0, 10.0],
        [-10.0, -10.0]
      ]
    ]
  }
}
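Since shapes can also be written in WKT, it may help to see how the point and polygon used in these examples would look in that notation. The Python sketch below converts the GeoJSON-style dictionaries to WKT strings. The to_wkt helper is hypothetical and covers only point, linestring, and polygon.

```python
def _coords(points):
    """Format a list of (x, y) pairs as 'x y, x y, ...'."""
    return ", ".join(f"{x} {y}" for x, y in points)


def to_wkt(shape):
    """Convert a GeoJSON-style shape dict to a WKT string (illustrative only)."""
    kind = shape["type"].lower()
    coords = shape["coordinates"]
    if kind == "point":
        x, y = coords
        return f"POINT ({x} {y})"
    if kind == "linestring":
        return f"LINESTRING ({_coords(coords)})"
    if kind == "polygon":
        rings = ", ".join(f"({_coords(ring)})" for ring in coords)
        return f"POLYGON ({rings})"
    raise ValueError(f"unsupported shape type: {shape['type']}")
```

Calling to_wkt on a point with coordinates [-200.123, 110.432] yields POINT (-200.123 110.432).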

Using Geometry Collection

A geometry collection is useful for representing multiple disjoint xy_shapes in a single document.

POST "/xy_shape_index_example/_doc?pretty"
{
  "geometry" : {
    "type": "geometrycollection",
    "geometries": [
      {
        "type": "point",
        "coordinates": [1000.0, 0.0]
      },
      {
        "type": "linestring",
        "coordinates": [
          [101.0, 0.0],
          [102.0, 1.0]
        ]
      },
      {
        "type": "multipolygon",
        "coordinates": [
          [
            [
              [180.0, 40.0], [180.0, 50.0], [170.0, 50.0],
              [170.0, 40.0], [180.0, 40.0]
            ]
          ],
          [
            [
              [-170.0, 40.0], [-170.0, 50.0], [-180.0, 50.0],
              [-180.0, 40.0], [-170.0, 40.0]
            ]
          ]
        ]
      }
    ]
  }
}

Indexing Strategy

This field type is indexed by decomposing each geometry into triangles and indexing each triangle as a 7-dimensional point in a BKD tree, similar to geo_shape. Coordinates are provided to the indexer as single-precision floating-point values.
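Two consequences of this strategy can be sketched in Python: coordinates lose precision when narrowed to single precision, and a polygon becomes a set of triangles. The fan triangulation below is a deliberate simplification that only works for convex rings. It stands in for the real tessellation step and does not reproduce the 7-dimensional encoding. All function names are hypothetical.

```python
import struct


def to_float32(value):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def fan_triangulate(ring):
    """Split a convex ring into triangles that share the first vertex.

    Simplified stand-in for tessellation; assumes the ring is convex.
    """
    vertices = ring[:-1] if ring[0] == ring[-1] else list(ring)
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 distinct vertices")
    anchor = vertices[0]
    return [(anchor, vertices[i], vertices[i + 1])
            for i in range(1, len(vertices) - 1)]


def triangle_area(a, b, c):
    """Unsigned area of the triangle a, b, c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
```

A square ring with four distinct vertices yields two triangles whose areas add up to the area of the square, and a value such as -200.123 comes back from to_float32 slightly changed, because it has no exact single-precision representation.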

XYShape Query Type

Queries documents that contain fields indexed as (i) the xy_shape type, which intersect, are contained by, are within, or do not intersect the specified xy_shape, or (ii) the xy_point type, using the intersects, contains, or within relation to the specified xy_shape.

The query supports two ways of defining the search shape for documents that contain an xy_shape: by providing a new target xy_shape definition, or by referencing a pre-indexed xy_shape from another index.

Using new target shape definition

Here the target xy_shape is defined as an envelope, and the search returns documents whose xy_shape lies within this boundary.

GET /xy_shape_index_example/_search
{
  "query": {
    "xy_shape": {
      "geometry": {
        "query_shape": {
          "type": "envelope",
          "coordinates": [ [ 10.0, 300.0], [ 200.0, 10.0] ]
        },
        "relation": "within"
      }
    }
  }
}
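To make the within relation concrete, here is a minimal Python sketch that tests whether every vertex of a shape falls inside an envelope given in the [[minX, maxY], [maxX, minY]] layout. Because the envelope is convex and shape edges are straight, checking the vertices is enough for points, line strings, and polygons. Treating boundary points as inside is an assumption of this sketch, and the function names are hypothetical.

```python
def point_in_envelope(point, envelope):
    """True when (x, y) lies inside or on the edge of the envelope."""
    (min_x, max_y), (max_x, min_y) = envelope
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def vertices_within(vertices, envelope):
    """True when every vertex of the shape lies inside the envelope."""
    return all(point_in_envelope(v, envelope) for v in vertices)
```

With the envelope [[10.0, 300.0], [200.0, 10.0]], a point at (50.0, 100.0) is within, while a polygon with a vertex at (-10.0, -10.0) is not.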

Using a pre-indexed shape from another index

Here the target xy_shape is referenced from a field in a separate index, using the default spatial relation (intersects).

PUT /xy_shape_index_predefined
{
  "mappings": {
    "properties": {
      "geometry": {
        "type": "xy_shape"
      }
    }
  }
}

PUT /xy_shape_index_predefined/_doc/object1
{
  "geometry": {
    "type": "envelope",
    "coordinates" : [[13.0, 53.0], [14.0, 52.0]]
  }
}

GET /xy_shape_index_example/_search
{
  "query": {
    "bool": {
      "filter": {
        "xy_shape": {
          "geometry": {
            "indexed_shape": {
              "index": "xy_shape_index_predefined",
              "id": "object1",
              "path": "geometry"
            }
          }
        }
      }
    }
  }
}
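The two ways of defining the search shape differ only in which key sits under the field name. As a rough sketch, a client-side Python helper could assemble either request body. The function build_xy_shape_query is hypothetical and simply mirrors the JSON structure shown in this RFC.

```python
def build_xy_shape_query(field, query_shape=None, indexed_shape=None, relation=None):
    """Build an xy_shape query body from an inline or a pre-indexed shape.

    Exactly one of query_shape / indexed_shape must be given. When relation
    is omitted, the key is left out so the default relation applies.
    """
    if (query_shape is None) == (indexed_shape is None):
        raise ValueError("provide exactly one of query_shape or indexed_shape")
    clause = {}
    if query_shape is not None:
        clause["query_shape"] = query_shape
    else:
        clause["indexed_shape"] = indexed_shape
    if relation is not None:
        clause["relation"] = relation
    return {"query": {"xy_shape": {field: clause}}}
```

Passing indexed_shape={"index": "xy_shape_index_predefined", "id": "object1", "path": "geometry"} produces the inner xy_shape clause of the request above, without the surrounding bool filter.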

Spatial Relations between xy_shapes

The following is a complete list of spatial relation operators available:

Hagaygur commented 2 years ago

I'll ask here, since I haven't really managed to figure out the use case for the field. We're indexing multiple types of coordinates, some of which are Cartesian, and some of which require conversion to Cartesian.

Thing is, when using a grid-based system (such as Mercator systems), you usually need, well, the grid. And as far as UTM goes, the zone. I may be misunderstanding the field in Lucene, but I see none of these parameters. Is there an implied underlying grid? What is the use case there precisely?

Note that supporting everything that Lucene has is obviously a priority, but I really can't figure out the use of this field.

nknize commented 2 years ago

> Note that supporting everything that Lucene has is obviously a priority, but I really can't figure out the use of this field.

This isn't just for geospatial applications. If you think of anything prefixed w/ geo as implying geographic (e.g., locations on an ellipsoid), this field is for Cartesian locations. So for geospatial applications it can be used for projections to a local custom grid; when used in conjunction w/ a Z field it can represent ENU, ECF. Add a zone field to a boolean query and you can absolutely use a Mercator grid.

Beyond that it works for other spatial applications like virtual worlds, CAD documents, amusement park mapping, VR, and sporting venues (e.g., MLB, NHL, NFL, NBA, MLS analytics).

Think of it as a field that uses euclidean distances instead of haversine; so sorting is much faster w/o having to hijack the geo field for non-geographic spatial applications.

VijayanB commented 2 years ago

Development is complete. Please open a bug report or enhancement request for anything further.