opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0

[FEATURE] Add `iplocation` function to PPL for IP address geolocation #672

Open YANG-DB opened 1 month ago

YANG-DB commented 1 month ago

Description: We propose adding a `geoip` function to OpenSearch's Piped Processing Language (PPL) and SQL to provide built-in IP address geolocation. The feature would be similar to functionality in OpenSearch's geospatial feature, and would let PPL enrich log data with geographical information based on IP addresses.

Proposed Functionality:

  1. The 'geoip' function should take an IP address as input and return geographical information.
  2. It should support both IPv4 and IPv6 addresses.
  3. The function should return multiple fields including country, region, city, latitude, longitude, and others as available.
  4. It should allow users to specify which geolocation fields to include in the output.
  5. The function should use a regularly updated IP geolocation database for accuracy.
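To make the intended lookup semantics concrete, here is a minimal Python sketch of how an address could be resolved against a range-based geolocation table. It is not the proposed implementation: `GEO_TABLE` and `lookup` are hypothetical names, the ranges are reserved documentation prefixes, and the location values are made up. Choosing the most specific matching range is also an assumption, since the proposal does not say how overlapping ranges should be resolved.

```python
import ipaddress

# Hypothetical in-memory stand-in for a regularly updated geolocation database.
# Networks are reserved documentation ranges; location values are illustrative only.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"),
     {"country": "AU", "region": "NSW", "city": "Sydney", "lat": -33.87, "lon": 151.21}),
    (ipaddress.ip_network("203.0.113.128/25"),
     {"country": "AU", "region": "VIC", "city": "Melbourne", "lat": -37.81, "lon": 144.96}),
    (ipaddress.ip_network("2001:db8::/32"),
     {"country": "DE", "region": "BE", "city": "Berlin", "lat": 52.52, "lon": 13.40}),
]


def lookup(ip):
    """Return the record of the most specific range containing `ip`, or None."""
    try:
        addr = ipaddress.ip_address(ip)  # accepts both IPv4 and IPv6 text forms
    except ValueError:
        return None  # assumption: malformed input yields no result rather than an error

    best = None
    for net, record in GEO_TABLE:
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, record)
    return dict(best[1]) if best else None
```

A real implementation would read from a maintained geolocation database rather than a hard-coded list, and would likely use an indexed structure instead of a linear scan.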

Example Usage:

... | eval geolocation = geoip(ip_field)

This would add a new field 'geolocation' with all available location information for the IP address in 'ip_field'.

... | eval country = geoip(ip_field, "country")
... | eval lat = geoip(ip_field, "lat"), lon = geoip(ip_field, "lon")

This would add new fields with specific geolocation information.

... | eval location_info = geoip(ip_field, "country,region,city,lat,lon")

This would add a new field 'location_info' with multiple pieces of geolocation data.
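The three call shapes above imply a field-selection rule that the proposal does not spell out. The Python sketch below shows one possible reading, under these assumptions: no field argument returns the whole record, a single field name returns a scalar, a comma-separated list returns a sub-record, and an unknown field name yields a null. `select_fields` is a hypothetical helper, not part of PPL.

```python
def select_fields(record, fields=None):
    """Apply a geoip-style field selector to a location record (a dict)."""
    if record is None:
        return None  # no match for the IP address
    if fields is None:
        return dict(record)  # geoip(ip_field): all available fields

    # Split "country,region,city" into names, tolerating spaces and empty parts.
    names = [name.strip() for name in fields.split(",") if name.strip()]
    if len(names) == 1:
        return record.get(names[0])  # geoip(ip_field, "country"): a single value
    # geoip(ip_field, "country,region,city,lat,lon"): a sub-record
    return {name: record.get(name) for name in names}


# Illustrative record; the values are made up.
sample = {"country": "AU", "region": "NSW", "city": "Sydney", "lat": -33.87, "lon": 151.21}

print(select_fields(sample, "country"))
print(select_fields(sample, "country, city"))
```

Whether a single-field call should return a scalar or a one-field struct is a design choice the issue leaves open. A scalar keeps `eval country = geoip(ip_field, "country")` directly usable in later filters.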

Additional considerations

dblock commented 1 month ago

[Catch All Triage - 1, 2, 3, 4]