opensearch-project / geospatial

Future home of Geospatial features for OpenSearch
Apache License 2.0
35 stars 36 forks source link

[RFC] Expose Geo2IP Enrichment feature to other plugins to standardize OpenSearch IP enrichment offering #698

Open andy-k-improving opened 4 days ago

andy-k-improving commented 4 days ago

The purpose of this RFC (request for comments) is to gather community feedbacks on a proposal to provide a way to expose the IP enrichment functionality as a registered Action on OpenSearch, which will allow other OpenSearch Plugins to leverage the enrichment, instead of reinventing the wheel.

Problem Statement

At the moment the GeoSpatial plugin do offer IP enrichment feature, however the scope is limited to Document ingestion and also within the GeoSpatial plugin itself. This segregation will force other Plugin within the eco-system which wish to perform IP conversion to re-invent the wheel to provide the same conversion, even when GeoSpatial plugin is present in the system. And this result in fragmentation and duplicate effort on the IP enrichment capability for OpenSearch platform as a whole.

Current State

The OpenSearch IP Enrichment under GeoSpatial plugin is only available during document ingestion, however this doesn't cover all the use case, one of the example would be the executing PPL command under the SQL plugin, which there is a need to convert given IP address into location information, such as City name, Country name....etc. Without the expose of this functionality from GeoSpatial plugin, team from the SQL Plugin will be forced to implement similar if not identical logic, to perform the same operations that GeoSpatial currently provided (Convert the IP string into location data).
This is not desirable from OpenSearch platform perspective, as this will cause further fragmentation on the IP enrichment functionality front.

Proposal

To expose the IP to Geo data conversion into a standalone function and make it accessible via NodeClient.execute( ) during OpenSearch runtime, which allow other plugin to dynamically check and leverage this conversion rather maintaining their own conversion feature.

Approach

To register an new transportAction during the plugin bootstrap time, which is:

  1. A standlone jar module which other plugin can be imported as an interface to perform the IP address lookup.
  2. The jar module should be self contain with all the necessary classes and objects, such as ActionType, ActionRequest and ActionResult.
  3. The jar module should be decoupled with the implementation and allow user (Other plugin) to interact with the CRUD operations of the dataset and perform IP String conversion as needed.

API Design

heemin32 commented 3 days ago

Hi @andy-k-improving, it's good to know that other plugins might need to use the geo2ip feature. For a plugin to leverage the geo2ip functionality, I believe it would require a datasource as an input parameter. Is DataSet provider datasource?

andy-k-improving commented 3 days ago

Hi @andy-k-improving, it's good to know that other plugins might need to use the geo2ip feature. For a plugin to leverage the geo2ip functionality, I believe it would require a datasource as an input parameter. Is DataSet provider datasource?

@heemin32 Yes, that is correct, I'm proposing to have ip and the datasource as input parameter, and return Map<String, Object> directly from ip2GeoCachedDao.getGeoData( ) or similar datasource implementation. As this is just a proxy method to access GeoSpatial and I prefer not to restrict what to return as this stage.

For now I have a prototype on a remote branch and I will push that by the of the day. Can I go ahead for the implementation, then we can discuss further over on the MR, or it's more preferable to have this thread hang for a while in order to reach more audiences?

heemin32 commented 3 days ago

Feel free to proceed with publishing the PR, and we can continue the discussion there.

Based on https://github.com/opensearch-project/sql/issues/3038, it seems it might be useful to have optional field name as parameter input as well.

andy-k-improving commented 3 days ago

Feel free to proceed with publishing the PR, and we can continue the discussion there.

Based on opensearch-project/sql#3038, it seems it might be useful to have optional field name as parameter input as well.

That make sense, will update the RFC accordingly. Cool, then I will proceed, thanks!