opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Add GeoIP commercial databases #4407

Open lduriez opened 3 months ago

lduriez commented 3 months ago

Hello

Is your feature request related to a problem? Please describe. The documentation is not clear on whether GeoIP commercial databases (other than the GeoIP Enterprise database) can be used. I currently have subscriptions for GeoIP2-Country, GeoIP2-ISP, and GeoIP2-Anonymous. I wanted to use at least GeoIP2-Country, but it didn't work. It seems that the geoip processor only works with GeoLite2.

I have the following configuration:

extensions:
  geoip_service:
    maxmind:
      databases:
        country: "/usr/share/GeoIP/GeoIP2-Country.mmdb"
      database_refresh_interval: PT1H

In case you can access database samples here: https://github.com/maxmind/MaxMind-DB/tree/main/test-data

Describe the solution you'd like Have the possibility to use GeoIP2 databases with a configuration like:

extensions:
  geoip_service:
    maxmind:
      databases:
        geoip2-country: "/usr/share/GeoIP/GeoIP2-Country.mmdb"
        geoip2-isp: "/usr/share/GeoIP/GeoIP2-ISP.mmdb"
      database_refresh_interval: PT1H

Describe alternatives you've considered (Optional) Or, similar to what is done in Logstash, specify the database in the processor, like:

  processor:
    - geoip:
        entries:
          - source: "/clientIp"
        database: "/usr/share/GeoIP/GeoIP2-Country.mmdb"

Additional context It does not seem to work even when providing GeoLite2-Country.mmdb directly by local path.

data-prepper-config.yaml:

extensions:
  geoip_service:
    maxmind:
      databases:
        country: "/usr/share/GeoIP/GeoLite2-Country.mmdb"
      database_refresh_interval: PT1H

pipeline.yaml:

version: "2"
test-pipeline:
  source:
    http:
  processor:
    - parse_json:
        source: "message"
    - geoip:
        entries:
          - source: "/clientIp"
  sink:
    - stdout:

input data sample:

{"clientIP":"185.126.231.50"}
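
Note on the sample above: the pipeline reads `/clientIp`, while the sample event uses the key `clientIP`. If key lookup is case-sensitive (an assumption here, not confirmed anywhere in this thread), the geoip processor would find no source field regardless of which database is loaded. A minimal Python sketch of that check follows; `resolve_pointer` is a hypothetical helper written for illustration and is not part of Data Prepper.

```python
import json


def resolve_pointer(event, pointer):
    """Return the value at a simple '/key/subkey' path, or None if a key is missing.

    Hypothetical helper for illustration only; lookups are case-sensitive,
    which is assumed (not confirmed) to match Data Prepper's behavior.
    """
    current = event
    for key in pointer.strip("/").split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# The input data sample from this issue.
event = json.loads('{"clientIP":"185.126.231.50"}')

# Key as written in pipeline.yaml versus key as written in the sample input.
print(resolve_pointer(event, "/clientIp"))
print(resolve_pointer(event, "/clientIP"))
```

If the first lookup comes back empty and the second returns the address, aligning the `source` value with the event key would be worth trying before concluding that the local GeoLite2 database is not being read.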
dlvenable commented 3 months ago

@lduriez, Data Prepper does not yet support commercial databases. This would be a great addition, and we plan to use the same configurations you have shared.

If you are interested in working on adding this support, I'd be happy to help get you started.