redpanda-data / connect

Fancy stream processing made operationally mundane
https://docs.redpanda.com/redpanda-connect/about/

Reverse DNS Resolution Processor #1114

Open newlandk opened 2 years ago

newlandk commented 2 years ago

Would a DNS resolution processor be a good fit for Benthos?

Specifically IP Address -> Hostname resolution

This is a common use case for a few tools I've used in the past to provide quick enrichment of plain IP addresses in logs. Ideally, it would use a configurable local cache to reduce outbound DNS requests, caching both successful and failed lookups for a specified duration and up to a specified cache size.
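To make the caching idea concrete, here is a minimal Go sketch of a lookup cache that remembers both successful and failed results for a fixed TTL and caps the number of entries. Everything in it (`ReverseDNSCache`, `lookupFunc`, `errNoPTR`, the crude eviction policy) is hypothetical and not taken from Benthos or telegraf; the resolver is injected as a function, so a real deployment could pass something wrapping `net.DefaultResolver.LookupAddr`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// lookupFunc resolves an IP address to hostnames. It is injected so the
// cache can be exercised without real DNS traffic.
type lookupFunc func(ip string) ([]string, error)

// errNoPTR is a stand-in error used by the demo lookup below.
var errNoPTR = errors.New("no PTR record")

// entry stores a lookup result (success or failure) and its expiry time.
type entry struct {
	names   []string
	err     error
	expires time.Time
}

// ReverseDNSCache is a hypothetical TTL cache with a maximum entry count.
type ReverseDNSCache struct {
	mu      sync.Mutex
	entries map[string]entry
	ttl     time.Duration
	maxSize int
	lookup  lookupFunc
}

func NewReverseDNSCache(ttl time.Duration, maxSize int, lookup lookupFunc) *ReverseDNSCache {
	return &ReverseDNSCache{entries: map[string]entry{}, ttl: ttl, maxSize: maxSize, lookup: lookup}
}

// Resolve returns a cached result if it is still fresh; otherwise it calls
// the upstream lookup and caches whatever comes back, including errors.
// The lock is held during the lookup for simplicity, which serializes misses.
func (c *ReverseDNSCache) Resolve(ip string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	if e, ok := c.entries[ip]; ok && now.Before(e.expires) {
		return e.names, e.err
	}
	names, err := c.lookup(ip)
	if len(c.entries) >= c.maxSize {
		// Crude eviction: drop expired entries, then one arbitrary entry.
		for k, e := range c.entries {
			if !now.Before(e.expires) {
				delete(c.entries, k)
			}
		}
		if len(c.entries) >= c.maxSize {
			for k := range c.entries {
				delete(c.entries, k)
				break
			}
		}
	}
	c.entries[ip] = entry{names: names, err: err, expires: now.Add(c.ttl)}
	return names, err
}

func main() {
	calls := 0
	stub := func(ip string) ([]string, error) {
		calls++
		if ip == "192.0.2.1" {
			return []string{"host.example."}, nil
		}
		return nil, errNoPTR
	}
	cache := NewReverseDNSCache(time.Hour, 1024, stub)
	for _, ip := range []string{"192.0.2.1", "192.0.2.1", "192.0.2.2", "192.0.2.2"} {
		names, err := cache.Resolve(ip)
		fmt.Println(ip, names, err)
	}
	fmt.Println("upstream lookups:", calls) // 2: one hit and one miss are each cached
}
```

Caching failures matters here because logs often repeat addresses that have no PTR record, and without negative caching each repeat would trigger a fresh outbound query.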

This is a telegraf example that accomplishes the same: https://github.com/influxdata/telegraf/tree/master/plugins/processors/reverse_dns

Jeffail commented 2 years ago

Hey @newlandk, that sounds like a reasonable addition. It would be cool to add this to bloblang so that you could easily do assignments, but we don't have any existing methods/functions that make network calls, so we'd need to think about how that might impact mappings.

Error handling should be fine, as we can easily add parameters for various aspects of it. It'd look something like this:

root.foo = this.bar.reverse_dns(timeout: "2s").catch("UNKNOWN")
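For illustration only, the timeout-plus-fallback behaviour in that mapping could be sketched in Go roughly as follows. The `reverse_dns` method does not exist in bloblang at this point, and `reverseDNSOr`, `ptrLookup`, `fakeLookup`, and `hangingLookup` are hypothetical names invented for this sketch; the only real API used is `net.DefaultResolver.LookupAddr` from the Go standard library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"strings"
	"time"
)

// ptrLookup has the same signature as (*net.Resolver).LookupAddr, so the
// real resolver or a test double can be passed in.
type ptrLookup func(ctx context.Context, addr string) ([]string, error)

// reverseDNSOr resolves ip to its first hostname, giving up after timeout.
// Any error or empty answer yields fallback, mirroring .catch("UNKNOWN").
func reverseDNSOr(lookup ptrLookup, ip string, timeout time.Duration, fallback string) string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	names, err := lookup(ctx, ip)
	if err != nil || len(names) == 0 {
		return fallback
	}
	// PTR answers are usually fully qualified with a trailing dot.
	return strings.TrimSuffix(names[0], ".")
}

// fakeLookup answers instantly and knows a single address.
func fakeLookup(_ context.Context, addr string) ([]string, error) {
	if addr == "192.0.2.1" {
		return []string{"host.example."}, nil
	}
	return nil, errors.New("no such host")
}

// hangingLookup simulates an unresponsive DNS server: it blocks until the
// context deadline passes.
func hangingLookup(ctx context.Context, _ string) ([]string, error) {
	<-ctx.Done()
	return nil, ctx.Err()
}

func main() {
	fmt.Println(reverseDNSOr(fakeLookup, "192.0.2.1", 2*time.Second, "UNKNOWN"))
	fmt.Println(reverseDNSOr(hangingLookup, "192.0.2.1", 50*time.Millisecond, "UNKNOWN"))
	// Real lookup; the result depends on the network, so no output is shown.
	fmt.Println(reverseDNSOr(net.DefaultResolver.LookupAddr, "8.8.8.8", 2*time.Second, "UNKNOWN"))
}
```

The sketch also hints at why network calls inside mappings need thought: every assignment can block for up to the full timeout.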

newlandk commented 2 years ago

That would make assignments a lot simpler... 😄

After thinking about this a little more, I find myself trying to describe an approach that would use the existing cache capabilities. It could be completely crazy, but here's a configuration that illustrates what I was pondering.

---
cache_resources:
  - label: dns_leveled_cache
    multilevel: [hot_cache, redis_cache]

  - label: hot_cache
    memory:
      ttl: 3600s

  - label: redis_cache
    redis:
      url: redis://localhost:6379
      kind: cluster
      expiration: 3600s
      retries: 3
      retry_period: 500ms

processors:
  - branch:
      processors:
        - dns_lookup:
            cache_resource: dns_leveled_cache
            servers:
              - 8.8.8.8
              - 8.8.4.4
            lookup_type: ptr
            key: '${! json("ip_address.v4") }'
      result_map: 'root.hostname = content().string()'

I'm still getting familiar with the tool and code, so I'm not sure how feasible this is, or whether it's the right way to use cache resources. My reason for considering a processor rather than bloblang is future extensibility: a DNS lookup tool could later provide other lookups, such as hostname to IP.
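As a rough sketch of how the `servers` and `lookup_type` fields in that configuration might map onto Go's standard library, consider the following. The `dns_lookup` processor is only a proposal, so `dnsLookup`, `newResolver`, `fakeResolver`, and the `"host"` lookup type are all assumptions made for illustration; only `"ptr"` appears in the configuration above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// resolver is the subset of *net.Resolver this sketch needs.
type resolver interface {
	LookupAddr(ctx context.Context, addr string) ([]string, error)
	LookupHost(ctx context.Context, host string) ([]string, error)
}

// newResolver returns a resolver that sends queries to one specific server,
// given as "host:port" (for example "8.8.8.8:53"), instead of the system default.
func newResolver(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, network, server)
		},
	}
}

var errUnknownType = errors.New("unsupported lookup_type")

// dnsLookup dispatches on lookupType, so new lookup kinds can be added
// as extra cases without changing callers.
func dnsLookup(r resolver, lookupType, key string, timeout time.Duration) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	switch lookupType {
	case "ptr": // IP address -> hostnames
		return r.LookupAddr(ctx, key)
	case "host": // hostname -> IP addresses (IPv4 and IPv6)
		return r.LookupHost(ctx, key)
	default:
		return nil, fmt.Errorf("%w: %q", errUnknownType, lookupType)
	}
}

// fakeResolver is an offline stand-in that tags answers by lookup kind.
type fakeResolver struct{}

func (fakeResolver) LookupAddr(_ context.Context, addr string) ([]string, error) {
	return []string{"ptr:" + addr}, nil
}

func (fakeResolver) LookupHost(_ context.Context, host string) ([]string, error) {
	return []string{"host:" + host}, nil
}

func main() {
	fmt.Println(dnsLookup(fakeResolver{}, "ptr", "192.0.2.1", time.Second))
	fmt.Println(dnsLookup(fakeResolver{}, "mx", "example.com", time.Second))
	// Real query against a configured server; output depends on the network.
	fmt.Println(dnsLookup(newResolver("8.8.8.8:53"), "ptr", "8.8.8.8", 2*time.Second))
}
```

A cache such as the multilevel resource in the configuration would sit in front of `dnsLookup`, keyed on the lookup type and key together so that different lookup kinds do not collide.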

Thoughts?