vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

New `extract_subnet` transform #1978

Closed ghost closed 3 years ago

ghost commented 4 years ago

It would be nice to have a transform which would take a string containing an IP address (IPv4 or IPv6) and extract a specified subnet from it.

For example, if the IP is 1.2.3.4 and the subnet parameter is set to /24, then the output would be 1.2.3.0. There could also be other ways to configure it, for example by specifying a mask like 255.255.255.0 which would be applied to the address. It is also desirable to have the ability to configure these settings differently for IPv4 and IPv6.
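As a rough sketch of the behaviour described above, here is one way it could look in Python using only the standard-library `ipaddress` module. The function name `extract_subnet` and the parameters `v4_prefix` and `v6_prefix` are hypothetical and chosen for illustration; they are not an existing Vector API.

```python
import ipaddress


def extract_subnet(address, v4_prefix=24, v6_prefix=64):
    """Return the network address of the subnet that contains `address`.

    Hypothetical helper, not part of Vector. `v4_prefix` may be a prefix
    length (24) or a dotted mask ("255.255.255.0"); `v6_prefix` is a
    prefix length. The two are configured independently.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and zero them out instead of raising ValueError.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(extract_subnet("1.2.3.4"))                             # 1.2.3.0
print(extract_subnet("1.2.3.4", v4_prefix="255.255.255.0"))  # 1.2.3.0
print(extract_subnet("2001:db8:1:2:3:4:5:6", v6_prefix=48))  # 2001:db8:1::
```

Keeping separate defaults for the two address families reflects the request that IPv4 and IPv6 be configurable independently; the default values 24 and 64 are assumptions for the example only.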

This might be useful for anonymizing IP addresses from logs of a public webserver before they are stored anywhere, for example to avoid storing personally identifiable information about the requesters.

binarylogic commented 4 years ago

I like it, but I think this falls below the "fundamental threshold" outlined in #1926. I would love it if we could support a simple syntax that serves as a catch-all for these types of transformations. My hope is that:

  1. WASM will be fast enough to serve this purpose with a variety of languages.
  2. If not, we can offer a minimal Rust-native scripting language that does not incur a large performance hit. Even if WASM is fast, there is still a case for a minimal and safe syntax.

ghost commented 4 years ago

> If not, we can offer a minimal Rust-native scripting language that does not incur a large performance hit. Even if WASM is fast, there is still a case for a minimal and safe syntax.

Ideally we want a scripting language with a JIT that is easy to write but compiles loops into efficient machine code on the fly. Julia is an example of this approach achieving very high performance, although the Julia interpreter as a whole is probably too large to be embedded in Vector (it would be interesting to know whether a stripped-down version could be made small enough to embed, much like mRuby or MicroPython).

Along similar lines, if there is a WASM runtime that has native or near-native performance, then it might be possible to build a scripting language which would be JIT-compiled to WASM. This would give us a portable JIT implementation that does not depend on any particular platform. Jython and JRuby use this approach: they take source in the corresponding scripting language, compile it on the fly into JVM bytecode, and then let the JVM run it efficiently. The same can be done on top of WASM, with the difference that a WASM runtime might be expected to be smaller than the JVM.

lukesteensen commented 4 years ago

> if there is a WASM runtime that has native or near-native performance, then it might be possible to build a scripting language which would be JIT-compiled to WASM

I think this is part of the appeal of WASM. Instead of writing our own JIT-ed language, we would simply use a WASM runtime with a JIT. This is similar to how languages target the JVM to take advantage of its advanced runtime. WASM is not as mature at this point, but as the runtimes get better the same code should run faster and faster.