vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add PROXY(haproxy-style) protocol support for socket and syslog sources #6769

Open rwaweber opened 3 years ago

rwaweber commented 3 years ago

Current Vector Version

0.12.1

Use-cases

Similar in motivation to https://github.com/timberio/vector/issues/6763 but likely a bit more complicated in implementation.

The motivation is roughly the same: retaining the source address of an event producer that ships to a load balancer, whose own address otherwise ends up recorded as the source address of the event.

A loose overview of the PROXY protocol, as well as the pieces of software that support it, is available from the HAProxy folks at [1]. A lot more detail can be found in the spec at [2].

I think that this is also the same protocol that is used by both AWS NLBs[3] and GCP's TCP load balancing service[4], which opens the door for some interesting out-of-the-box solutions.

Attempted Solutions

None yet, unfortunately. Though if I were to guess, this is likely going to be a bit more complicated than implementing XFF header extraction.

Apologies for linking a blog post, but [5] offers some interesting insights into how one could go about implementing their own PROXY protocol parser, along with some pretty comprehensive research.
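
For illustration, here is a minimal Python sketch of parsing the human-readable version 1 header described in the spec [2]. The names `ProxyV1` and `parse_proxy_v1` are hypothetical and not part of Vector, and the sketch assumes the whole header line is already buffered. It is meant to show the shape of the problem, not a proposed implementation.

```python
import ipaddress
from typing import NamedTuple, Optional, Tuple


class ProxyV1(NamedTuple):
    """Hypothetical container for a parsed v1 header."""
    family: str                       # "TCP4", "TCP6" or "UNKNOWN"
    src: Optional[Tuple[str, int]]    # (address, port) of the original client
    dst: Optional[Tuple[str, int]]    # (address, port) the client connected to


def parse_proxy_v1(data: bytes) -> Tuple[ProxyV1, bytes]:
    """Parse a v1 header at the start of `data`; return (header, remaining bytes)."""
    if not data.startswith(b"PROXY "):
        raise ValueError("missing PROXY v1 signature")
    # The spec caps the v1 line at 107 bytes, including the trailing CRLF.
    end = data.find(b"\r\n", 0, 107)
    if end == -1:
        raise ValueError("no CRLF within the first 107 bytes")
    fields = data[:end].decode("ascii").split(" ")
    rest = data[end + 2:]
    if fields[1] == "UNKNOWN":
        # Anything after UNKNOWN on the line is ignored.
        return ProxyV1("UNKNOWN", None, None), rest
    if fields[1] not in ("TCP4", "TCP6") or len(fields) != 6:
        raise ValueError("malformed PROXY v1 line")
    _, family, src_ip, dst_ip, src_port, dst_port = fields
    version = 4 if family == "TCP4" else 6
    for ip in (src_ip, dst_ip):
        # ip_address() raises ValueError on anything that is not a valid address.
        if ipaddress.ip_address(ip).version != version:
            raise ValueError("address does not match declared family")
    return ProxyV1(family, (src_ip, int(src_port)), (dst_ip, int(dst_port))), rest


# Usage: the header is followed directly by the proxied payload.
header, payload = parse_proxy_v1(
    b"PROXY TCP4 212.106.253.158 10.220.5.157 55678 6514\r\n<13>test\n"
)
```

A real source would also have to handle a header split across TCP reads, which this sketch deliberately ignores.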

Proposal

Add PROXY protocol support to the socket and syslog sources (TCP variant only).

References

[1] HAProxy PROXY protocol thousand foot view: https://www.haproxy.com/blog/haproxy/proxy-protocol/
[2] HAProxy PROXY protocol spec: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
[3] https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#proxy-protocol
[4] https://cloud.google.com/load-balancing/docs/tcp/setting-up-tcp#proxy-protocol
[5] https://seriousben.com/posts/2020-02-exploring-the-proxy-protocol/#parsing-version-1

bderrly commented 2 years ago

@binarylogic, do you have any updates on this feature?

marcel-puchol-jt commented 1 year ago

As a workaround, we are using the following remap transformation:

read_proxy_protocol_information:
  type: remap
  inputs: ["convert_base64"]
  drop_on_error: true
  drop_on_abort: false
  source: |
    # Code based on https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

    message = string!(decode_base64(.base64_encoded_message) ?? .message)
    if (starts_with(message, decode_base16!("0D0A0D0A000D0A515549540A")) && length(message) >= 16) {
      protocol_and_command = chunks(encode_base16(slice!(message, 12, 13)), 1)

      # Only version 2 of the protocol is supported
      if (protocol_and_command[0] == "2") {
        command = get!(["local", "proxy"], [to_int!(protocol_and_command[1])])
        family = "unspec"
        protocol = "unspec"

        if (command == "proxy") {
          family_and_protocol = chunks(encode_base16(slice!(message, 13, 14)), 1)

          family = get!(["unspec", "inet", "inet6", "unix"], [to_int!(family_and_protocol[0])])
          protocol = get!(["unspec", "tcp", "udp"], [to_int!(family_and_protocol[1])])

          if (family == "inet") {
            # IPv4 block: src addr 16..20, dst addr 20..24, src port 24..26
            .host = ip_ntop(slice!(message, 16, 20)) ?? "invalid"
            .port = parse_int!(encode_base16(slice!(message, 24, 26)), 16)
          } else if (family == "inet6") {
            # IPv6 block: src addr 16..32, dst addr 32..48, src port 48..50
            .host = ip_ntop(slice!(message, 16, 32)) ?? "invalid"
            .port = parse_int!(encode_base16(slice!(message, 48, 50)), 16)
          }
        }

        .proxy_protocol_v2 = {
          "command": command,
          "family": family,
          "protocol": protocol,
        }

        beginning_message = 16 + parse_int!(encode_base16(slice!(message, 14, 16)), 16)
        .message = slice!(message, beginning_message)
      }
    }

It can be checked with the following unit test:

tests:
  - name: syslog / proxy protocol / proxy information properly read
    inputs:
      - type: log
        insert_at: read_proxy_protocol_information
        log_fields:
          timestamp: ignore
          host: "10.220.5.157"
          port: 12345
          base64_encoded_message: "DQoNCgANClFVSVQKIREAVNRq/Z4K3AWd2X4ZcgMABKT9EmUEAD4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAHRlc3QK"
    outputs:
      - extract_from: read_proxy_protocol_information
        conditions:
          - type: vrl
            source: |-
              event = compact(. | { "base64_encoded_message": null })
              assert_eq!(event, {
                "timestamp": "ignore",
                "host": "212.106.253.158",
                "port": 55678,
                "proxy_protocol_v2": {
                  "command": "proxy",
                  "family": "inet",
                  "protocol": "tcp"
                },
                "message": "test\n"
              })
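
The byte offsets used by the remap workaround can be cross-checked outside Vector. Below is a minimal Python sketch of the same version 2 header layout from the spec: a 12-byte signature, a version/command byte, a family/protocol byte, a 2-byte length, then the address block. The function name `parse_proxy_v2` and the returned dictionary shape are hypothetical. The sample frame reuses the addresses from the test above, with a destination port chosen only for illustration.

```python
import ipaddress
import struct
from typing import Tuple

SIGNATURE = bytes.fromhex("0D0A0D0A000D0A515549540A")  # 12-byte v2 signature
COMMANDS = {0: "local", 1: "proxy"}
FAMILIES = {0: "unspec", 1: "inet", 2: "inet6", 3: "unix"}
PROTOCOLS = {0: "unspec", 1: "tcp", 2: "udp"}


def parse_proxy_v2(data: bytes) -> Tuple[dict, bytes]:
    """Parse a v2 header at the start of `data`; return (info, remaining bytes)."""
    if len(data) < 16 or not data.startswith(SIGNATURE):
        raise ValueError("not a PROXY v2 header")
    # Byte 12: version (high nibble) and command (low nibble).
    # Byte 13: address family (high nibble) and transport protocol (low nibble).
    # Bytes 14..16: big-endian length of everything after the 16-byte prefix.
    ver_cmd, fam_proto, addr_len = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported PROXY protocol version")
    if len(data) < 16 + addr_len:
        raise ValueError("truncated header")
    info = {
        "command": COMMANDS.get(ver_cmd & 0x0F, "unknown"),
        "family": FAMILIES.get(fam_proto >> 4, "unknown"),
        "protocol": PROTOCOLS.get(fam_proto & 0x0F, "unknown"),
    }
    if info["command"] == "proxy" and info["family"] in ("inet", "inet6"):
        n = 4 if info["family"] == "inet" else 16  # address size in bytes
        if addr_len < 2 * n + 4:
            raise ValueError("address block too short")
        # Layout: src addr, dst addr, src port, dst port, starting at offset 16.
        src = str(ipaddress.ip_address(data[16:16 + n]))
        dst = str(ipaddress.ip_address(data[16 + n:16 + 2 * n]))
        ports = data[16 + 2 * n:16 + 2 * n + 4]
        src_port, dst_port = struct.unpack("!HH", ports)
        info["src"] = (src, src_port)
        info["dst"] = (dst, dst_port)
    # Any TLVs after the addresses are skipped: the payload starts at 16 + addr_len.
    return info, data[16 + addr_len:]


# Usage: build an IPv4/TCP frame by hand and parse it back.
block = (
    ipaddress.ip_address("212.106.253.158").packed
    + ipaddress.ip_address("10.220.5.157").packed
    + struct.pack("!HH", 55678, 6514)  # destination port is illustrative
)
frame = SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(block)) + block + b"test\n"
info, payload = parse_proxy_v2(frame)
```

For IPv6 the same arithmetic places the source address at bytes 16 to 32 and the source port at bytes 48 to 50, so the address slice has to be exactly 16 bytes long.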