vectordotdev / vector

A high-performance observability data pipeline.
Mozilla Public License 2.0

Add PROXY (HAProxy-style) protocol support for socket and syslog sources #6769

Open rwaweber opened 3 years ago

rwaweber commented 3 years ago

Current Vector Version



Similar in motivation to …, but likely a bit more complicated to implement.

The motivation is roughly the same: retaining the source address of an event producer that ships to a load balancer. Otherwise, the load balancer's address is what gets recorded as the source address.

A loose overview of the PROXY protocol, as well as the pieces of software that support it, is available from the HAProxy folks [1]. A lot more detail can be found in the spec [2].
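For reference, version 1 of the protocol is a single human-readable line that the proxy writes before any client data, of the form `PROXY TCP4 <src ip> <dst ip> <src port> <dst port>\r\n`. Below is a minimal sketch of pulling the original client address out of such a line. It is written in Rust because that is what Vector is written in. `parse_v1` is a hypothetical helper, not an existing Vector API, and the addresses are documentation examples.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: parse a PROXY protocol v1 line such as
/// "PROXY TCP4 192.0.2.1 198.51.100.7 56324 443\r\n" and return the
/// original client (source) address. Returns None for "PROXY UNKNOWN"
/// or for anything malformed.
fn parse_v1(line: &str) -> Option<SocketAddr> {
    // A v1 header is one CRLF-terminated ASCII line.
    let line = line.strip_suffix("\r\n")?;
    let mut fields = line.split(' ');
    if fields.next()? != "PROXY" {
        return None;
    }
    let family = fields.next()?; // "TCP4", "TCP6" or "UNKNOWN"
    let src_ip: IpAddr = fields.next()?.parse().ok()?;
    let _dst_ip: IpAddr = fields.next()?.parse().ok()?;
    let src_port: u16 = fields.next()?.parse().ok()?;
    let _dst_port: u16 = fields.next()?.parse().ok()?;
    if fields.next().is_some() {
        return None; // trailing garbage after the destination port
    }
    // The declared family must match the kind of address that was sent.
    match (family, src_ip) {
        ("TCP4", IpAddr::V4(_)) | ("TCP6", IpAddr::V6(_)) => {
            Some(SocketAddr::new(src_ip, src_port))
        }
        _ => None,
    }
}

fn main() {
    let client = parse_v1("PROXY TCP4 192.0.2.1 198.51.100.7 56324 443\r\n");
    println!("{}", client.unwrap()); // prints 192.0.2.1:56324
}
```

The destination fields are parsed only to validate the line. A source would typically keep just the client address and replace the connection's peer address with it.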

I believe this is the same protocol used by both AWS NLBs [3] and GCP's TCP load balancing service [4], which opens the door to some interesting out-of-the-box solutions.

Attempted Solutions

None yet, unfortunately. If I had to guess, this will be a bit more complicated to implement than X-Forwarded-For (XFF) header extraction.
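One likely reason for the extra complexity: an XFF value is just a header on an already-parsed HTTP request, whereas the PROXY header is a prefix of the raw TCP byte stream. A source would have to buffer the first bytes of each connection, recognize the header, and strip it before handing the remainder to its framing and decoding. The following is a rough sketch of that detection step. `detect_proxy_header` and the `Detect` enum are hypothetical names, not Vector code, and how Vector's framing layer would actually consume the header is left open.

```rust
/// 12-byte signature at the start of every PROXY protocol v2 header.
const V2_SIGNATURE: [u8; 12] = [
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A,
];
/// v1 headers start with this ASCII prefix.
const V1_PREFIX: &[u8] = b"PROXY ";
/// Longest legal v1 line, CRLF included, per the spec.
const V1_MAX_LEN: usize = 107;

/// Result of inspecting the first bytes buffered from a new TCP connection.
#[derive(Debug, PartialEq)]
enum Detect {
    /// The stream does not start with a PROXY header.
    NotProxy,
    /// Could be a PROXY header; read more bytes and try again.
    Incomplete,
    /// Starts like a v1 header but has no CRLF within the size limit.
    Invalid,
    /// A complete header of this many bytes sits at the front of the buffer.
    Header(usize),
}

/// Hypothetical helper: how many leading bytes a source would need to strip
/// before handing the rest of the stream to its framing/decoding.
fn detect_proxy_header(buf: &[u8]) -> Detect {
    if buf.is_empty() {
        return Detect::Incomplete;
    }
    // v2: signature, version/command byte, family/protocol byte, then a
    // big-endian u16 giving the length of the address block that follows.
    let n = buf.len().min(V2_SIGNATURE.len());
    if buf[..n] == V2_SIGNATURE[..n] {
        if buf.len() < 16 {
            return Detect::Incomplete;
        }
        let total = 16 + u16::from_be_bytes([buf[14], buf[15]]) as usize;
        return if buf.len() >= total { Detect::Header(total) } else { Detect::Incomplete };
    }
    // v1: "PROXY ... \r\n", at most V1_MAX_LEN bytes long.
    let n = buf.len().min(V1_PREFIX.len());
    if buf[..n] == V1_PREFIX[..n] {
        let window = &buf[..buf.len().min(V1_MAX_LEN)];
        return match window.windows(2).position(|w| w == b"\r\n") {
            Some(pos) => Detect::Header(pos + 2),
            None if buf.len() >= V1_MAX_LEN => Detect::Invalid,
            None => Detect::Incomplete,
        };
    }
    Detect::NotProxy
}

fn main() {
    let stream = b"PROXY TCP4 192.0.2.1 198.51.100.7 56324 443\r\n<34>1 - host app - - - hello\n";
    if let Detect::Header(len) = detect_proxy_header(stream) {
        // Everything after the header is ordinary payload for the decoder.
        println!("{}", String::from_utf8_lossy(&stream[len..]));
    }
}
```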

Apologies for linking a blog post, but it offers some useful insight into how one might go about implementing a PROXY protocol parser [5], along with some fairly comprehensive research.


Add PROXY protocol support to the socket and syslog sources (TCP mode only).
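AWS NLBs send the binary version 2 header, so a TCP source would need to parse that form as well. The sketch below extracts the original client address from a v2 header. It assumes the whole header is already buffered. `parse_v2` is a hypothetical helper, and any TLV extensions after the address block are skipped rather than interpreted.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

const V2_SIGNATURE: [u8; 12] = [
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A,
];

/// Hypothetical helper: parse a complete PROXY v2 header at the start of `buf`.
/// Returns the original client address (None for LOCAL connections or for
/// families a TCP source cannot use) plus the number of header bytes to skip.
fn parse_v2(buf: &[u8]) -> Result<(Option<SocketAddr>, usize), &'static str> {
    if buf.len() < 16 || buf[..12] != V2_SIGNATURE {
        return Err("not a PROXY v2 header");
    }
    if buf[12] >> 4 != 2 {
        return Err("unsupported version");
    }
    let command = buf[12] & 0x0F; // 0 = LOCAL, 1 = PROXY
    let family = buf[13] >> 4; // 0 = UNSPEC, 1 = INET, 2 = INET6, 3 = UNIX
    let transport = buf[13] & 0x0F; // 0 = UNSPEC, 1 = STREAM, 2 = DGRAM
    let total = 16 + u16::from_be_bytes([buf[14], buf[15]]) as usize;
    if buf.len() < total {
        return Err("truncated header");
    }
    match command {
        // LOCAL: e.g. health checks from the proxy itself; keep the real peer address.
        0 => return Ok((None, total)),
        1 => {}
        _ => return Err("unknown command"),
    }
    // Address block; any TLVs after the addresses are covered by `total` and skipped.
    let addrs = &buf[16..total];
    let src = match (family, transport) {
        // INET/STREAM: src_addr[4], dst_addr[4], src_port, dst_port
        (1, 1) if addrs.len() >= 12 => {
            let ip = Ipv4Addr::new(addrs[0], addrs[1], addrs[2], addrs[3]);
            Some(SocketAddr::new(IpAddr::V4(ip), u16::from_be_bytes([addrs[8], addrs[9]])))
        }
        // INET6/STREAM: src_addr[16], dst_addr[16], src_port, dst_port
        (2, 1) if addrs.len() >= 36 => {
            let mut octets = [0u8; 16];
            octets.copy_from_slice(&addrs[..16]);
            let port = u16::from_be_bytes([addrs[32], addrs[33]]);
            Some(SocketAddr::new(IpAddr::V6(Ipv6Addr::from(octets)), port))
        }
        // UNSPEC, UNIX or DGRAM: nothing a TCP source can use.
        _ => None,
    };
    Ok((src, total))
}

fn main() {
    // Illustrative header: client 192.0.2.1:55678 -> 198.51.100.7:514.
    let mut buf = V2_SIGNATURE.to_vec();
    buf.extend_from_slice(&[0x21, 0x11, 0x00, 0x0C]); // v2 PROXY, INET/STREAM, 12 bytes
    buf.extend_from_slice(&[192, 0, 2, 1, 198, 51, 100, 7]);
    buf.extend_from_slice(&55678u16.to_be_bytes());
    buf.extend_from_slice(&514u16.to_be_bytes());
    buf.extend_from_slice(b"test\n");
    let (client, len) = parse_v2(&buf).expect("valid header");
    println!("client={:?} payload={:?}", client, String::from_utf8_lossy(&buf[len..]));
}
```

Returning `None` for LOCAL follows the spec's intent that such connections carry no client information, so the source would fall back to the TCP peer address.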


[1] HAProxy PROXY protocol thousand-foot view:
[2] HAProxy PROXY protocol spec:
[3]
[4]
[5]

bderrly commented 2 years ago

@binarylogic, do you have any updates on this feature?

marcel-puchol-jt commented 1 year ago

As a workaround, we are using the following remap transformation:

read_proxy_protocol_information:
  type: remap
  inputs: ["convert_base64"]
  drop_on_error: true
  drop_on_abort: false
  source: |
    # Code based on

    message = string!(decode_base64(.base64_encoded_message) ?? .message)
    if (starts_with(message, decode_base16!("0D0A0D0A000D0A515549540A")) && length(message) >= 16) {
      protocol_and_command = chunks(encode_base16(slice!(message, 12, 13)), 1)

      # Only version 2 of the protocol is supported
      if (protocol_and_command[0] == "2") {
        command = get!(["local", "proxy"], [to_int!(protocol_and_command[1])])
        family = "unspec"
        protocol = "unspec"

        if (command == "proxy") {
          family_and_protocol = chunks(encode_base16(slice!(message, 13, 14)), 1)

          family = get!(["unspec", "inet", "inet6", "unix"], [to_int!(family_and_protocol[0])])
          protocol = get!(["unspec", "tcp", "udp"], [to_int!(family_and_protocol[1])])

          if (family == "inet") {
            # IPv4: source address is bytes 16..20, source port bytes 24..26
            .host = ip_ntop(slice!(message, 16, 20)) ?? "invalid"
            .port = parse_int!(encode_base16(slice!(message, 24, 26)), 16)
          } else if (family == "inet6") {
            # IPv6: source address is bytes 16..32 (16 bytes), source port bytes 48..50
            .host = ip_ntop(slice!(message, 16, 32)) ?? "invalid"
            .port = parse_int!(encode_base16(slice!(message, 48, 50)), 16)
          }
        }

        .proxy_protocol_v2 = {
          "command": command,
          "family": family,
          "protocol": protocol
        }

        # Skip the 16-byte fixed header plus the address block, whose length is in bytes 14..16
        beginning_message = 16 + parse_int!(encode_base16(slice!(message, 14, 16)), 16)
        .message = slice!(message, beginning_message)
      }
    }

It can be tested with the following unit test:

tests:
  - name: syslog / proxy protocol / proxy information properly read
    inputs:
      - type: log
        insert_at: read_proxy_protocol_information
        log_fields:
          # base64 of a PROXY v2 header followed by "test\n"
          base64_encoded_message: "…"
          timestamp: ignore
          host: ""
          port: 12345
    outputs:
      - extract_from: read_proxy_protocol_information
        conditions:
          - type: vrl
            source: |-
              event = compact(. | { "base64_encoded_message": null })
              assert_eq!(event, {
                "timestamp": "ignore",
                "host": "",
                "port": 55678,
                "proxy_protocol_v2": {
                  "command": "proxy",
                  "family": "inet",
                  "protocol": "tcp"
                },
                "message": "test\n"
              })
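To generate a `base64_encoded_message` value for a test like the one above, a small program can assemble the header bytes. The sketch below builds a v2 PROXY/TCP header using the field offsets from the v2 spec: for IPv4, the source port sits at bytes 24..26; for IPv6, the source address spans bytes 16..32 and the source port bytes 48..50. It then base64-encodes the result with a hand-rolled encoder to avoid external crates. `build_v2`, `base64`, and the addresses are illustrative assumptions, not part of Vector.

```rust
use std::net::SocketAddr;

const V2_SIGNATURE: [u8; 12] = [
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A,
];

/// Hypothetical helper: a PROXY v2 header (command PROXY, TCP) followed by `payload`.
fn build_v2(src: SocketAddr, dst: SocketAddr, payload: &[u8]) -> Vec<u8> {
    let mut out = V2_SIGNATURE.to_vec();
    out.push(0x21); // version 2, command PROXY
    match (src, dst) {
        (SocketAddr::V4(s), SocketAddr::V4(d)) => {
            out.push(0x11); // INET + STREAM
            out.extend_from_slice(&12u16.to_be_bytes());
            out.extend_from_slice(&s.ip().octets()); // bytes 16..20
            out.extend_from_slice(&d.ip().octets()); // bytes 20..24
            out.extend_from_slice(&s.port().to_be_bytes()); // bytes 24..26
            out.extend_from_slice(&d.port().to_be_bytes()); // bytes 26..28
        }
        (SocketAddr::V6(s), SocketAddr::V6(d)) => {
            out.push(0x21); // INET6 + STREAM
            out.extend_from_slice(&36u16.to_be_bytes());
            out.extend_from_slice(&s.ip().octets()); // bytes 16..32
            out.extend_from_slice(&d.ip().octets()); // bytes 32..48
            out.extend_from_slice(&s.port().to_be_bytes()); // bytes 48..50
            out.extend_from_slice(&d.port().to_be_bytes()); // bytes 50..52
        }
        _ => panic!("source and destination must share an address family"),
    }
    out.extend_from_slice(payload);
    out
}

/// Standard base64 (RFC 4648 alphabet, with padding), hand-rolled to avoid crates.
fn base64(data: &[u8]) -> String {
    const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

fn main() {
    // Placeholder addresses; substitute whatever the test should assert on.
    let src: SocketAddr = "192.0.2.1:55678".parse().unwrap();
    let dst: SocketAddr = "198.51.100.7:514".parse().unwrap();
    println!("{}", base64(&build_v2(src, dst, b"test\n")));
}
```

With an IPv4 source, the `.host` field asserted in the test would then be the chosen source IP rather than an empty string.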