grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Enable IPv6 by default, support IPv6 for frontend #13416

Status: Open. flokli opened this issue 4 months ago.

flokli commented 4 months ago

Is your feature request related to a problem? Please describe.
I got bitten by Loki disabling IPv6 everywhere by default.

I wanted to do an all-in-one deployment on 3 individual machines, which share an IPv6-only network interface.

I configured it to pick IP addresses from that interface (via common/ring/instance_interface_names), and got greeted by the

loki[268306]: no useable address found for interfaces [mycelium]

error message.

After some searching, I discovered that IPv6 is disabled by default and that Loki requires you to enable it explicitly for each and every component.
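The error above presumably comes from the step that derives an address from the configured interface names. As a rough illustration only, here is a simplified Go model of that kind of selection. The function name `pickAddress`, its map-based input (used so it runs without real network interfaces), and the preference order (first usable IPv4, then the first non-link-local IPv6 address if IPv6 is enabled) are assumptions for this sketch, not Loki's actual code. It shows why an interface carrying only a ULA and a link-local address yields nothing while the IPv6 flag is off.

```go
package main

import (
	"fmt"
	"net"
)

// pickAddress is a HYPOTHETICAL, simplified model of interface-based address
// selection; it is not Loki's or its dependencies' implementation. Interfaces are
// passed as a map (name -> addresses) so the logic runs without real NICs.
func pickAddress(ifaces map[string][]string, names []string, enableInet6 bool) (string, error) {
	var v6 string
	for _, name := range names {
		for _, s := range ifaces[name] {
			ip := net.ParseIP(s)
			if ip == nil || ip.IsLinkLocalUnicast() {
				continue // skip unparsable and link-local (fe80::/10, 169.254.0.0/16) addresses
			}
			if ip.To4() != nil {
				return ip.String(), nil // the first usable IPv4 address wins
			}
			if enableInet6 && v6 == "" {
				v6 = ip.String() // remember the first usable IPv6 address as a fallback
			}
		}
	}
	if v6 != "" {
		return v6, nil
	}
	// Message text copied from the log line quoted in this issue.
	return "", fmt.Errorf("no useable address found for interfaces %v", names)
}

func main() {
	// Modeled on the reporter's overlay interface: one ULA and one link-local address.
	ifaces := map[string][]string{"mycelium": {"fd12:3456:789a::1", "fe80::1"}}
	names := []string{"mycelium"}

	if _, err := pickAddress(ifaces, names, false); err != nil {
		fmt.Println(err) // IPv6 disabled: nothing usable on an IPv6-only interface
	}
	addr, _ := pickAddress(ifaces, names, true)
	fmt.Println(addr) // IPv6 enabled: the ULA address is picked
}
```

In this model the link-local address is skipped even with IPv6 enabled, because it is only meaningful together with a zone (interface) identifier and so is a poor choice to advertise to other nodes.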

https://github.com/grafana/loki/pull/10650 provides a snippet that enables IPv6 in every individual component. It's quite a bit of work; in 2024, I wouldn't expect software to ship with IPv6 disabled by default, especially with the rise of IPv6-only deployments (in Kubernetes or elsewhere).

Describe the solution you'd like

Describe alternatives you've considered
More prominently document that at least 27 additional lines of config are required to make Loki ring discovery work in an IPv6-only environment.

Additional context
https://github.com/grafana/loki/pull/10650

cc @matthewpi @periklis @leahoswald @rfratto

flokli commented 4 months ago

It seems it isn't even possible to enable IPv6 for all components. Setting frontend.instance-interface-names=mycelium simply results in an unrecoverable startup error.

gillg commented 4 months ago

@periklis @shwetaap I definitely don't understand why IPv6 is still broken on the mainstream versions.

I have been using IPv6 every day in a full-stack IPv6 environment (no IPv4 possible) since April 2023. Unfortunately, I can't update Loki anymore; I'm stuck using the custom Docker image quay.io/shwetaap/loki:dev. I can share my config if that helps.

auth_enabled: true

common:
  compactor_address: http://loki-loki-distributed-compactor:3100

distributor:
  ring:
    instance_addr: loki-loki-distributed-distributor
    kvstore:
      store: memberlist

frontend:
  compress_responses: true
  log_queries_longer_than: 20s
  tail_proxy_url: http://loki-loki-distributed-querier:3100

frontend_worker:
  frontend_address: loki-loki-distributed-query-frontend-headless:9095

index_gateway:
  mode: simple

ingester:
  lifecycler:
    enable_inet6: true
    ring:
      kvstore:
        store: memberlist
      replication_factor: 2
  wal:
    dir: /var/loki/wal
    enabled: true
    replay_memory_ceiling: 1g

memberlist:
  bind_addr:
  - '::'
  join_members:
  - loki-loki-distributed-memberlist

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        ttl: 24h

query_scheduler:
  use_scheduler_ring: true

ruler:
  enable_alertmanager_discovery: false
  enable_api: true
  enable_sharding: false
  evaluation_interval: 1m
  poll_interval: 1m
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  storage:
    local:
      directory: /etc/loki/rules
    type: local

server:
  grpc_listen_address: '[::0]'
  grpc_listen_port: 9095
  grpc_server_max_recv_msg_size: 10485760
  http_listen_address: '[::0]'
  http_listen_port: 3100
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
  log_level: info

I removed some irrelevant parts of the config, so don't use it exactly as is, but it illustrates the "tricks" needed to work with the custom image. The gRPC binding is set explicitly with grpc_listen_address: '[::0]', and so is memberlist.bind_addr: ['::']; otherwise the components can't connect to each other. Apart from those two settings, there is literally nothing specific.

periklis commented 4 months ago

@flokli AFAIU, on each machine you run Loki on, you have a single IPv6 interface, right? But you want your all-in-one Loki to use IPv4 (I assume the lo device here?), and Loki is picking up the IPv6 address instead? Can you post your Loki config as well as a listing of the interfaces on your machine? By the way, this is the FinalAdvertiseAddr function responsible for picking addresses from your interfaces; maybe you can quickly check whether your machine setup hits a missing edge case there.

@gillg I don't quite understand: we have been running Loki in production on IPv6-only as well as IPv4/IPv6 dual-stack Kubernetes/OpenShift clusters for over a year now, using GA versions (from 2.8 up to 3.1.0, IIRC). The three config options that matter for us are documented in this test (FYI, the HASH_RING_INSTANCE_ADDR env var is always populated from .status.podIP for simplicity).

Here are the relevant settings from that test:

common: 
  ring:
    instance_addr: ${HASH_RING_INSTANCE_ADDR}

ingester:
  lifecycler:
    enable_inet6: true

memberlist:
  advertise_addr: ${HASH_RING_INSTANCE_ADDR}

The last setting works around an issue we have when merging the common ring config, but we deemed that acceptable because not everybody used memberlist before Loki 3.x (e.g. we have users running the ring over Consul or etcd). With Loki 3.x, there may be room for a small improvement that squeezes the configuration down to two knobs.
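A hypothetical sketch of the "two knobs" idea: default memberlist.advertise_addr from common.ring.instance_addr, but only when memberlist is the KV store and no explicit advertise address is set, so Consul/etcd users and explicit settings are unaffected. The Go types and the `applyCommonDefaults` function below are invented for illustration; they are not Loki's config structs or its actual merge logic.

```go
package main

import "fmt"

// HYPOTHETICAL, trimmed-down config types. Field names mirror the YAML keys in
// this thread, but these are NOT Loki's real config structs.
type CommonRing struct {
	InstanceAddr string // common.ring.instance_addr
	KVStore      string // common.ring.kvstore.store
}

type Memberlist struct {
	AdvertiseAddr string // memberlist.advertise_addr
}

type Config struct {
	CommonRing CommonRing
	Memberlist Memberlist
}

// applyCommonDefaults copies the common ring address into memberlist.advertise_addr,
// but only when memberlist is the KV store and advertise_addr was left empty.
func applyCommonDefaults(c *Config) {
	if c.CommonRing.KVStore == "memberlist" && c.Memberlist.AdvertiseAddr == "" {
		c.Memberlist.AdvertiseAddr = c.CommonRing.InstanceAddr
	}
}

func main() {
	c := Config{CommonRing: CommonRing{InstanceAddr: "fd00::10", KVStore: "memberlist"}}
	applyCommonDefaults(&c)
	fmt.Println(c.Memberlist.AdvertiseAddr) // fd00::10
}
```

With a default like this, the settings highlighted above would shrink to common.ring.instance_addr plus the IPv6 flag.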

flokli commented 4 months ago


The machines have multiple network interfaces, with various combinations of IPv4, IPv6, or both on them. They're deployed in different locations with changing IPs, which is why I'd like cluster gossip communication to happen over an (encrypted and authenticated) overlay network. This network provides a network interface with only IPv6 addresses on it (in this case, one address in the ULA range and an IPv6 link-local address). All IPs are stable and derived from key material, which makes authenticating the nodes really only a matter of whitelisting IPs ;-)

So I wanted to have Loki listen on that overlay network interface. Usually, the instance_interface_names config options are used for this, but without also setting enable_ipv6 for each of these, the discovery code is unable to find an IP to pick. The global default also didn't seem to have an effect. I think we should flip the default for enable_ipv6 on all these.
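One possible way to make a global setting such as instance_enable_ipv6 propagate to every component, sketched under assumptions: each component's value is a `*bool`, so "unset" (nil) can be told apart from an explicit `false`, and resolution goes component value, then common value, then built-in default. `RingConfig` and `effectiveIPv6` are hypothetical names; this is not how Loki currently resolves the setting, and it is not a diagnosis of why the global default had no effect here.

```go
package main

import "fmt"

// RingConfig is a HYPOTHETICAL per-component ring section. EnableIPv6 is a
// *bool so that "not set" (nil) can be distinguished from an explicit false.
type RingConfig struct {
	EnableIPv6 *bool
}

// effectiveIPv6 resolves a component's setting: an explicit component value
// wins, then the common value, then the built-in default.
func effectiveIPv6(component, common RingConfig, builtinDefault bool) bool {
	if component.EnableIPv6 != nil {
		return *component.EnableIPv6
	}
	if common.EnableIPv6 != nil {
		return *common.EnableIPv6
	}
	return builtinDefault
}

func main() {
	on, off := true, false
	common := RingConfig{EnableIPv6: &on}

	fmt.Println(effectiveIPv6(RingConfig{}, common, false))                 // true: inherited from common
	fmt.Println(effectiveIPv6(RingConfig{EnableIPv6: &off}, common, false)) // false: explicit override
	fmt.Println(effectiveIPv6(RingConfig{}, RingConfig{}, true))            // true: flipped built-in default
}
```

Flipping the default, as proposed here, corresponds to passing `true` as `builtinDefault`; a component that explicitly sets `false` would still opt out.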

As written in https://github.com/grafana/loki/issues/13416#issuecomment-2209182343, for the frontend it's not even possible to enable IPv6 currently.

My config currently looks like this (not enabling IPv6 for the frontend, explicitly adding instance_enable_ipv6 in many different places):

{
  "common": {
    "instance_interface_names": [
      "mycelium"
    ],
    "path_prefix": "/var/lib/loki",
    "replication_factor": 1,
    "ring": {
      "instance_enable_ipv6": true,
      "instance_interface_names": [
        "mycelium"
      ],
      "kvstore": {
        "store": "memberlist"
      }
    }
  },
  "compactor": {
    "compactor_ring": {
      "instance_enable_ipv6": true,
      "instance_interface_names": [
        "mycelium"
      ]
    }
  },
  "distributor": {
    "ring": {
      "instance_enable_ipv6": true,
      "instance_interface_names": [
        "mycelium"
      ]
    }
  },
  "frontend": {
    "instance_interface_names": [
      "end0"
    ]
  },
  "index_gateway": {
    "ring": {
      "instance_enable_ipv6": true
    }
  },
  "ingester": {
    "lifecycler": {
      "enable_inet6": true
    }
  },
  "memberlist": {
    "join_members": [
      "node1.<redacted>", # only AAAA record for these
      "node2.<redacted>",
      "node3.<redacted>"
    ]
  },
  "query_scheduler": {
    "scheduler_ring": {
      "instance_enable_ipv6": true
    }
  },
  "ruler": {
    "ring": {
      "instance_enable_ipv6": true
    }
  },
  "schema_config": {
    "configs": [
      {
        "from": "2020-07-01",
        "index": {
          "period": "24h",
          "prefix": "index_"
        },
        "object_store": "s3",
        "schema": "v13",
        "store": "tsdb"
      }
    ]
  },
  "server": {
    "http_listen_port": 3100
  },
  "storage_config": {
    "aws": {
      "access_key_id": "${AWS_ACCESS_KEY_ID}",
      "bucketnames": "logs",
      "endpoint": "https://s3.<redacted>",
      "region": "garage",
      "s3": "s3://logs",
      "secret_access_key": "${AWS_SECRET_ACCESS_KEY}"
    },
    "tsdb_shipper": {
      "active_index_directory": "/var/lib/loki/index",
      "cache_location": "/var/lib/loki/index_cache"
    }
  }
}