hashicorp / vault-lambda-extension

Unable to set up Vault HA using Raft in containers #143

Closed Jding2159 closed 4 months ago

Jding2159 commented 4 months ago

I am trying to set up a Vault cluster with 3 nodes in Proxmox, but I am having trouble getting node2 and node3 to unseal. When I attempt to unseal them, they show up as non-voting followers when I run the following on node1:

/ # vault operator raft list-peers
Node     Address               State       Voter
----     -------               -----       -----
node1    192.168.1.191:8201    leader      true
node2    192.168.1.192:8201    follower    false
node3    192.168.1.193:8201    follower    false

Here is my docker-compose file. It is the same on all 3 nodes; the only thing that differs is the IP address.

version: '3'

services:
  vault:
    image: hashicorp/vault:1.16
    container_name: vault
    ports:
      - 8200:8200
      - 8201:8201
    cap_add:
      - IPC_LOCK
    restart: unless-stopped
    volumes:
      - ./data/logs:/vault/logs
      - ./data/config/config.hcl:/vault/config/config.hcl
      - ./data:/vault/data   # Mount the data directory correctly
      - /mnt/vaultbackup:/mnt/vaultbackup
    environment:
      VAULT_CLUSTER_ADDR: "http://192.168.1.191:8201"
      VAULT_ADDR: "http://0.0.0.0:8200"
    entrypoint: vault server -config=/vault/config/config.hcl

volumes:
  vault_data:

After running raft join and unseal on node2 and node3 inside the container, the nodes stay sealed:

/ # vault operator raft join http://192.168.1.191:8200
Key       Value
---       -----
Joined    true
/ # vault operator unseal
Unseal Key (will be hidden):
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       1
Threshold          1
Unseal Progress    0/1
Unseal Nonce       n/a
Version            1.16.3
Build Date         2024-05-29T14:28:42Z
Storage Type       raft
HA Enabled         true

This is what I see on node1:

/ # vault operator members
Host Name       API Address                  Cluster Address               Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------       -----------                  ---------------               -----------    -------    ---------------    ---------------    ---------
b680d05249f9    http://192.168.1.191:8200    https://192.168.1.191:8201    true           1.16.3     1.16.3             n/a                n/a
/ # vault operator raft list-peers
Node     Address               State       Voter
----     -------               -----       -----
node1    192.168.1.191:8201    leader      true
node2    192.168.1.192:8201    follower    false
node3    192.168.1.193:8201    follower    false

This is my config.hcl. It is similar across all 3 nodes; the only things that differ are the node_id and the IP address used for the cluster address and API address.

storage "raft" {
  path = "/vault/data"
  node_id = "node1"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "192.168.1.191:8201"
  tls_disable = "1"
}

ui = true

api_addr = "http://192.168.1.191:8200"
cluster_addr = "http://192.168.1.191:8201"

Docker logs on node1:

2024-07-30T01:01:09.711Z [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter node3 192.168.1.193:8201}" error="dial tcp 192.168.1.193:8201: connect: connection refused"
2024-07-30T01:01:11.225Z [ERROR] storage.raft: failed to heartbeat to: peer=192.168.1.193:8201 backoff time=2.5s error="dial tcp 192.168.1.193:8201: connect: connection refused"
2024-07-30T01:01:11.390Z [ERROR] storage.raft: failed to heartbeat to: peer=192.168.1.192:8201 backoff time=2.5s error="dial tcp 192.168.1.192:8201: connect: connection refused"

Docker logs on node2 and node3:

2024-07-30T00:58:20.179Z [INFO]  core.cluster-listener.tcp: starting listener: listener_address=192.168.1.192:8201
2024-07-30T00:58:20.179Z [ERROR] core.cluster-listener.tcp: error starting listener: error="listen tcp 192.168.1.192:8201: bind: cannot assign requested address"
2024-07-30T00:58:20.181Z [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"node2\", NotifyCh:(chan<- bool)(0xc0028f24d0), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc002f4d860), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2024-07-30T00:58:20.181Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:node1 Address:192.168.1.191:8201} {Suffrage:Nonvoter ID:node2 Address:192.168.1.192:8201}]"
2024-07-30T00:58:20.181Z [INFO]  storage.raft: entering follower state: follower="Node at 192.168.1.192:8201 [Follower]" leader-address= leader-id=
2024-07-30T00:58:20.181Z [INFO]  core: security barrier not initialized
2024-07-30T00:58:24.879Z [INFO]  core: security barrier not initialized
2024-07-30T00:58:24.881Z [INFO]  core: security barrier not initialized
2024-07-30T00:58:35.878Z [INFO]  core: security barrier not initialized
2024-07-30T00:58:35.878Z [INFO]  core: security barrier not initialized
2024-07-30T00:58:40.037Z [WARN]  storage.raft: heartbeat timeout reached, not part of a stable configuration or a non-voter, not triggering a leader election

After some digging, it looks like the container on node1 is not listening on port 8201, although it is listening on 8200. Outside the container, the host is listening on 8201 just fine:

admin@vault1:~/vault/data/config$ sudo netstat -tuln | grep 8201
[sudo] password for admin:
tcp        0      0 0.0.0.0:8201            0.0.0.0:*               LISTEN
tcp6       0      0 :::8201                 :::*                    LISTEN
admin@vault1:~/vault/data/config$ sudo netstat -tuln | grep 8200
tcp        0      0 0.0.0.0:8200            0.0.0.0:*               LISTEN
tcp6       0      0 :::8200                 :::*                    LISTEN

But inside the container, it is only listening on 8200:

/ # netstat -tuln | grep 8201
/ # netstat -tuln | grep 8200
tcp        0      0 0.0.0.0:8200            0.0.0.0:*               LISTEN

Output of docker ps:

docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                                                           NAMES
b680d05249f9   hashicorp/vault:1.16   "vault server -confi…"   22 minutes ago   Up 22 minutes   0.0.0.0:8200-8201->8200-8201/tcp, :::8200-8201->8200-8201/tcp   vault

What am I doing wrong? I have already exposed port 8201 in the docker-compose file.

Any help is appreciated!