hetznercloud / terraform-provider-hcloud

Terraform Hetzner Cloud provider
https://registry.terraform.io/providers/hetznercloud/hcloud/latest
Mozilla Public License 2.0

[Bug]: server starts and stops two times during first boot #1019

Open samene opened 1 week ago

samene commented 1 week ago

What happened?

When creating an hcloud_server resource, the server starts and stops two times before finally starting up. The sequence of events (newest first) is shown below:

activity_type created status triggered_by
server.start 2024-10-28T05:09:16Z success System
server.start 2024-10-28T05:09:06Z requested BUCw8I***
server.stop 2024-10-28T05:09:04Z success System
server.stop 2024-10-28T05:08:49Z requested BUCw8I***
server.start 2024-10-28T05:08:43Z success System
server.start 2024-10-28T05:08:33Z requested BUCw8I***
network.attach 2024-10-28T05:08:31Z success System
network.attach 2024-10-28T05:08:29Z requested BUCw8I***
server.create 2024-10-28T05:08:27Z success System
server.create 2024-10-28T05:08:24Z requested BUCw8I***
When the same server is created from the UI or from the official Ansible playbook, only the following events are generated:

activity_type created status project triggered_by
network.attach 2024-10-28T05:05:51Z success work System
server.create 2024-10-28T05:05:45Z requested Sa***
network.attach 2024-10-28T05:05:42Z success work Sa***
server.create 2024-10-28T05:05:42Z requested Sa***
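
To read the Terraform sequence mechanically, here is a small Python sketch that parses the rows of the first activity log (copied verbatim from this report) and lists the completed actions in chronological order. The helper name `completed_actions` is made up for illustration and is not part of any Hetzner tooling.

```python
from collections import Counter

# Rows copied from the Terraform activity log in this report (newest first).
ACTIVITY_LOG = """\
server.start 2024-10-28T05:09:16Z success System
server.start 2024-10-28T05:09:06Z requested BUCw8I***
server.stop 2024-10-28T05:09:04Z success System
server.stop 2024-10-28T05:08:49Z requested BUCw8I***
server.start 2024-10-28T05:08:43Z success System
server.start 2024-10-28T05:08:33Z requested BUCw8I***
network.attach 2024-10-28T05:08:31Z success System
network.attach 2024-10-28T05:08:29Z requested BUCw8I***
server.create 2024-10-28T05:08:27Z success System
server.create 2024-10-28T05:08:24Z requested BUCw8I***
"""


def completed_actions(log):
    """Return the activity types with status 'success', oldest first."""
    rows = [line.split() for line in log.splitlines() if line.strip()]
    done = [
        (created, activity)
        for activity, created, status, _triggered_by in rows
        if status == "success"
    ]
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    return [activity for _created, activity in sorted(done)]


actions = completed_actions(ACTIVITY_LOG)
counts = Counter(actions)
print(" -> ".join(actions))
print("starts:", counts["server.start"], "stops:", counts["server.stop"])
```

On the rows above this reports the order create, attach, start, stop, start, so the log records two completed starts and one completed stop after creation.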

What did you expect to happen?

Why is the server starting and stopping twice? It should only start once.

I believe this is breaking my cloud-init script, especially the part where I add sudo users:

# auth section
disable_root: false
ssh_pwauth: yes
users:
  - name: cloud-user
    groups: users, admin
    expiredate: '2032-09-01'
    lock_passwd: false
    ssh_authorized_keys:
      - ${public_key}
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash

cloud-init runs on the first boot, but the server is restarted before it reaches this part. When the server starts again, cloud-init runs again but skips the sudo user configuration, so my sudo user is missing.
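
One way to check whether cloud-init considers a module already done on the interrupted first boot is to look at its per-instance semaphore files. The Python sketch below assumes the usual cloud-init layout, where a module that has run leaves a file named `config_<module>` under `/var/lib/cloud/instance/sem`; both the path and the helper name `modules_already_run` are assumptions for illustration, so verify them on the affected image.

```python
from pathlib import Path


def modules_already_run(sem_dir):
    """List module names that have a 'config_<module>' semaphore in sem_dir."""
    prefix = "config_"
    return sorted(
        entry.name[len(prefix):]
        for entry in Path(sem_dir).iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )


# Example (run on the server, path is an assumption):
# print(modules_already_run("/var/lib/cloud/instance/sem"))
```

If `users_groups` appears in the result while the sudo rule is missing, that would be consistent with the module having been marked as done before the power cycle and then skipped on the second boot.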

Why does the server start and stop twice?

Please provide a minimal working example

resource "hcloud_server" "rancher_vms" {
  count       = var.rancher_nodes
  name        = format("%s-rancher-%d", var.prefix, count.index + 1)
  image       = var.rancher_image
  server_type = var.rancher_flavor
  datacenter  = var.datacenter

  ssh_keys = [data.hcloud_ssh_key.hcloud-pkey.id]
  network {
    network_id = data.hcloud_network.pe-network.id
    alias_ips  = []
  }
  public_net {
    ipv4_enabled = false
    ipv6_enabled = false
  }
  user_data = templatefile("user-data.yml.tftpl", {
    http_proxy = var.proxy,
    public_key = data.hcloud_ssh_key.hcloud-pkey.public_key,
    hostname   = format("%s-rancher-%d", var.prefix, count.index + 1)
  })
}

I observe the same behavior without user_data too.

samene commented 1 week ago

I traced it back to this change from two years ago: https://github.com/hetznercloud/terraform-provider-hcloud/pull/552/files

It powers the server off and on again when there is no public net. I believe this may have been an API issue which is now fixed by Hetzner, so should this workaround now be removed?

jooola commented 1 week ago

> I believe this may have been an API issue which is now fixed

Where does this claim come from? Did you experiment to see if the behavior was fixed?

I opened another PR to run the CI multiple times, and I'll implement a test to make sure we do not have a regression.

samene commented 1 week ago

> > I believe this may have been an API issue which is now fixed
>
> Where does this claim come from? Did you experiment to see if the behavior was fixed?
>
> I opened another PR to run the CI multiple times, and I'll implement a test to make sure we do not have a regression.

Yes, I ran a local test and didn't see any network issues when the public net is disabled.

jooola commented 1 day ago

@samene Could you provide us with the full cloud-init configuration? It might be useful to add it to the regression tests.

samene commented 1 day ago

Here it is, as is. You may need to adjust it a bit. The main thing to test is that the user is able to sudo.

#cloud-config
write_files:
- path: /etc/environment
  owner: root
  content: |
    http_proxy=${http_proxy}
    https_proxy=${http_proxy}
    no_proxy=127.0.0.1/8,172.17.0.0/16,172.18.0.0/16,10.43.0.0/16,10.151.0.0/16,10.152.0.0/16,.hetzner-test.site,*.hetzner-test.site,.nip.io,*.nip.io
    HTTP_PROXY=${http_proxy}
    HTTPS_PROXY=${http_proxy}
    NO_PROXY=127.0.0.1/8,172.17.0.0/16,172.18.0.0/16,10.43.0.0/16,10.151.0.0/16,10.152.0.0/16,.hetzner-test.site,*.hetzner-test.site,.nip.io,*.nip.io
    CONTAINERD_HTTP_PROXY=${http_proxy}
    CONTAINERD_HTTPS_PROXY=${http_proxy}
    CONTAINERD_NO_PROXY=127.0.0.1/8,172.17.0.0/16,172.18.0.0/16,10.43.0.0/16,10.152.0.0/16,10.151.0.0/16,.hetzner-test.site,*.hetzner-test.site,nip.io,*.nip.io
    PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin
- path: /etc/sysctl.d/60-disable-ipv6.conf
  owner: root
  content: |
    net.ipv6.conf.all.disable_ipv6=1
    net.ipv6.conf.default.disable_ipv6=1
    net.ipv6.conf.lo.disable_ipv6=1
    net.bridge.bridge-nf-call-iptables=1
    net.ipv6.conf.all.forwarding=0
- path: /etc/sudoers.d/91-sudopath
  owner: root
  content: |
    Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin
- path: /etc/dnf/dnf.conf
  content: |
    proxy=${http_proxy}
  append: true
- path: /etc/yum.conf
  content: |
    proxy=${http_proxy}
  append: true
- path: /etc/resolv.conf
  owner: root
  content: |
    nameserver 10.152.0.6
    nameserver 185.12.64.1
    domain .
hostname: ${hostname}
fqdn: ${hostname}.hetzner-test.site
prefer_fqdn_over_hostname: true
# auth section
disable_root: false
ssh_pwauth: yes
users:
  - name: cloud-user
    groups: users, admin
    expiredate: '2032-09-01'
    lock_passwd: false
    ssh_authorized_keys:
      - ${public_key}
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
  - name: hammer
    groups: users, admin
    expiredate: '2032-09-01'
    lock_passwd: false
    passwd: "$6$HRIebdEL4Q4v6M9S$Z1otUWUQS9slQxJUCYu2hD.s/PxhlX1cD53YNlkIBAU2PofST24MbYpay.7xX/pLwecJlscJQesVltuIw7fCB/"
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
runcmd:
- until ip a | grep '10.151' > /dev/null; do sleep 5 && echo "waiting for network..."; done
- systemctl stop sshd
- hostnamectl set-hostname ${hostname}
- ip route add default via 10.151.0.1 dev enp7s0
- ip route add 10.152.0.0/24 via 10.151.0.1 dev enp7s0
- sed -i 's/#\?\(PermitRootLogin\s*\).*$/\1 yes/' /etc/ssh/sshd_config
- systemctl restart sshd
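
For the "user is able to sudo" check, a regression test could inspect the sudoers drop-in that cloud-init writes for users with a `sudo:` entry. The Python sketch below is only an illustration: it assumes the rules end up in `/etc/sudoers.d/90-cloud-init-users` (cloud-init's usual location) in the form `<user> <rule>`, and the helper name `has_sudo_rule` is hypothetical.

```python
def has_sudo_rule(sudoers_text, user, rule="ALL=(ALL) NOPASSWD:ALL"):
    """Return True if sudoers_text contains the line '<user> <rule>'."""
    for line in sudoers_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == user and fields[1].strip() == rule:
            return True
    return False


# Example (run on the server, path is an assumption):
# with open("/etc/sudoers.d/90-cloud-init-users") as fh:
#     print(has_sudo_rule(fh.read(), "cloud-user"))
```

With the configuration above, both `cloud-user` and `hammer` would be expected to pass this check on a server whose first boot was not interrupted.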