Closed choffmeister closed 2 years ago
Hey @choffmeister,
the endpoint only works on newer servers (and servers that are not created from a snapshot). Therefore it works as expected :)
@LKaemmerling The server is new (created yesterday) but created from a snapshot (we are using Talos for K8s).
That is a bummer. Will this change? If it stays that way, it would mean that we always have to run a forked csi-driver. That is not too big of a problem, but I wonder why that is. It would also mean that if someone recovers from a snapshot, the server does not behave the way it did before the recovery.
Edit: I just found out that Talos can be installed without starting from a snapshot (though the snapshot route is faster). So thanks for pointing me in the right direction. It would still be super interesting to know why servers created from snapshots cannot get the availability zone (I guess they live somewhere, just like a freshly created server :smile:)
@choffmeister because we need to keep it backwards compatible. We do not know whether the snapshot the server is created from already has the new cloud-init datasource; we only know this for our own system images.
That makes a lot of sense. Thanks for sharing! We will find a way to bootstrap our k8s nodes without using snapshots.
> @choffmeister because we need to keep it backwards compatible. We do not know whether the snapshot the server is created from already has the new cloud-init datasource; we only know this for our own system images.
In my opinion, backwards compatibility is already broken, because previously a server created from a snapshot had access to the metadata server.
Many cloud providers add an option to switch the metadata server on or off at creation time.
The metadata service is accessible, just some fields are missing for older servers.
Though I wonder: Would it really break anything if new endpoints (my understanding is that /region and such are completely new endpoints) are visible to old servers? Should not be a problem or am I missing something?
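To see which fields a given server actually exposes, one option is to probe each endpoint and compare HTTP status codes. The following is only a sketch: the helper names `metadata_status` and `probe_metadata` are made up for illustration, and the field list in the example comment is taken from the endpoints mentioned in this thread, not from an authoritative list.

```sh
#!/bin/sh
# Base URL of the metadata service, as quoted elsewhere in this thread.
METADATA_URL="http://169.254.169.254/hetzner/v1/metadata"

# Print only the HTTP status code returned for one metadata field.
metadata_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${METADATA_URL}/$1"
}

# Print one "<field> <status>" line for every field passed as an argument.
probe_metadata() {
  for field in "$@"; do
    printf '%s %s\n' "$field" "$(metadata_status "$field")"
  done
}

# Example, to be run on the server itself:
#   probe_metadata instance-id public-ipv4 availability-zone region
```

A field that is missing on an older or snapshot-based server should show up with a 404 while the others report 200.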
I am a little bit confused. As far as I know, Hetzner Cloud does not have its own Kubernetes-as-a-service solution, but it does have very good CCM/CSI plugins.
And now those plugins work only with a few Hetzner OS images, and you cannot build your own images based on Hetzner images either.
This looks like vendor lock-in. Very sad news. Very sad decision...
@sergelogvinov The plugins still work fine if the installation process is adjusted. But it is indeed more complicated now (especially if you have many servers that you want to bootstrap), as you have to always start from a known Hetzner base image and then do a live in-place installation. For example this works out fine for what we use (Talos):
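# "log" is assumed to be a helper defined elsewhere in the full script.
log "Creating server in rescue mode..."
# Create the server from a Hetzner base image, but do not boot it yet.
hcloud server create --name "${NODE_NAME}" \
  --image debian-11 \
  --type "${SERVER_TYPE}" \
  --ssh-key "${SSH_KEY}" \
  --location "${HCLOUD_LOCATION}" \
  --user-data-from-file "${NODE_CONFIG}" \
  --start-after-create=false
# Boot into the rescue system so the disk can be overwritten.
hcloud server enable-rescue "${NODE_NAME}" --ssh-key "${SSH_KEY}"
hcloud server poweron "${NODE_NAME}"
# Write the Talos image onto the disk from within the rescue system.
cat << EOF | hcloud server ssh "${NODE_NAME}"
cd /tmp
wget -O /tmp/talos.raw.xz https://github.com/talos-systems/talos/releases/download/v0.14.2/hcloud-amd64.raw.xz
xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync
EOF
# Reboot from the freshly written disk.
hcloud server shutdown "${NODE_NAME}"
hcloud server poweron "${NODE_NAME}"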
But @LKaemmerling, here is what I would still like to understand, if you could be so kind:
@LKaemmerling the CSI driver has the hcloud API token. Would it be possible for the CSI driver, instead of querying the availability-zone metadata endpoint, to query the instance-id endpoint to get the instance ID, and then make an API call to Get a Server, which includes information about the datacenter?
{
  "..."
  "datacenter": {
    "id": 3,
    "name": "hel1-dc2",
    "description": "Helsinki 1 DC 2",
    "location": {
      "id": 3,
      "name": "hel1",
      "description": "Helsinki DC Park 1",
      "country": "FI",
      "city": "Helsinki",
      "latitude": 60.169855,
      "longitude": 24.938379,
      "network_zone": "eu-central"
    },
    "...",
  },
  "...",
}
I think that would be backwards compatible?
And I see the CSI driver is already getting the instance-id, just before getting the availability-zone:
https://github.com/hetznercloud/csi-driver/blob/main/cmd/node/main.go#L27-L31
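The proposed fallback can be sketched in shell to make the two steps concrete. This is not the driver's code, only an illustration under some assumptions: the function names are hypothetical, `HCLOUD_TOKEN` is assumed to hold the API token, `jq` is assumed to be installed, and the Get a Server response is assumed to nest the object shown above under a top-level `server` key.

```sh
#!/bin/sh
# Sketch only: resolve the location through the API instead of the
# availability-zone metadata field.
METADATA_URL="http://169.254.169.254/hetzner/v1/metadata"
API_URL="https://api.hetzner.cloud/v1"

# Step 1: read the server ID from the metadata service.
get_instance_id() {
  curl -fsS --max-time 5 "${METADATA_URL}/instance-id"
}

# Step 2: fetch the server object for that ID (assumes HCLOUD_TOKEN is set).
get_server_json() {
  curl -fsS --max-time 10 \
    -H "Authorization: Bearer ${HCLOUD_TOKEN}" \
    "${API_URL}/servers/$1"
}

# Read a Get a Server response on stdin and print the location name.
# Exits non-zero if the field is missing (assumed response layout, see above).
location_from_server_json() {
  jq -er '.server.datacenter.location.name'
}

# Combine both steps; prints e.g. "hel1" on success.
resolve_location() {
  id="$(get_instance_id)" || return 1
  get_server_json "$id" | location_from_server_json
}
```

Because this path only depends on the instance-id field and the public API, it would not care whether the server was created from a snapshot, at the cost of one extra API request at startup.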
Tested it on several different machines in the Hetzner Cloud, but http://169.254.169.254/hetzner/v1/metadata/availability-zone (dispatched here) always returns an HTTP 404.
This is a problem, since this call is used in the csi-driver, where it always gets back the response "availability-zone not found" and from that parses that the availability zone itself is called "availability" instead of, for example, "nbg1".
Note: Other endpoints like http://169.254.169.254/hetzner/v1/metadata or http://169.254.169.254/hetzner/v1/metadata/public-ipv4 work just fine.
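The symptom is easy to reproduce in isolation. The sketch below assumes the location is derived by taking everything before the first hyphen of the availability zone; that is inferred from the behaviour described above, not copied from the driver source. The function names and the sample value "nbg1-dc3" are illustrative.

```sh
#!/bin/sh
# Take the part before the first hyphen of an availability zone string.
location_from_zone() {
  printf '%s\n' "${1%%-*}"
}

# Fetch the availability zone and fail on any HTTP error (curl -f),
# instead of handing the error body to the parser.
get_availability_zone() {
  curl -fsS --max-time 5 \
    "http://169.254.169.254/hetzner/v1/metadata/availability-zone"
}

location_from_zone "nbg1-dc3"                     # prints nbg1
location_from_zone "availability-zone not found"  # prints availability
```

Checking the HTTP status before parsing, as `get_availability_zone` does, would turn the silent "availability" location into an explicit error.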