kiwix / operations

Kiwix Kubernetes Cluster
http://charts.k8s.kiwix.org/

Move `storage` k8s node to a new SX65 Hetzner server #226

Closed benoit74 closed 2 months ago

benoit74 commented 3 months ago

This is the follow-up to https://github.com/kiwix/operations/issues/215

We want to move the `storage` k8s node to a new SX65 Hetzner server.

We will deploy it in Helsinki to benefit from lower prices with similar worldwide connectivity. PUE is also probably better in Finland than in Germany.

We will use the default 64G RAM to start with, and upgrade only when needed.

The server will have 2x 1T SSD + 4x 22T HDD.

The plan is the following setup:

This choice is made to:

benoit74 commented 3 months ago

One important thing (not a concern at all AFAIK for our use case):

We block outgoing traffic on ports 25 and 465 on all servers by default. You can create a support request to unblock these ports for a valid use case. Please visit Hetzner Docs for further information.

rgaudin commented 3 months ago

Very good to know and have documented the SMTP blockage. Indeed not an issue for us at this time.

benoit74 commented 3 months ago

RAID6 in LVM needs at least 5 drives... so let's go for the usual RAID5 instead.

https://serverfault.com/questions/931150/why-does-lvmraid6-need-5-drives
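Either way, on the 4x 22T HDDs the usable capacity is easy to sanity-check: RAID5 stripes data across n-1 disks and tolerates one failure, RAID6 across n-2 with two-failure tolerance. Quick arithmetic:

```shell
# usable capacity on 4 drives of 22T each, per RAID level
awk 'BEGIN { n = 4; s = 22
  printf "raid5: %dT usable, tolerates 1 failed disk\n", (n - 1) * s
  printf "raid6: %dT usable, tolerates 2 failed disks\n", (n - 2) * s }'
```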

benoit74 commented 3 months ago

I finally changed the plan quite significantly, for the following reasons:

The new plan is hence as follows:

[storage2 excalidraw diagram]

I've kept some free space in LVM (both raid1 and raid6): this space is not needed yet but might be useful for other things in the future, it is always easier to grow a volume than to shrink it, and ext4 supports transparent online resizing.

I did not keep any free space on raid0 because this is the cache, and it can be destroyed / rebuilt quite easily.
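As a rough sketch of this kind of layout (every device name, array number, and size below is hypothetical, not the actual storage2 values, which live in the wiki page linked in the next paragraph):

```shell
# RAID1 across the two SSDs, RAID6 across the four HDDs (hypothetical devices)
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --create /dev/md3 --level=6 --raid-devices=4 /dev/sd[abcd]
# LVM on top, deliberately not allocating 100% of each VG
pvcreate /dev/md2 /dev/md3
vgcreate vg-ssd /dev/md2
vgcreate vg-hdd /dev/md3
lvcreate -L 400G -n system vg-ssd   # leaves free extents in vg-ssd
lvcreate -L 35T  -n data   vg-hdd   # leaves free extents in vg-hdd
mkfs.ext4 /dev/vg-hdd/data          # ext4: can be grown online later (resize2fs)
```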

I've documented the process at https://github.com/kiwix/operations/wiki/Machine-and-k8s-node-Setup#special-setup-for-dedicated-machines-on-hetzner

Currently the machine is running and copying download.kiwix.org content (I've whitelisted it on the master.download.kiwix.org rsync module). It is also resyncing the RAID6 array on /dev/md3; only 20% done so far, I expect it to complete sometime on Saturday.

I've configured DNS storage2.k8s.kiwix.org and reverse DNS on bastion.

I've done the whole basic configuration we do on all machines.

I've configured the node as k8s node (with role storage2). First services have already started successfully (grafana, nginx-ingress, kilo, konnectivity-agent, kube-proxy, debug-network-tools).

What is left:

benoit74 commented 2 months ago

For the record, this is the current performance on the HDDs:

dd if=/dev/zero of=/data/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.05212 s, 265 MB/s
dd if=/dev/zero of=/data/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 39.1908 s, 13.1 kB/s
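To make the small-block number concrete: 1000 synced 512 B writes in 39.19 s means roughly 39 ms of seek-plus-flush latency per write, so latency, not bandwidth, dominates that test. A quick check of the arithmetic:

```shell
# total seconds for 1000 writes -> milliseconds per write
awk 'BEGIN { printf "%.1f ms per synced 512 B write\n", 39.1908 / 1000 * 1000 }'
```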

This is ~2.5x faster than what we've observed on the storage node in Scaleway when that node was behaving better.

On SSDs:

dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.57673 s, 681 MB/s
dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 7.04899 s, 72.6 kB/s
benoit74 commented 2 months ago

What we've done to synchronize data: start a custom rsync server which exposes all of /data.

Config at /etc/rsyncd.conf:

charset = utf-8

log file = /var/log/rsync.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

numeric ids = yes
# allow enough time for --list-only on whole thing (20m)
timeout = 1200

uid = 0
gid = 0
# (not working) port = 12389
read only = true

dont compress = *.zim *.gz *.xz *.bz2 *.zip *.img

# private module for storage2 sync
[all.download.kiwix.org]
path = /data
comment = All /data content
max connections = 2
lock file = /var/lock/mirrors.lock
list = no
hosts allow = 135.181.224.247 2a01:4f9:3071:2d08::/64

Start the custom rsync server:

rsync --daemon --config /etc/rsyncd.conf --port 12389 -4

Sync data on storage2 node:

cd /data
rsync -vzrlptD --progress --delete --port=12389 master.download.kiwix.org::all.download.kiwix.org/ ./

Procedure to transfer services:

[Screenshot 2024-09-05 at 11:28:42]
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.05402 s, 212 MB/s
rgaudin commented 2 months ago
benoit74 commented 2 months ago

Also forgot to unlabel old storage node:

kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 node-role.kubernetes.io/storage-
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 node-role.kubernetes.io/storage-old=true
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 k8s.kiwix.org/role=storage-old --overwrite
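A quick way to verify the relabeling took effect (assuming the same label keys as above; this needs access to the live cluster):

```shell
# list every node with its kiwix role label; the old Scaleway node should
# now show storage-old and the new Hetzner node should show storage2
kubectl get nodes -L k8s.kiwix.org/role -L node-role.kubernetes.io/storage-old
```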
benoit74 commented 2 months ago

All services are up and running, as far as we know.