kiwix / operations

Kiwix Kubernetes Cluster
http://charts.k8s.kiwix.org/

Move `storage` k8s node to a new SX65 Hetzner server #226

Closed benoit74 closed 2 months ago

benoit74 commented 3 months ago

This is the follow-up to https://github.com/kiwix/operations/issues/215

We want to move the `storage` k8s node to a new SX65 Hetzner server.

We will deploy it in Helsinki to benefit from lower prices with similar worldwide connectivity. PUE is also probably better in Finland than in Germany.

We will use the default 64G RAM to start with, and upgrade only when needed.

The server will have 2x 1T SSD + 4x 22T HDD.

The plan is the following setup:

This choice is made to:

benoit74 commented 3 months ago

One important thing (not a concern at all AFAIK for our use case):

We block outgoing traffic on ports 25 and 465 on all servers by default. You can create a support request to unblock these ports for a valid use case. Please visit Hetzner Docs for further information.

rgaudin commented 3 months ago

Very good to know and have documented the SMTP blockage. Indeed not an issue for us at this time.

benoit74 commented 3 months ago

RAID6 in LVM needs at least 5 drives... so let's go for the usual RAID5 instead.

https://serverfault.com/questions/931150/why-does-lvmraid6-need-5-drives
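Either way, on the 4x 22T HDDs the usable capacity is easy to sanity-check: RAID5 stripes data across n-1 disks and tolerates one failure, RAID6 across n-2 with two-failure tolerance. Quick arithmetic:

```shell
# usable capacity on 4 drives of 22T each, per RAID level
awk 'BEGIN { n = 4; s = 22
  printf "raid5: %dT usable, tolerates 1 failed disk\n", (n - 1) * s
  printf "raid6: %dT usable, tolerates 2 failed disks\n", (n - 2) * s }'
```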

benoit74 commented 3 months ago

I finally changed the plan quite significantly, for the following reasons:

The new plan is hence as follows:

[storage2 excalidraw diagram]

I've kept some free space in LVM (both raid1 and raid6): this space is not needed yet but might be useful for other things in the future, it is always easier to grow a volume than to shrink it, and ext4 supports transparent online resizing.

I did not keep any free space on raid0 because this is the cache, and it can be destroyed / rebuilt quite easily.
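As a rough sketch of this kind of layout (every device name, array number, and size below is hypothetical, not the actual storage2 values, which live in the wiki page linked in the next paragraph):

```shell
# RAID1 across the two SSDs, RAID6 across the four HDDs (hypothetical devices)
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --create /dev/md3 --level=6 --raid-devices=4 /dev/sd[abcd]
# LVM on top, deliberately not allocating 100% of each VG
pvcreate /dev/md2 /dev/md3
vgcreate vg-ssd /dev/md2
vgcreate vg-hdd /dev/md3
lvcreate -L 400G -n system vg-ssd   # leaves free extents in vg-ssd
lvcreate -L 35T  -n data   vg-hdd   # leaves free extents in vg-hdd
mkfs.ext4 /dev/vg-hdd/data          # ext4: can be grown online later (resize2fs)
```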

I've documented the process at https://github.com/kiwix/operations/wiki/Machine-and-k8s-node-Setup#special-setup-for-dedicated-machines-on-hetzner

Currently the machine is running and copying download.kiwix.org content (I've whitelisted it on the master.download.kiwix.org rsync module). It is also resyncing the RAID6 array on /dev/md3; only 20% done so far, I expect it to complete sometime on Saturday.

I've configured DNS storage2.k8s.kiwix.org and reverse DNS on bastion.

I've done the whole basic configuration we do on all machines.

I've configured the node as k8s node (with role storage2). First services have already started successfully (grafana, nginx-ingress, kilo, konnectivity-agent, kube-proxy, debug-network-tools).

What is left:

benoit74 commented 2 months ago

For the record, this is the current performance on the HDDs:

dd if=/dev/zero of=/data/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.05212 s, 265 MB/s
dd if=/dev/zero of=/data/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 39.1908 s, 13.1 kB/s
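To make the small-block number concrete: 1000 synced 512 B writes in 39.19 s means roughly 39 ms of seek-plus-flush latency per write, so latency, not bandwidth, dominates that test. A quick check of the arithmetic:

```shell
# total seconds for 1000 writes -> milliseconds per write
awk 'BEGIN { printf "%.1f ms per synced 512 B write\n", 39.1908 / 1000 * 1000 }'
```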

This is ~2.5x faster than what we've observed on the storage node in Scaleway when that node was behaving better.

On SSDs:

dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.57673 s, 681 MB/s
dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 7.04899 s, 72.6 kB/s
benoit74 commented 2 months ago

What we've done to synchronize data: start a custom rsync server which exposes all of /data.

Config at /etc/rsyncd.conf:

charset = utf-8

log file = /var/log/rsync.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

numeric ids = yes
# allow enough time for --list-only on whole thing (20m)
timeout = 1200

uid = 0
gid = 0
# (not working) port = 12389
read only = true

dont compress = *.zim *.gz *.xz *.bz2 *.zip *.img

# private module for storage2 sync
[all.download.kiwix.org]
path = /data
comment = All /data content
max connections = 2
lock file = /var/lock/mirrors.lock
list = no
hosts allow = 135.181.224.247 2a01:4f9:3071:2d08::/64

Start the custom rsync server:

rsync --daemon --config /etc/rsyncd.conf --port 12389 -4

Sync data on storage2 node:

cd /data
rsync -vzrlptD --progress --delete --port=12389 master.download.kiwix.org::all.download.kiwix.org/ ./

Procedure to transfer services:

[Screenshot 2024-09-05 at 11:28:42]
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.05402 s, 212 MB/s
rgaudin commented 2 months ago
benoit74 commented 2 months ago

Also forgot to unlabel old storage node:

kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 node-role.kubernetes.io/storage-
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 node-role.kubernetes.io/storage-old=true
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 k8s.kiwix.org/role=storage-old --overwrite
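A quick way to verify the relabeling took effect (assuming the same label keys as above; this needs access to the live cluster):

```shell
# list every node with its kiwix role label; the old Scaleway node should
# now show storage-old and the new Hetzner node should show storage2
kubectl get nodes -L k8s.kiwix.org/role -L node-role.kubernetes.io/storage-old
```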
benoit74 commented 2 months ago

All services are up and running, as far as we know.