One important thing (not a concern at all AFAIK for our use case):
We block outgoing traffic on ports 25 and 465 on all servers by default. You can create a support request to unblock these ports for a valid use case. Please visit Hetzner Docs for further information.
Very good to know, and good to have the SMTP blockage documented. Indeed not an issue for us at this time.
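A quick way to confirm the block from the server itself (the target host below is just an example):
nc -vz -w 5 smtp.gmail.com 25     # expected to time out while outgoing 25/465 are blocked
nc -vz -w 5 smtp.gmail.com 587    # the submission port is not blocked and should connect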
RAID6 in LVM needs at least 5 drives ... so let's go for the usual RAID5 then.
https://serverfault.com/questions/931150/why-does-lvmraid6-need-5-drives
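For illustration, LVM enforces this directly at lvcreate time (the volume group name vg0 is hypothetical): raid6 needs at least 3 stripes plus 2 parity devices, while raid5 only needs 2 stripes plus 1 parity, so it fits on the 4 HDDs.
lvcreate --type raid6 -i 2 -L 100G -n test vg0   # rejected: raid6 requires a minimum of 3 stripes (5 devices)
lvcreate --type raid5 -i 3 -L 100G -n test vg0   # accepted with 4 devices (3 data + 1 parity)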
I finally changed the plan quite significantly, for the following reasons: the installimage tool provided by Hetzner uses LVM on top of mdadm; mdadm handles the RAID, LVM provides flexibility in volume assignments on top of this RAID.
The new plan is hence as follows:
I've kept some free space in LVM (both raid1 and raid6): this space is not deemed necessary right now, but we might need it for other things in the future; it is always easier to increase a partition size than to reduce it, and ext4 supports transparent on-the-fly resizing (see the sketch below).
I did not keep any free space on raid0 because this is the cache, and it can be destroyed / rebuilt quite easily.
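Growing one of the raid1/raid6 LVM volumes later is a two-step online operation; a minimal sketch, assuming a hypothetical VG vg0 and LV data:
lvextend -L +500G /dev/vg0/data    # allocate part of the free extents kept in the VG
resize2fs /dev/vg0/data            # ext4 grows online, no unmount required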
I've documented the process at https://github.com/kiwix/operations/wiki/Machine-and-k8s-node-Setup#special-setup-for-dedicated-machines-on-hetzner
Currently the machine is running and copying the download.kiwix.org content (I've whitelisted a seat on the master.download.kiwix.org module in rsync). It is also resyncing the RAID6 array on /dev/md3; only 20% done so far, I expect it to complete sometime on Saturday.
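For reference, the resync progress can be followed with the standard md tooling:
cat /proc/mdstat             # shows rebuild progress and estimated finish time per array
mdadm --detail /dev/md3      # detailed state of the RAID6 array being resynced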
I've configured DNS storage2.k8s.kiwix.org and reverse DNS on bastion.
I've done the whole basic configuration we do on all machines.
I've configured the node as a k8s node (with role storage2). First services have already started successfully (grafana, nginx-ingress, kilo, konnectivity-agent, kube-proxy, debug-network-tools).
What is left:
For the record, here are the current performance figures on the HDDs:
dd if=/dev/zero of=/data/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.05212 s, 265 MB/s
dd if=/dev/zero of=/data/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 39.1908 s, 13.1 kB/s
This is ~2.5x faster than what we've observed on the storage node in Scaleway when that node was behaving well.
On SSDs:
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.57673 s, 681 MB/s
dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 7.04899 s, 72.6 kB/s
What we've done to synchronize the data: start a custom rsync server which exposes all of /data.
Config at /etc/rsyncd.conf:
charset = utf-8
log file = /var/log/rsync.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
numeric ids = yes
# allow enough time for --list-only on whole thing (20m)
timeout = 1200
uid = 0
gid = 0
# (not working) port = 12389
read only = true
dont compress = *.zim *.gz *.xz *.bz2 *.zip *.img
# private module for storage2 sync
[all.download.kiwix.org]
path = /data
comment = All /data content
max connections = 2
lock file = /var/lock/mirrors.lock
list = no
hosts allow = 135.181.224.247 2a01:4f9:3071:2d08::/64
Start a custom rsync server:
rsync --daemon --config /etc/rsyncd.conf --port 12389 -4
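Before launching the full sync, the module can be sanity-checked from the whitelisted host (same module and port as above):
rsync --list-only --port=12389 master.download.kiwix.org::all.download.kiwix.org/   # lists the module's top-level content, confirming access is allowed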
Sync data on storage2 node:
cd /data
rsync -vzrlptD --progress --delete --port=12389 master.download.kiwix.org::all.download.kiwix.org/ ./
Procedure to transfer services:
run a "last" rsync
update the storage.k8s.kiwix.org DNS record to point to the new node IP
wait 5 minutes for DNS propagation (see the dig check below)
check disk performance:
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.05402 s, 212 MB/s
delete the old storage node
replace the storage2 labels to use the storage role
kubectl label node scw-kiwix-prod-foreign-e61f0a58d1cf43adafde59a node-role.kubernetes.io/storage2-
kubectl label node scw-kiwix-prod-foreign-e61f0a58d1cf43adafde59a node-role.kubernetes.io/storage=true
kubectl label node scw-kiwix-prod-foreign-e61f0a58d1cf43adafde59a k8s.kiwix.org/role=storage --overwrite
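To verify the DNS propagation step mentioned in the procedure above, a quick query against a public resolver can be used (resolver chosen arbitrarily):
dig +short storage.k8s.kiwix.org @1.1.1.1    # should return the new node IP once the record has propagated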
Forgot -og to preserve owner and group (used only by mirrorbrain-db postgres and zim-receiver AFAIK); see the corrected sync command below.
Forgot the ZimitWorker (… policy).
Also forgot to unlabel the old storage node:
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 node-role.kubernetes.io/storage-
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 node-role.kubernetes.io/storage-old=true
kubectl label node scw-kiwix-prod-foreign-20abab10b97b430bbe7b571 k8s.kiwix.org/role=storage-old --overwrite
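For reference, the corrected sync command with owner/group preservation (only -o and -g added compared to the command used earlier):
cd /data
rsync -vzrlptgoD --progress --delete --port=12389 master.download.kiwix.org::all.download.kiwix.org/ ./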
All services are up and running as far as we know
This is the follow-up to https://github.com/kiwix/operations/issues/215
We want to move the storage k8s node to a new SX65 Hetzner server.
We will deploy it in Helsinki to benefit from lower prices with similar worldwide connectivity. PUE is also probably better in Finland than in Germany.
We will use the default 64G RAM to start with, and upgrade only when needed.
The server will have 2x 1T SSD + 4x 22T HDD.
The plan is the following setup:
This choice is made to: