scality / metalk8s

An opinionated Kubernetes distribution with a focus on long-term on-prem deployments
Apache License 2.0

Convert existing kubespray cluster into metal-k8s #239

Open virtuman opened 6 years ago

virtuman commented 6 years ago

Has anyone attempted the conversion of an existing kubespray cluster into metal-k8s?

Is it possible to do it without any etcd data loss?

NicolasT commented 6 years ago

As far as I'm aware, this has not been attempted (so please report back if you do!). However, I don't see any reason to expect trouble, assuming the version of Kubespray we rely on is compatible with the one you used to set up the cluster.
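If you do attempt it, it's probably worth snapshotting etcd first so you have something to roll back to. A minimal sketch, assuming etcdctl with the v3 API on one of the master nodes and Kubespray's usual certificate layout (adjust the endpoint, paths, and backup location to your cluster):

# Take an etcd snapshot before touching the cluster (etcd v3 API).
# The certificate paths below follow Kubespray's typical layout and may differ on your nodes.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
  snapshot save /var/backups/etcd-before-metalk8s.db

# Verify the snapshot is readable.
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-before-metalk8s.db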

virtuman commented 6 years ago

I was halfway done when I realized that the last cluster I was converting to metal-k8s was running on Ubuntu, which raised a question: why is metal-k8s limited to CentOS? There really doesn't seem to be much point in restricting it to CentOS, since the project has grown much bigger than the original cluster deployment script. Do you see support for Ubuntu and Debian being added in the near future?

virtuman commented 6 years ago

So everything seems to have worked perfectly. Naturally, I had to remove all the components that are managed through metal-k8s, such as nginx-ingress, the EFK stack, and the dashboard, plus others that were installed manually over the life of this cluster (e.g. Heapster, Grafana, Prometheus).

This was an Ubuntu cluster, and it looks like the only component that didn't work out of the box was node-exporter, which I installed manually and still need to figure out how to set up properly.
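A rough way to sanity-check a manually installed node-exporter (the namespace and label below are placeholders for wherever your manifests put it; 9100 is the exporter's default port):

# List the node-exporter DaemonSet and pods; adjust namespace/label to your deployment.
kubectl -n monitoring get daemonset,pods -l app=node-exporter -o wide

# From any node, scrape the exporter directly to confirm it is up and serving metrics.
curl -s http://localhost:9100/metrics | head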

Looking at journalctl -f, I'm seeing a bunch of messages like this on all nodes. I'm not sure what they relate to and will start investigating shortly, but if anyone has any clues, please let me know where to start:

Aug 06 22:58:26 srv-eu2 audit: PROCTITLE proctitle=69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D766572626F7365
Aug 06 22:58:26 srv-eu2 audit: NETFILTER_CFG table=raw family=2 entries=29
Aug 06 22:58:26 srv-eu2 audit[32524]: SYSCALL arch=c000003e syscall=54 success=yes exit=0 a0=3 a1=0 a2=40 a3=55b53cb9d000 items=0 ppid=7565 pid=32524 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables-restor" exe="/sbin/xtables-multi" key=(null)
Aug 06 22:58:26 srv-eu2 audit: PROCTITLE proctitle=69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D766572626F7365
Aug 06 22:58:26 srv-eu2 audit: NETFILTER_CFG table=filter family=2 entries=1068
Aug 06 22:58:26 srv-eu2 audit[32529]: SYSCALL arch=c000003e syscall=54 success=yes exit=0 a0=3 a1=0 a2=40 a3=7f64d6f67020 items=0 ppid=2332 pid=32529 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables-restor" exe="/sbin/xtables-multi" key=(null)
Aug 06 22:58:26 srv-eu2 audit: PROCTITLE proctitle=69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D766572626F7365
Aug 06 22:58:26 srv-eu2 audit: NETFILTER_CFG table=raw family=2 entries=32
Aug 06 22:58:26 srv-eu2 audit[32531]: SYSCALL arch=c000003e syscall=54 success=yes exit=0 a0=3 a1=0 a2=40 a3=55a689046000 items=0 ppid=2332 pid=32531 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables-restor" exe="/sbin/xtables-multi" key=(null)
Aug 06 22:58:26 srv-eu2 audit: PROCTITLE proctitle=69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D766572626F7365
NicolasT commented 6 years ago

Hey, thanks for reporting back!

First of all, glad everything worked out. The messages you're seeing in journalctl -f are expected: as part of the MetalK8s deployment, we run the ansible-hardening playbook developed by the OpenStack project. This hardens the host system in various ways, and, among other things, sets up Linux 'auditing', which is what you see here. It's basically logging of various interactions services have with the kernel, for auditing purposes. So this is expected, and harmless.
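If the volume ever becomes a nuisance, those record types can be filtered at the audit level rather than downstream in the logs. A rough sketch, assuming auditd loads rules from /etc/audit/rules.d/ (check that it doesn't conflict with the rules the hardening playbook installs):

# Drop NETFILTER_CFG records (emitted on every iptables-restore run) from the audit stream.
# The accompanying SYSCALL/PROCTITLE records may need exclusions of their own.
echo '-a exclude,always -F msgtype=NETFILTER_CFG' | sudo tee /etc/audit/rules.d/90-quiet-netfilter.rules

# Rebuild and load the merged rule set, then confirm the active rules.
sudo augenrules --load
sudo auditctl -l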

As to your question 'why not support Ubuntu?', and the failure to deploy node_exporter: we launched the MetalK8s project as a way to deploy Zenko (https://zenko.io) at customer sites, which run RHEL or CentOS on their servers. We're only a small team working on MetalK8s, so we focus on what's required for our product to run.

This said, we're not intentionally locking out other Linux distributions. As such, patches to make deployment on Ubuntu succeed would be welcome, assuming they don't break any existing functionality. Also, we currently can't spend CI resources on these alternative platforms, so any community-contributed OS support would need to be validated by third-party CI systems.

Put briefly, it's currently not our main focus, but contributions are definitely welcome!

virtuman commented 6 years ago

Thank you. Is there any reason you are using node-exporter 0.15? It keeps generating thousands of error and warning messages that were fixed in node-exporter 0.16. Or does 0.15 work correctly for you on CentOS 7.5.*?

NicolasT commented 6 years ago

The Grafana dashboards that come with kube-prometheus for node_exporter-exported information expect data from 0.15, which is why we stick with that version.

The only errors I'm aware of are denied statfs calls to virtual filesystems (Docker overlays and netns), which are harmless. If there were a way to silence those, that would indeed be a useful fix.
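One possible approach (untested here) would be to widen the filesystem collector's ignore pattern so node_exporter never issues statfs calls against those mounts in the first place. A sketch using the flag as it exists in 0.15/0.16; the exact pattern is illustrative, and in a MetalK8s deployment it would go into the DaemonSet's container args:

# Run node_exporter with a wider mount-point ignore pattern so the filesystem
# collector skips Docker overlay mounts and per-container network namespaces,
# which are what trigger the denied statfs calls.
node_exporter \
  --collector.filesystem.ignored-mount-points='^/(sys|proc|dev|var/lib/docker/.+|run/docker/netns/.+)($|/)'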

Are you seeing any other errors?

virtuman commented 6 years ago

I thought it was statfs too, but hundreds or thousands of those per minute really cluttered everything. I don't see anything that isn't working with node-exporter 0.16; I could be wrong, but the Grafana dashboards seem to be pretty happy with it as well.