infinyon / fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
https://www.fluvio.io/
Apache License 2.0

[Bug]: Fluvio Cluster failed to install on microk8s #1933

Open sehz opened 2 years ago

sehz commented 2 years ago

What happened

Tried to install Fluvio on microk8s.

Expected behavior

Installation completes successfully.

Describe the setup

$ fluvio version
Fluvio CLI           : 0.9.13
Fluvio CLI SHA256    : 2f41e2874c1390657c2574fcd53ff4c9a4f12a7b463be2a4e8efc5afa55870c0
Fluvio Platform      : Not available (kind-kind)
Git Commit           : e1ef6333f3fb12660a096f2ad697961ab2b085db
OS Details           : Darwin 10.16 (kernel 21.1.0)
=== Plugin Versions ===
Fluvio Runner (fluvio-run)     : 0.0.0
Infinyon Cloud CLI (fluvio-cloud) : 0.1.6

How to reproduce it (as minimally and precisely as possible)

$ fluvio cluster start
Error: 
   0: Fluvio cluster error
   1: Failed to install Fluvio on Kubernetes
   2: Kubernetes client error
   3: no client cert crt path founded
   4: no client cert crt path founded


sehz commented 2 years ago

This is Kubernetes config:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUREekNDQWZlZ0F3SUJBZ0lVWTc3L0VDNHYweDI2K2VOMEJSMUxwUFgzQjd3d0RRWUpLb1pJaHZjTkFRRUwKQlFBd0Z6RVZNQk1HQTFVRUF3d01NVEF1TVRVeUxqRTRNeTR4TUI0WERUSXhNVEV5TURJd016a3lObG9YRFRNeApNVEV4T0RJd016a3lObG93RnpFVk1CTUdBMVVFQXd3TU1UQXVNVFV5TGpFNE15NHhNSUlCSWpBTkJna3Foa2lHCjl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUF4b0Q5NDNVTGlDdWdYUm5ueDdFMndFTkRPVkduYTA2VktkaUsKY1drNXhJRCtWeVhnV1pWcTN2U3N6S2xxemUrcnQ0bjYzVmZSeWpycmprM3NvMFNNRy9lZEdDajlnY0w2RjZTdgpGYURwY2lwdDhBYTRKVHFJbUY0U0w1UzJKb0RoUjJNdnFWKzRZd2N1d21zZXA5aE14ZGtSRnIzZTNyU0lvR0ltClVudEtRYkRCVmZNVlFiSXBibDFVenAyU3FnZ0xQMHdMbTdBcXdzVVVLUzBEV2puUUNxbmpUb3RyVUp5bHRhRXcKM2VWc3FVeXJCdUNIRHhESW1KOGtDdERwWG1YQlJSZXQ2bkZ6SGJmSHB3MHFPd3hCS3ZrTkRnc1EvdVJudWZVawpYM1QzZVlXak5wb3VlREhaQm9vV2FhZ2hhWGZLajdxcWg0VDlRbWJGcXYzcHErMWxwd0lEQVFBQm8xTXdVVEFkCkJnTlZIUTRFRmdRVXFGd2tDcGZWOHFkUW9qTHIxbXJOaDdLWUJkd3dId1lEVlIwakJCZ3dGb0FVcUZ3a0NwZlYKOHFkUW9qTHIxbXJOaDdLWUJkd3dEd1lEVlIwVEFRSC9CQVV3QXdFQi96QU5CZ2txaGtpRzl3MEJBUXNGQUFPQwpBUUVBRGhoU29VUlZQdWRrZFdkSk5rbVNDcGlOWFp3OFV5b09ZVjNrenpzY1JCemZKRUJUVnorWDhOY2NyUTFsCmx1RFJXVTdiSEJlejcwM1hJVEVUbHo3VmhyZE04Z3hZWEhuNng2dTVvZDJMVlhmbUZ5bVpMWVNRQ1kxeXNtOXgKZlhURjdRcGhFc1BwclU4ZVVERTlvaUNzR29hS0IrR0lIRDgzM3hEWkNTd0d0dHlPOXpTUmE4UDNhNm5YRGR0YwpnR090Mm5kVDFLaTRFUlJyZW82cU4yb0h0WlRyL0dWcVlrNE9IY1pWK1dzZEJOM0xLSjFja1VoK2NNWU9MdWJzCm5qeUwvV1g3U3QyL3dTWmt0RzUyZ2JueHpMYmtCR3ZCSE01UVIzQlIvWnF2Z0Yyai9oNmtVUnp0Lyt1a3l4NjEKWWFPb1V4QWJIV1J0WlpvR3NTZTNmcjJ3bmc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://192.168.64.4:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: microk8s
kind: Config
preferences: {}
users:
- name: admin
  user:
    token: QVRvZXhjRVU0QjIrZXJnMHBwTEZnZk56RnNMc3dWUkdmTXJITXM5cklCQT0K

sehz commented 2 years ago

We need to resolve https://github.com/ubuntu/microk8s/issues/2764 in order to build the development image. We also need bridge support: https://github.com/canonical/multipass/issues/118. As an alternative to NodePort, there is port forwarding: https://github.com/ubuntu/microk8s/issues/50

sehz commented 2 years ago

Moving this back to the backlog because of too many open issues.

github-actions[bot] commented 2 years ago

Stale issue message

sehz commented 2 years ago

@tjtelan Were you able to install with proxy mode?

tjtelan commented 2 years ago

Short answer: yes. Without --proxy-addr, the cluster start does not complete:

$ fluvio cluster start
Current channel: stable
📝 Running pre-flight checks
    ✅ Kubectl active cluster microk8s at: https://172.31.42.67:16443 found
    ✅ Supported helm version 3.5.0+g32c2223 is installed
    ✅ Supported Kubernetes server 1.21.11-3+68355092a9b768 found
    ✅ Fixed: Fluvio Sys chart 0.9.23 is installed
    ✅ Previous fluvio installation not found
🎉 All checks passed!
✅ Installed Fluvio app chart: 0.9.23
🖥️  Trying to connect to SC: 172.31.28.69:30003 308 seconds elapsed \
^C

Steps:

There were a few things I had to do to get microk8s to work.

$ sudo microk8s start
Started.

$ mkdir -p ~/.kube
$ microk8s config > ~/.kube/config
$ fluvio cluster start --proxy-addr ip-172-31-42-67.us-east-2.compute.internal
Current channel: stable
📝 Running pre-flight checks
    ✅ Kubectl active cluster microk8s at: https://172.31.42.67:16443 found
    ✅ Supported helm version 3.5.0+g32c2223 is installed
    ✅ Supported Kubernetes server 1.21.11-3+68355092a9b768 found
    ✅ Fixed: Fluvio Sys chart 0.9.23 is installed
    ✅ Previous fluvio installation not found
🎉 All checks passed!
✅ Installed Fluvio app chart: 0.9.23
✅ Connected to SC: ip-172-31-42-67.us-east-2.compute.internal:30003
👤 Profile set
✅ SPU group main launched with 1 replicas
🎯 Successfully installed Fluvio!

sehz commented 2 years ago

Great. This is something @XtremeDevX can help with.

tjtelan commented 2 years ago

I have a rough idea about how I would tackle this issue.

First, I'll characterize what I'm seeing:

Current status

Fluvio needs Kubernetes, and assumes that there's a cluster for it to use

So it's not unreasonable to assume there's a config. The assumption is the config lives at $HOME/.kube/config.

Fluvio needs kubectl

So we assume that there's a binary named kubectl in $PATH to cover a few functions not covered by k8-api.

(Related: https://github.com/infinyon/fluvio/issues/1131 covers what functions kubectl is used for)

Initial discovery of SPU hostname(s) during fluvio cluster install is too rigid for our needs

During installation of SC/SPU via fluvio cluster start, the SC doesn't know the correct external IP/hostname of the (first) SPU to advertise to the user. There doesn't seem to be a trivial way to autodiscover this.

We also assume the SPU is available at the same IP/hostname as the SC when the startup handshake succeeds. Since the client connects directly to the SPU, this problem gets worse when the SPUs are on dedicated hosts.

Proposed solutions

Loosen assumptions for finding kubernetes config

We have at least 2 options to achieve this. I'll list them in order of my preference.

  1. Rely on kubectl more and let it provide us a combined config. By default this command is kubectl config view. An immediate benefit is we'll respect the KUBECONFIG env var.

See: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#set-the-kubeconfig-environment-variable

In cases like microk8s, we can support a different env var to supply a different command, like KUBECONFIG_CMD

e.g.

KUBECONFIG_CMD="microk8s config"
KUBECONFIG_CMD="k3d kubeconfig get fluvio"

  2. Respect the KUBECONFIG env var

We still assume the config file exists somewhere on disk. There is some nuance to this, but the documentation linked below explains how multiple files are combined.

See: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#set-the-kubeconfig-environment-variable
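
A minimal sketch of how the two options above could fit together, not how Fluvio resolves its config today. KUBECONFIG_CMD is the env var proposed in this comment and does not exist yet; the precedence (command first, then KUBECONFIG, then $HOME/.kube/config) and the names KubeconfigSource and resolve_kubeconfig are assumptions made for illustration.

```rust
use std::env;
use std::path::PathBuf;

/// Where the kubeconfig should be read from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum KubeconfigSource {
    /// Run this command and read the config from its stdout (proposed KUBECONFIG_CMD).
    Command(Vec<String>),
    /// One or more files listed in the KUBECONFIG env var.
    Files(Vec<PathBuf>),
    /// Fallback: $HOME/.kube/config.
    DefaultFile(PathBuf),
}

/// Pick a source. The precedence here is an assumption, not existing behavior.
fn resolve_kubeconfig(cmd: Option<&str>, kubeconfig: Option<&str>, home: &str) -> KubeconfigSource {
    // 1. A user-supplied command such as "microk8s config".
    if let Some(cmd) = cmd {
        let argv: Vec<String> = cmd.split_whitespace().map(String::from).collect();
        if !argv.is_empty() {
            return KubeconfigSource::Command(argv);
        }
    }
    // 2. KUBECONFIG holds a path list in the platform's PATH-style format.
    if let Some(list) = kubeconfig {
        let files: Vec<PathBuf> = env::split_paths(list)
            .filter(|p| !p.as_os_str().is_empty())
            .collect();
        if !files.is_empty() {
            return KubeconfigSource::Files(files);
        }
    }
    // 3. The location Fluvio assumes today.
    KubeconfigSource::DefaultFile(PathBuf::from(home).join(".kube").join("config"))
}

fn main() {
    let cmd = env::var("KUBECONFIG_CMD").ok();
    let kubeconfig = env::var("KUBECONFIG").ok();
    let home = env::var("HOME").unwrap_or_default();
    let source = resolve_kubeconfig(cmd.as_deref(), kubeconfig.as_deref(), &home);
    println!("{:?}", source);
}
```

One thing to keep in mind if we go the kubectl config view route: by default it redacts certificate data and tokens, so we would need to pass --raw to get a usable config.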

Loosen assumptions for finding kubectl

Until #1131 is solved, we need more flexibility with the variety of prepackaged kubectl binaries.

Many of the k8s distros provide their own command for kubectl, because they don't assume you have it in $PATH. We can provide an env var like KUBECTL_CMD.

e.g.

KUBECTL_CMD="minikube kubectl"
KUBECTL_CMD="microk8s kubectl"
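
As a rough sketch of what honoring KUBECTL_CMD could look like: split the value into a program plus leading arguments, and fall back to plain kubectl when it is unset or empty. KUBECTL_CMD is only the env var proposed here, and kubectl_command is a hypothetical helper, not something in the Fluvio codebase. Splitting on whitespace is a simplification that ignores quoting.

```rust
use std::env;
use std::process::Command;

/// Split a KUBECTL_CMD-style value into (program, leading args).
/// Falls back to "kubectl" when the value is missing or blank.
fn kubectl_command(kubectl_cmd: Option<&str>) -> (String, Vec<String>) {
    let mut parts = kubectl_cmd
        .unwrap_or("kubectl")
        .split_whitespace()
        .map(String::from);
    let program = parts.next().unwrap_or_else(|| "kubectl".to_string());
    (program, parts.collect())
}

fn main() {
    let configured = env::var("KUBECTL_CMD").ok();
    let (program, prefix) = kubectl_command(configured.as_deref());

    // Build the command but only print it, so the sketch runs without a cluster.
    let mut cmd = Command::new(&program);
    cmd.args(&prefix).args(&["get", "pods"]);
    println!("would run: {:?}", cmd);
}
```

Note that minikube kubectl expects a -- separator before the kubectl arguments, so with this approach the value would probably need to be "minikube kubectl --".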

Provide list of external SPU IPs with fluvio cluster start

I have a less clear idea of how to deal with this in the general case. But we can't always assume all SPUs are available at a single hostname, which is what --proxy-addr supports.

This might involve some mild bit-banging, but during startup we can pass in all of our external IPs.

If we have the appropriate node/pod affinity/anti-affinity, we can have stronger guarantees that SPUs get scheduled exclusively onto individual nodes.

In that case, we can start a cluster with a comma-separated list:

fluvio cluster start --spu 3 --spu-external-addr external-spu-addr-1.myhost.tld,external-spu-addr-2.myhost.tld,external-spu-addr-3.myhost.tld
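
To make the idea a bit more concrete, here is a sketch of how the value of the proposed --spu-external-addr flag could be parsed and checked against --spu. The flag does not exist today, and parse_spu_external_addrs is a hypothetical helper. Requiring exactly one address per SPU is my assumption about how this would need to behave.

```rust
/// Parse a comma-separated address list and require one address per SPU.
/// Returns an error message when the counts do not match.
fn parse_spu_external_addrs(spu_count: usize, raw: &str) -> Result<Vec<String>, String> {
    let addrs: Vec<String> = raw
        .split(',')
        .map(str::trim)
        .filter(|addr| !addr.is_empty())
        .map(String::from)
        .collect();

    if addrs.len() != spu_count {
        return Err(format!(
            "expected {} external SPU address(es), got {}",
            spu_count,
            addrs.len()
        ));
    }
    Ok(addrs)
}

fn main() {
    let raw = "external-spu-addr-1.myhost.tld,external-spu-addr-2.myhost.tld,external-spu-addr-3.myhost.tld";
    match parse_spu_external_addrs(3, raw) {
        // Assign addresses to SPUs in the order they were given.
        Ok(addrs) => {
            for (spu_id, addr) in addrs.iter().enumerate() {
                println!("SPU {} -> {}", spu_id, addr);
            }
        }
        Err(msg) => eprintln!("invalid --spu-external-addr: {}", msg),
    }
}
```

Mapping addresses to SPUs by position only holds if the scheduling guarantees mentioned above are actually in place.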