kubestellar / kubeflex

A flexible and scalable platform for running Kubernetes control plane APIs.
Apache License 2.0

bug: kflex ctx timeout too short and no output on success #291

Closed clubanderson closed 1 week ago

clubanderson commented 2 weeks ago

Describe the bug

While running the demo environment setup, kflex ctx its1 times out just as its1 becomes ready.

Steps To Reproduce

  1. Use the following bash script to install the demo environment:
#!/bin/bash

##########################################
function wait-for-cmd() (
    cmd="$@"
    wait_counter=0
    while ! (eval "$cmd") ; do
        if (($wait_counter > 100)); then
            echo "Failed to ${cmd}."
            exit 1
        fi
        ((wait_counter += 1))
        sleep 5
    done
)
##########################################

kind delete cluster --name kubeflex
kind delete cluster --name cluster1
kind delete cluster --name cluster2
kubectl config delete-context kind-kubeflex
kubectl config delete-context cluster1
kubectl config delete-context cluster2

export KUBESTELLAR_VERSION=0.24.0

bash <(curl -s https://raw.githubusercontent.com/kubestellar/kubestellar/v0.24.0/scripts/create-kind-cluster-with-SSL-passthrough.sh) --name kubeflex --port 9443

helm upgrade --install ks-core oci://ghcr.io/kubestellar/kubestellar/core-chart \
    --version $KUBESTELLAR_VERSION \
    --set-json='ITSes=[{"name":"its1"}]' \
    --set-json='WDSes=[{"name":"wds1"}]'

kubectl config delete-context its1 || true
kflex ctx its1
kubectl config delete-context wds1 || true
kflex ctx wds1

: wait for OCM cluster manager up
echo OCM time
wait-for-cmd '(($(wrap-cmd kubectl --context kind-kubeflex get deployments.apps -n open-cluster-management -o jsonpath='{.status.readyReplicas}' cluster-manager 2>/dev/null || echo 0) >= 1))'

: set flags to "" if you have installed KubeStellar on an OpenShift cluster
flags="--force-internal-endpoint-lookup"
clusters=(cluster1 cluster2);
for cluster in "${clusters[@]}"; do
   kind create cluster --name ${cluster}
   kubectl config rename-context kind-${cluster} ${cluster}
   clusteradm --context its1 get token | grep '^clusteradm join' | sed "s/<cluster_name>/${cluster}/" | awk '{print $0 " --context '${cluster}' --singleton '${flags}'"}' | sh
done

watch kubectl --context its1 get csr
  2. Wait for completion.
  3. Observe that 'kflex ctx its1' may sometimes time out.

Expected Behavior

  1. kflex ctx should wait a bit longer than it presently does
  2. kflex ctx should give affirmative output when it succeeds, as it already gives a negative response when it fails
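
Until kflex itself changes, both expectations can be approximated from the calling script. The following is only a sketch: retry_until is a hypothetical helper, not part of kflex, and the timeout and interval values are assumptions. It re-runs a command until it succeeds or a deadline passes, and prints a message in both cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kflex): re-run a command until it succeeds
# or a deadline passes, and report the outcome either way.
retry_until() {
    local timeout_secs=$1 interval_secs=$2
    shift 2
    local deadline=$((SECONDS + timeout_secs))
    until "$@"; do
        if ((SECONDS >= deadline)); then
            echo "FAILED after ${timeout_secs}s: $*" >&2
            return 1
        fi
        sleep "$interval_secs"
    done
    echo "OK: $*"
}

# Quick demonstration with stand-in commands instead of kflex.
retry_until 2 1 true
retry_until 1 1 false || echo "gave up as expected"
```

In the script above, a call such as retry_until 300 5 kflex ctx its1 would keep retrying for up to five minutes and confirm success explicitly.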

Additional Context

No response

MikeSpreitzer commented 2 weeks ago

Some comments on that script.

mspreitz@mjs13 ~ % which wrap-cmd
wrap-cmd not found

mspreitz@mjs13 ~ % which watch
watch not found

MikeSpreitzer commented 2 weeks ago

Also, that wait-for-cmd invocation is like the following, as far as shell grammar is concerned. Not necessarily incorrect, but misleading to the eye.

mspreitz@mjs13 ~ % echo 'abc'{def}'ghi'
abc{def}ghi
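
To make the same point with the argument from the script, here is a small sketch. The variable names are only for illustration, and the double-quoted variant is one possible alternative, not something the script's author proposed.

```shell
#!/usr/bin/env bash
# Adjacent quoted and unquoted pieces are joined into one word, so the
# "inner" single quotes really end and restart the outer string.
as_written='-o jsonpath='{.status.readyReplicas}' cluster-manager'
echo "$as_written"
# prints: -o jsonpath={.status.readyReplicas} cluster-manager

# Double quotes inside the single-quoted string survive literally,
# so the nesting looks the way it actually behaves.
with_inner_quotes='-o jsonpath="{.status.readyReplicas}" cluster-manager'
echo "$with_inner_quotes"
# prints: -o jsonpath="{.status.readyReplicas}" cluster-manager
```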

MikeSpreitzer commented 2 weeks ago

Also, I started in this state:

mspreitz@mjs13 kubestellar % yq .preferences ${KUBECONFIG:-$HOME/.kube/config}
extensions:
  - extension:
      data:
        kflex-initial-ctx-name: kind-hub
      metadata:
        creationTimestamp: null
        name: kflex-config-extension-name
    name: kflex-config-extension-name

That led the script to just fail in the kflex ctx commands. I have release 0.6.3 of kflex installed.

FYI, the Getting Started page of the KubeStellar website has been updated to include clearing out that troublesome bit of state among the initial cleanup commands.

MikeSpreitzer commented 1 week ago

Also, since this script does not set -e, the failure of the wait-for-cmd invocation does not stop the script from continuing.
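
A minimal illustration of that behavior, using a hypothetical always_fails function as a stand-in for wait-for-cmd. Like wait-for-cmd, its body is a ( ... ) subshell, so exit 1 ends only the subshell and shows up as the function's exit status.

```shell
#!/usr/bin/env bash
# Stand-in for wait-for-cmd: a subshell-bodied function that always fails.
snippet='always_fails() ( exit 1 ); always_fails; echo continued'

# Without set -e the caller carries on after the failing call.
without_e=$(bash -c "$snippet")

# With set -e the caller stops at the failing call and prints nothing.
with_e=$(bash -c "set -e; $snippet" || true)

echo "without set -e: '${without_e}'"
echo "with set -e:    '${with_e}'"
```

The first line of output shows 'continued'; the second shows an empty string.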

MikeSpreitzer commented 1 week ago

Remember that bash is not at /bin/bash on every system; on some Linux systems it is in /usr/bin. That's why I start with #!/usr/bin/env bash.