# Hard reinstall of clients
sudo snap remove --purge juju
rm -rf ~/.local/share/juju
sudo snap install juju --classic
# Hard re-install of controllers or machines needs a bit more
# Thankfully, Juju leaves behind a helper script for this
sudo /usr/sbin/remove-juju-services
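After a hard re-install, a quick sanity check (a minimal sketch; assumes the snap install above completed) might be:

```bash
# Confirm the freshly installed client and that no stale controllers remain
snap list juju
juju version
juju controllers 2>/dev/null || echo "no controllers registered (expected after a purge)"
```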
(base) ╭─sungsoo@z840 ~
╰─$ microk8s inspect
[sudo] password for sungsoo:
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-kubelite is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting juju
Inspect Juju
Inspecting kubeflow
Inspect Kubeflow
The following warning message appears (related to iptables):
WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT
The change can be made persistent with: sudo apt-get install iptables-persistent
There is a post saying it can be fixed as follows:
Adding --iptables=false to /var/snap/microk8s/current/args/dockerd fixes it.
sudo apt-get install iptables-persistent
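If you want to apply the suggested fix end to end, a minimal sketch (assuming the dockerd args file takes one flag per line, as the post above implies) would be:

```bash
# Allow forwarded traffic now, as the microk8s inspect warning suggests
sudo iptables -P FORWARD ACCEPT

# Add the flag mentioned in the post to MicroK8s' dockerd args, then restart MicroK8s
echo '--iptables=false' | sudo tee -a /var/snap/microk8s/current/args/dockerd
sudo microk8s stop && sudo microk8s start
```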
(base) ╭─sungsoo@z840 ~
╰─$ microk8s enable registry
The registry will be created with the default size of 20Gi.
You can use the "size" argument while enabling the registry, eg microk8s.enable registry:size=30Gi
Addon storage is already enabled.
Applying registry manifest
namespace/container-registry created
persistentvolumeclaim/registry-claim created
deployment.apps/registry created
service/registry created
configmap/local-registry-hosting configured
The registry is enabled
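The registry add-on exposes itself on localhost:32000 by default; a quick push round-trip (a sketch assuming Docker is installed on the host) can confirm it works:

```bash
# Tag and push a small image into the MicroK8s registry add-on (NodePort 32000)
docker pull busybox
docker tag busybox localhost:32000/busybox:test
docker push localhost:32000/busybox:test
```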
After reinstalling MicroK8s, the following error occurs.
(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ bin/istioctl install --set profile=demo -y
Error: fetch Kubernetes config file: Get "https://129.254.187.182:16443/api?timeout=32s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")
This error is resolved by copying the kubeconfig contents as follows:
(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ cp /var/snap/microk8s/current/credentials/client.config ${HOME}/.kube/config
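To confirm the copied kubeconfig actually points at the MicroK8s API server, a quick check (a sketch; `k` is the kubectl alias used elsewhere in this issue) might be:

```bash
kubectl config view --minify | grep server   # should show the :16443 endpoint
kubectl get nodes                            # should succeed without x509 errors
```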
Even after fixing that problem, the installation still fails on the Ingress and Egress gateways.
(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ istioctl install --set profile=demo -y 1 ↵
✔ Istio core installed
✔ Istiod installed
✘ Egress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/istio-egressgateway (containers with unready status: [istio-proxy])
✘ Ingress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
Deployment/istio-system/istio-ingressgateway (containers with unready status: [istio-proxy])
- Pruning removed resources
Error: failed to install manifests: errors occurred during operation
Let's analyze the cause.
First, check the status of the affected pods.
(base) ╭─sungsoo@z840 ~
╰─$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-f7868dd95-5qvc5 1/1 Running 0 5h32m
kube-system coredns-7f9c69c78c-sjtf5 1/1 Running 0 5h31m
kube-system calico-node-d72j4 1/1 Running 0 5h32m
...
istio-system istiod-79b65d448f-hjpcg 1/1 Running 0 125m
istio-system istio-egressgateway-6f9d4548b-wxqkq 0/1 Running 0 125m
istio-system istio-ingressgateway-5dc645f586-jdf8n 0/1 Running 0 125m
Two pods (egressgateway, ingressgateway) are not READY. Let's look at the details of those pods.
(base) ╭─sungsoo@z840 ~
╰─$ k describe pod istio-egressgateway-6f9d4548b-wxqkq -n istio-system 1 ↵
Name: istio-egressgateway-6f9d4548b-wxqkq
Namespace: istio-system
Priority: 0
Node: z840/129.254.187.182
Start Time: Wed, 06 Jul 2022 11:22:32 +0900
Labels: app=istio-egressgateway
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m1s (x3743 over 128m) kubelet Readiness probe failed: Get "http://10.1.76.82:15021/healthz/ready": dial tcp 10.1.76.82:15021: connect: connection refused
(base) ╭─sungsoo@z840 ~
╰─$ k describe pod istio-ingressgateway-5dc645f586-jdf8n -n istio-system
Name: istio-ingressgateway-5dc645f586-jdf8n
Namespace: istio-system
Priority: 0
Node: z840/129.254.187.182
Start Time: Wed, 06 Jul 2022 11:22:32 +0900
Labels: app=istio-ingressgateway
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 10s (x3893 over 129m) kubelet Readiness probe failed: Get "http://10.1.76.83:15021/healthz/ready": dial tcp 10.1.76.83:15021: connect: connection refused
Both report connection refused on the readiness probe (timeout: 5 minutes). Why?
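Before digging into network namespaces, a couple of hedged diagnostic commands can narrow this down; the readiness endpoint is served by the istio-proxy container, and the earlier microk8s inspect warning already pointed at the iptables FORWARD policy:

```bash
# Does the gateway's own proxy log show why it never becomes ready?
kubectl logs istio-ingressgateway-5dc645f586-jdf8n -n istio-system -c istio-proxy --tail=50

# Is the host still dropping forwarded traffic (see the inspect warning above)?
sudo iptables -L FORWARD -n | head -1
```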
Kubernetes Network Study, Week 1, Part 2: Network Namespaces
A useful Kubernetes study resource written in Korean.
Let's try the Helm-based installation instead.
Helm provides an installer script that you can fetch and execute locally. It's well documented, so you can read through it and understand what it is doing before you run it.
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
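Before installing the istio/* charts below, the Istio Helm repository has to be added; the standard steps from the Istio Helm install guide look roughly like this:

```bash
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
kubectl create namespace istio-system   # skip if it already exists
```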
╭─sungsoo@ubuntu ~
╰─$ helm install istio-base istio/base -n istio-system 1 ↵
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.55.227:16443/version": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")
╭─sungsoo@ubuntu ~
╰─$ microk8s config > ~/.kube/config
╭─sungsoo@ubuntu ~
╰─$ helm install istio-base istio/base -n istio-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
NAME: istio-base
LAST DEPLOYED: Wed Jul 6 22:50:35 2022
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Istio base successfully installed!
To learn more about the release, try:
$ helm status istio-base
$ helm get all istio-base
╭─sungsoo@ubuntu ~
╰─$ helm install istiod istio/istiod -n istio-system --wait
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
NAME: istiod
LAST DEPLOYED: Wed Jul 6 22:51:06 2022
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
"istiod" successfully installed!
╭─sungsoo@ubuntu ~
╰─$ kubectl create namespace istio-ingress 1 ↵
namespace/istio-ingress created
╭─sungsoo@ubuntu ~
╰─$ kubectl label namespace istio-ingress istio-injection=enabled
namespace/istio-ingress labeled
╭─sungsoo@ubuntu ~
╰─$ helm install istio-ingress istio/gateway -n istio-ingress --wait
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp 127.0.0.1:8080: connect: connection refused
Installing with Helm failed!
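The localhost:8080 error usually means kubectl/helm can no longer find a kubeconfig for the cluster; a hedged check before giving up on Helm would be to see what the config points at and re-export it:

```bash
# What API server does the current kubeconfig point at? (empty output ~= no config)
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'; echo

# Re-export the MicroK8s kubeconfig and retry the gateway chart
microk8s config > ~/.kube/config
helm install istio-ingress istio/gateway -n istio-ingress --wait
```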
Going back to istioctl, the installation below succeeded.
(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ bin/istoctl install
zsh: no such file or directory: bin/istoctl
(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ bin/istioctl install 127 ↵
This will install the Istio 1.11.0 default profile with ["Istio core" "Istiod" "Ingress gateways"] components into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete
Thank you for installing Istio 1.11. Please take a few minutes to tell us about your install/upgrade experience! https://forms.gle/kWULBRjUv7hHci7T6
Check that the Istio-related pods are running properly.
(base) ╭─sungsoo@z840 ~
╰─$ k get pods -A -w
istio-system istiod-75d5bf4676-tvztm 1/1 Running 0 28s
istio-system istio-ingressgateway-85fbdd86f7-pl2lc 1/1 Running 0 23
Now let's install Knative.
The commands below install Knative Serving 1.0, along with cert-manager 1.3.0 and KServe 0.8.0.
kubectl apply --filename https://github.com/knative/serving/releases/download/knative-v1.0.0/serving-crds.yaml
kubectl apply --filename https://github.com/knative/serving/releases/download/knative-v1.0.0/serving-core.yaml
kubectl apply --filename https://github.com/knative/net-istio/releases/download/knative-v1.0.0/release.yaml
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.3.0/cert-manager.yaml
kubectl wait --for=condition=available --timeout=600s deployment/cert-manager-webhook -n cert-manager
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.8.0/kserve.yaml
kubectl wait --for=condition=ready pod -l control-plane=kserve-controller-manager -n kserve --timeout=300s
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.8.0/kserve-runtimes.yaml
After running everything above, check the pod status to confirm the installation.
(base) ╭─sungsoo@z840 ~
╰─$ k get pods -A -w
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-f7868dd95-5qvc5 1/1 Running 0 22h
kube-system coredns-7f9c69c78c-sjtf5 1/1 Running 0 22h
kube-system calico-node-d72j4 1/1 Running 0 22h
... (omitted)
traindb-ml ml-pipeline-visualizationserver-569ccd5d86-jcmvn 1/1 Running 0 20h
traindb-ml ml-pipeline-ui-artifact-77dfb58d8b-lf8rt 1/1 Running 0 20h
istio-system istiod-75d5bf4676-tvztm 1/1 Running 0 10m
istio-system istio-ingressgateway-85fbdd86f7-pl2lc 1/1 Running 0 9m55s
knative-serving autoscaler-6c8884d6ff-k9rkf 1/1 Running 0 5m7s
knative-serving activator-68b7698d74-cn24l 1/1 Running 0 5m8s
knative-serving controller-76cf997d95-95xmz 1/1 Running 0 5m7s
knative-serving domain-mapping-57fdbf97b-j6sqf 1/1 Running 0 5m6s
knative-serving domainmapping-webhook-66c5f7d596-h9qzf 1/1 Running 0 5m6s
knative-serving webhook-7df8fd847b-2wskb 1/1 Running 0 5m5s
knative-serving net-istio-controller-544874485d-8n5xz 1/1 Running 0 2m58s
knative-serving net-istio-webhook-695d588d65-wq7mp 1/1 Running 0 2m58s
cert-manager cert-manager-cainjector-655d695d74-czptn 1/1 Running 0 2m19s
cert-manager cert-manager-76b7c557d5-b8hl2 1/1 Running 0 2m18s
cert-manager cert-manager-webhook-7955b9bb97-7pv7v 1/1 Running 0 2m18s
kserve kserve-controller-manager-0 2/2 Running 0 72s
If every pod installed above has started normally (READY), the installation is a success!
Let's look at the details of how KServe is running.
(base) ╭─sungsoo@z840 ~
╰─$ k describe pod kserve-controller-manager-0 -n kserve
Name: kserve-controller-manager-0
Namespace: kserve
Priority: 0
Node: z840/129.254.187.182
Start Time: Thu, 07 Jul 2022 06:39:14 +0900
Labels: control-plane=kserve-controller-manager
... (omitted)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m36s default-scheduler Successfully assigned kserve/kserve-controller-manager-0 to z840
Warning FailedMount 4m36s (x2 over 4m36s) kubelet MountVolume.SetUp failed for volume "cert" : secret "kserve-webhook-server-cert" not found
Normal Pulling 4m33s kubelet Pulling image "kserve/kserve-controller:v0.8.0"
Normal Pulled 4m26s kubelet Successfully pulled image "kserve/kserve-controller:v0.8.0" in 7.872368634s
Normal Created 4m25s kubelet Created container manager
Normal Started 4m25s kubelet Started container manager
Normal Pulling 4m25s kubelet Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
Normal Pulled 4m19s kubelet Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 5.875500128s
Normal Created 4m19s kubelet Created container kube-rbac-proxy
Normal Started 4m19s kubelet Started container kube-rbac-proxy
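As a quick smoke test of the fresh KServe install, a sketch using the sample sklearn iris InferenceService from the KServe examples (the model URI is the upstream sample value) could be:

```bash
# Create a sample InferenceService and wait for it to become Ready
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    sklearn:
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
EOF
kubectl get inferenceservice sklearn-iris -n default -w   # READY should become True
```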
Analysis: KServe setup and testing
We'll maintain the content of this issue as the following document (README.md):
https://github.com/traindb-project/traindb-ml/tree/main/kserve