
analysis: KServe setup and testing #25

Closed sungsoo closed 2 years ago

sungsoo commented 2 years ago

analysis: KServe setup and testing

We'll maintain the contents of this issue in the following document (README.md):

https://github.com/traindb-project/traindb-ml/tree/main/kserve

KServe architecture diagram

sungsoo commented 2 years ago

Juju uninstallation

# Hard reinstall of the clients
sudo snap remove --purge juju
rm -rf ~/.local/share/juju
sudo snap install juju --classic

# A hard re-install of controllers or machines needs a bit more;
# fortunately, Juju leaves a helper script for that.
sudo /usr/sbin/remove-juju-services
sungsoo commented 2 years ago

Microk8s inspection

(base) ╭─sungsoo@z840 ~
╰─$ microk8s inspect
[sudo] password for sungsoo:
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-kubelite is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting juju
  Inspect Juju
Inspecting kubeflow
  Inspect Kubeflow

The following warning message appears (related to iptables):

WARNING:  IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT
The change can be made persistent with: sudo apt-get install iptables-persistent

There is a post saying it can be fixed as follows:

Adding --iptables=false to /var/snap/microk8s/current/args/dockerd fixes it.

sudo apt-get install iptables-persistent
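
To actually switch the FORWARD policy and keep it across reboots, something along these lines should work (a sketch; netfilter-persistent is the save tool shipped with the iptables-persistent package):

# Allow forwarded traffic, as the microk8s inspect warning suggests
sudo iptables -P FORWARD ACCEPT
# Persist the current rules across reboots
sudo netfilter-persistent save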

Related post: microk8s.inspect takes a very long time

sungsoo commented 2 years ago

Set up a private registry

(base) ╭─sungsoo@z840 ~
╰─$ microk8s enable registry
The registry will be created with the default size of 20Gi.
You can use the "size" argument while enabling the registry, eg microk8s.enable registry:size=30Gi
Addon storage is already enabled.
Applying registry manifest
namespace/container-registry created
persistentvolumeclaim/registry-claim created
deployment.apps/registry created
service/registry created
configmap/local-registry-hosting configured
The registry is enabled
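
As a quick sanity check of the registry, one can tag and push a local image to it (a sketch; it assumes the add-on's default localhost:32000 endpoint, and my-model:latest is just a placeholder image name):

docker tag my-model:latest localhost:32000/my-model:latest
docker push localhost:32000/my-model:latest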
sungsoo commented 2 years ago

Istio Installation

After reinstalling MicroK8s, the following error occurs:

(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ bin/istioctl install --set profile=demo -y
Error: fetch Kubernetes config file: Get "https://129.254.187.182:16443/api?timeout=32s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")

This error is resolved by copying the kubeconfig contents as follows:

(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ cp /var/snap/microk8s/current/credentials/client.config ${HOME}/.kube/config

Even after fixing that, installation errors still occur for the Ingress and Egress gateways.

Reference site

(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ istioctl install --set profile=demo -y                                                                                                                        1 ↵
✔ Istio core installed
✔ Istiod installed
✘ Egress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
  Deployment/istio-system/istio-egressgateway (containers with unready status: [istio-proxy])
✘ Ingress gateways encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition
  Deployment/istio-system/istio-ingressgateway (containers with unready status: [istio-proxy])
- Pruning removed resources
Error: failed to install manifests: errors occurred during operation

Let's analyze the cause.

First, let's look at the status of the pods in question.

(base) ╭─sungsoo@z840 ~
╰─$ k get pods -A
NAMESPACE                       NAME                                               READY   STATUS    RESTARTS   AGE
kube-system                     calico-kube-controllers-f7868dd95-5qvc5            1/1     Running   0          5h32m
kube-system                     coredns-7f9c69c78c-sjtf5                           1/1     Running   0          5h31m
kube-system                     calico-node-d72j4                                  1/1     Running   0          5h32m
...
istio-system                    istiod-79b65d448f-hjpcg                            1/1     Running   0          125m
istio-system                    istio-egressgateway-6f9d4548b-wxqkq                0/1     Running   0          125m
istio-system                    istio-ingressgateway-5dc645f586-jdf8n              0/1     Running   0          125m

Two pods (egressgateway and ingressgateway) are not READY. Let's look at the details of each pod.

(base) ╭─sungsoo@z840 ~
╰─$ k describe pod istio-egressgateway-6f9d4548b-wxqkq -n istio-system                                                                                            1 ↵
Name:         istio-egressgateway-6f9d4548b-wxqkq
Namespace:    istio-system
Priority:     0
Node:         z840/129.254.187.182
Start Time:   Wed, 06 Jul 2022 11:22:32 +0900
Labels:       app=istio-egressgateway

...

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  4m1s (x3743 over 128m)  kubelet  Readiness probe failed: Get "http://10.1.76.82:15021/healthz/ready": dial tcp 10.1.76.82:15021: connect: connection refused

(base) ╭─sungsoo@z840 ~
╰─$ k describe pod istio-ingressgateway-5dc645f586-jdf8n -n istio-system
Name:         istio-ingressgateway-5dc645f586-jdf8n
Namespace:    istio-system
Priority:     0
Node:         z840/129.254.187.182
Start Time:   Wed, 06 Jul 2022 11:22:32 +0900
Labels:       app=istio-ingressgateway

...

Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  10s (x3893 over 129m)  kubelet  Readiness probe failed: Get "http://10.1.76.83:15021/healthz/ready": dial tcp 10.1.76.83:15021: connect: connection refused

Both report connection refused (after the 5-minute timeout). Why?
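
One way to dig further (a sketch): check the istio-proxy container logs of one of the failing pods, and check whether the host's FORWARD policy is still DROP, since microk8s inspect warned about it earlier.

# Logs of the sidecar that keeps failing its readiness probe
k logs istio-ingressgateway-5dc645f586-jdf8n -n istio-system -c istio-proxy
# Print only the FORWARD chain header, which shows the current policy
sudo iptables -L FORWARD -n | head -n 1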

sungsoo commented 2 years ago

Linux firewall: what is IPTABLES?

sungsoo commented 2 years ago

Kubernetes Network Study, Week 1, Part 2: Network Namespaces

A useful Kubernetes study site written in Korean.

sungsoo commented 2 years ago

Helm Installation

Let's try installing with Helm instead.

You can fetch the installer script and execute it locally. It's well documented, so you can read through it and understand what it is doing before you run it.

$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
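
The log below goes straight to helm install, so it assumes the Istio chart repository is already registered and the istio-system namespace exists. Starting from a clean machine, something like the following would come first (a sketch; the repo URL is the one from the Istio Helm install docs, so treat it as an assumption):

# Register the Istio chart repository and refresh the local index
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
# Namespace for the control-plane charts
kubectl create namespace istio-system
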
╭─sungsoo@ubuntu ~ 
╰─$ helm install istio-base istio/base -n istio-system                           1 ↵
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.55.227:16443/version": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")

╭─sungsoo@ubuntu ~ 
╰─$ microk8s config > ~/.kube/config     

╭─sungsoo@ubuntu ~ 
╰─$ helm install istio-base istio/base -n istio-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
NAME: istio-base
LAST DEPLOYED: Wed Jul  6 22:50:35 2022
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Istio base successfully installed!

To learn more about the release, try:
  $ helm status istio-base
  $ helm get all istio-base
╭─sungsoo@ubuntu ~ 
╰─$ helm install istiod istio/istiod -n istio-system --wait
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
NAME: istiod
LAST DEPLOYED: Wed Jul  6 22:51:06 2022
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
"istiod" successfully installed!

╭─sungsoo@ubuntu ~ 
╰─$ kubectl create namespace istio-ingress                                            1 ↵
namespace/istio-ingress created
╭─sungsoo@ubuntu ~ 
╰─$ kubectl label namespace istio-ingress istio-injection=enabled
namespace/istio-ingress labeled
╭─sungsoo@ubuntu ~ 
╰─$ helm install istio-ingress istio/gateway -n istio-ingress --wait
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/sungsoo/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/sungsoo/.kube/config
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp 127.0.0.1:8080: connect: connection refused

Installing with Helm failed!
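
The localhost:8080 error usually means Helm could not find a usable kubeconfig for that particular invocation. Explicitly pointing KUBECONFIG at the MicroK8s config before retrying might be worth a try (a sketch, not verified here):

# Make sure this shell uses the kubeconfig written by microk8s config
export KUBECONFIG=$HOME/.kube/config
helm install istio-ingress istio/gateway -n istio-ingress --wait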

Switching back to istioctl and installing as follows succeeded.

(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ bin/istoctl install
zsh: no such file or directory: bin/istoctl
(base) ╭─sungsoo@z840 ~/kubeflow/istio-1.11.0
╰─$ bin/istioctl install                                                                                                                                        127 ↵
This will install the Istio 1.11.0 default profile with ["Istio core" "Istiod" "Ingress gateways"] components into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Installation complete
Thank you for installing Istio 1.11.  Please take a few minutes to tell us about your install/upgrade experience!  https://forms.gle/kWULBRjUv7hHci7T6

Check that the Istio-related pods are running properly.

(base) ╭─sungsoo@z840 ~
╰─$ k get pods -A -w
istio-system                    istiod-75d5bf4676-tvztm                            1/1     Running   0          28s
istio-system                    istio-ingressgateway-85fbdd86f7-pl2lc              1/1     Running   0          23
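
It can also be useful to note the ingress gateway's service type and ports now, since KServe inference requests will later be routed through it (a sketch):

kubectl get svc istio-ingressgateway -n istio-system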

Install Knative

Now let's install Knative.

Install Knative version 1.0 with the commands below.

kubectl apply --filename https://github.com/knative/serving/releases/download/knative-v1.0.0/serving-crds.yaml
kubectl apply --filename https://github.com/knative/serving/releases/download/knative-v1.0.0/serving-core.yaml
kubectl apply --filename https://github.com/knative/net-istio/releases/download/knative-v1.0.0/release.yaml
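
Before moving on, it is worth waiting for the Knative Serving components to become ready (a sketch, using the same kubectl wait pattern as the cert-manager step below):

kubectl wait --for=condition=ready pod --all -n knative-serving --timeout=300s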

Install Cert Manager

kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.3.0/cert-manager.yaml
kubectl wait --for=condition=available --timeout=600s deployment/cert-manager-webhook -n cert-manager

Install KServe

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.8.0/kserve.yaml
kubectl wait --for=condition=ready pod -l control-plane=kserve-controller-manager -n kserve --timeout=300s

Install the KServe built-in ServingRuntimes

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.8.0/kserve-runtimes.yaml
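
To confirm the runtimes were registered, listing the cluster-scoped runtime resources should show the built-in entries (a sketch; in KServe 0.8 the resource kind is ClusterServingRuntime):

kubectl get clusterservingruntimes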

Checking the installation status

After running all of the steps above, check the pod status to verify that everything is installed properly.

(base) ╭─sungsoo@z840 ~
╰─$ k get pods -A -w                  
NAMESPACE                       NAME                                               READY   STATUS    RESTARTS   AGE
kube-system                     calico-kube-controllers-f7868dd95-5qvc5            1/1     Running   0          22h
kube-system                     coredns-7f9c69c78c-sjtf5                           1/1     Running   0          22h
kube-system                     calico-node-d72j4                                  1/1     Running   0          22h

... (output omitted)

traindb-ml                      ml-pipeline-visualizationserver-569ccd5d86-jcmvn   1/1     Running   0          20h
traindb-ml                      ml-pipeline-ui-artifact-77dfb58d8b-lf8rt           1/1     Running   0          20h
istio-system                    istiod-75d5bf4676-tvztm                            1/1     Running   0          10m
istio-system                    istio-ingressgateway-85fbdd86f7-pl2lc              1/1     Running   0          9m55s
knative-serving                 autoscaler-6c8884d6ff-k9rkf                        1/1     Running   0          5m7s
knative-serving                 activator-68b7698d74-cn24l                         1/1     Running   0          5m8s
knative-serving                 controller-76cf997d95-95xmz                        1/1     Running   0          5m7s
knative-serving                 domain-mapping-57fdbf97b-j6sqf                     1/1     Running   0          5m6s
knative-serving                 domainmapping-webhook-66c5f7d596-h9qzf             1/1     Running   0          5m6s
knative-serving                 webhook-7df8fd847b-2wskb                           1/1     Running   0          5m5s
knative-serving                 net-istio-controller-544874485d-8n5xz              1/1     Running   0          2m58s
knative-serving                 net-istio-webhook-695d588d65-wq7mp                 1/1     Running   0          2m58s
cert-manager                    cert-manager-cainjector-655d695d74-czptn           1/1     Running   0          2m19s
cert-manager                    cert-manager-76b7c557d5-b8hl2                      1/1     Running   0          2m18s
cert-manager                    cert-manager-webhook-7955b9bb97-7pv7v              1/1     Running   0          2m18s
kserve                          kserve-controller-manager-0                        2/2     Running   0          72s

If every pod installed above has started and is READY, the installation succeeded!

Let's look at the details of how the KServe controller pod is running.

(base) ╭─sungsoo@z840 ~
╰─$ k describe pod kserve-controller-manager-0 -n kserve
Name:         kserve-controller-manager-0
Namespace:    kserve
Priority:     0
Node:         z840/129.254.187.182
Start Time:   Thu, 07 Jul 2022 06:39:14 +0900
Labels:       control-plane=kserve-controller-manager

... (output omitted)

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    4m36s                  default-scheduler  Successfully assigned kserve/kserve-controller-manager-0 to z840
  Warning  FailedMount  4m36s (x2 over 4m36s)  kubelet            MountVolume.SetUp failed for volume "cert" : secret "kserve-webhook-server-cert" not found
  Normal   Pulling      4m33s                  kubelet            Pulling image "kserve/kserve-controller:v0.8.0"
  Normal   Pulled       4m26s                  kubelet            Successfully pulled image "kserve/kserve-controller:v0.8.0" in 7.872368634s
  Normal   Created      4m25s                  kubelet            Created container manager
  Normal   Started      4m25s                  kubelet            Started container manager
  Normal   Pulling      4m25s                  kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
  Normal   Pulled       4m19s                  kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 5.875500128s
  Normal   Created      4m19s                  kubelet            Created container kube-rbac-proxy
  Normal   Started      4m19s                  kubelet            Started container kube-rbac-proxy
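
With the controller up, a minimal end-to-end smoke test would be to deploy the sklearn-iris sample InferenceService and watch it become ready (a sketch; the kserve-test namespace and the sample storageUri follow the KServe getting-started guide, so treat them as assumptions):

# Create a namespace for the test InferenceService
kubectl create namespace kserve-test
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF

# The service is ready when READY becomes True and a URL is assigned
kubectl get inferenceservices sklearn-iris -n kserve-test -w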