megaease / easemesh

A service mesh implementation for connecting, controlling, and observing services in Spring Cloud.
https://megaease.com/easemesh
Apache License 2.0

Mesh installer #19

Closed zhao-kun closed 3 years ago

zhao-kun commented 3 years ago

The installer/uninstaller for EaseMesh

install command

Deploy infrastructure components of the EaseMesh

Usage:
  emctl install [flags]

Examples:
emctl install <args>

Flags:
      --clean-when-failed                               Clean resources when installation failed, default true (default true)
      --easeemesh-ingress-replicas int                   (default 1)
      --easegress-image string                           (default "megaease/easegress:latest")
      --easemesh-control-plane-replicas int              (default 3)
      --easemesh-operator-image string                   (default "megaease/easemesh-operator:latest")
      --easemesh-operator-replicas int                   (default 1)
  -f, --file string                                     A yaml file specifying the install params.
      --heartbeat-interval int                           (default 5)
  -h, --help                                            help for install
      --image-registry-url string                        (default "docker.io")
      --mesh-control-plane-admin-port int               Port of mesh control plane admin for management (default 2381)
      --mesh-control-plane-check-healthz-max-time int   Max timeout in second for checking control panel component whether ready or not (default 60 seconds) (default 60)
      --mesh-control-plane-client-port int              Mesh control plane client port for remote accessing (default 2379)
      --mesh-control-plane-peer-port int                Port of mesh control plane for consensus each other (default 2380)
      --mesh-control-plane-pv-capacity string

                                                        PersistentVolume does not have enough resources, the required number is %d, but only %d.
                                                        EaseMesh control plane needs PersistentVolume to store data. You need to create PersistentVolume in advance and specify its storageClassName as %s.

                                                        You can create PersistentVolume by the following definition:

                                                        apiVersion: v1
                                                        kind: PersistentVolume
                                                        metadata:
                                                          labels:
                                                            app: easemesh
                                                          name: easemesh-pv
                                                        spec:
                                                          storageClassName: %s
                                                          accessModes:
                                                          - {ReadWriteOnce}
                                                          capacity:
                                                            storage: {%s}
                                                          hostPath:
                                                            path: {/opt/easemesh/}
                                                            type: "DirectoryOrCreate"
                                                             (default "3Gi")
      --mesh-control-plane-service-admin-port int        (default 2381)
      --mesh-control-plane-service-name string           (default "easemesh-controlplane-svc")
      --mesh-control-plane-service-peer-port int         (default 2380)
      --mesh-namespace string                            (default "easemesh")
      --mesh-storage-class-name string                   (default "easemesh-storage")
      --registry-type string                            The registry type for application service registry, one of: eureka|consul|nacos. (default "eureka")

reset command

Reset infrastructure components of the EaseMesh

Usage:
  emctl reset [flags]

Examples:
emctl reset

Flags:
  -h, --help                                     help for reset
      --mesh-control-plane-service-name string    (default "easemesh-controlplane-svc")
      --mesh-namespace string                     (default "easemesh")

To help you review the code, I will explain it briefly.

Modules

I divided the installer into four packages; each package installs one of the resources needed by EaseMesh. Each package exports four functions, which are responsible for:

All four exported functions are wrapped in an InstallStage (or filter) object, and all stage objects are composed into a Chain of Responsibility to be invoked one by one.
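To make the stage idea easier to review, here is a minimal sketch of how stages could be run one by one with cleanup on failure. It is not the code in this PR: the `Stage` interface, its `Do`/`Undo` hooks, `RunChain`, and `newStage` are hypothetical names, the four real exported functions are not reproduced here, and the stage order in `main` is arbitrary. It also simplifies the pattern to a sequential loop rather than handlers that call the next one.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage is a hypothetical stand-in for InstallStage: one installable unit
// with a forward step and a cleanup step.
type Stage interface {
	Name() string
	Do() error
	Undo() error
}

// funcStage wraps plain functions (as a package might export them) into a Stage.
type funcStage struct {
	name string
	do   func() error
	undo func() error
}

func (s funcStage) Name() string { return s.name }
func (s funcStage) Do() error    { return s.do() }
func (s funcStage) Undo() error  { return s.undo() }

// RunChain invokes the stages in order. On the first failure it optionally
// undoes the failing stage and every stage before it, in reverse order,
// and then returns the wrapped error.
func RunChain(stages []Stage, cleanWhenFailed bool) error {
	for i, s := range stages {
		if err := s.Do(); err != nil {
			if cleanWhenFailed {
				for j := i; j >= 0; j-- {
					_ = stages[j].Undo() // best-effort cleanup
				}
			}
			return fmt.Errorf("stage %q failed: %w", s.Name(), err)
		}
	}
	return nil
}

// newStage builds a demo Stage that records its calls in trace and
// fails in Do when fail is true.
func newStage(name string, fail bool, trace *[]string) Stage {
	return funcStage{
		name: name,
		do: func() error {
			*trace = append(*trace, "do:"+name)
			if fail {
				return errors.New("simulated failure")
			}
			return nil
		},
		undo: func() error {
			*trace = append(*trace, "undo:"+name)
			return nil
		},
	}
}

func main() {
	var trace []string
	stages := []Stage{
		newStage("crd", false, &trace),
		newStage("controlplane", false, &trace),
		newStage("meshingress", true, &trace), // simulated failure
	}
	fmt.Println(RunChain(stages, true))
	fmt.Println(trace)
}
```

In this sketch the `cleanWhenFailed` argument plays the role of the `--clean-when-failed` flag: when the last stage fails, the trace shows the three `do:` calls followed by the `undo:` calls in reverse order.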

The packages include:

Control Plane

MeshIngress

MeshIngress is our ingress for the mesh: an API object that manages external access to the mesh services in a cluster, typically over HTTP. MeshIngress resources include:

CRD

CRD is K8s' CustomResourceDefinition

zhao-kun commented 3 years ago

There is a known bug in the EG deployment, which needs to be discussed with @xxx7xxxx and @benja-wu in more detail.

clusterMembers:
- id: 1031933740400126356
  name: ""
  peerURL: http://easemesh-control-plane-2.easemesh-controlplane-hs.easemesh:2380
- id: 5797175598832725525
  name: easemesh-control-plane-0
  peerURL: http://easemesh-control-plane-0.easemesh-controlplane-hs.easemesh:2380
- id: 1664029883460182567
  name: easemesh-control-plane-1
  peerURL: http://easemesh-control-plane-1.easemesh-controlplane-hs.easemesh:2380
knownMembers:
- id: 1031933740400126356
  name: ""
  peerURL: http://easemesh-control-plane-2.easemesh-controlplane-hs.easemesh:2380
- id: 5797175598832725525
  name: easemesh-control-plane-0
  peerURL: http://easemesh-control-plane-0.easemesh-controlplane-hs.easemesh:2380
- id: 1664029883460182567
  name: easemesh-control-plane-1
  peerURL: http://easemesh-control-plane-1.easemesh-controlplane-hs.easemesh:2380

Once this situation appears, EG cannot boot normally, no matter how many times I restart it.
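One detail in the member list above is that the entry for easemesh-control-plane-2 has an empty name. In etcd, an empty member name usually means the member was added to the cluster but has not started yet. The sketch below shows how such entries could be picked out of a member list; the `Member` type and `UnnamedMembers` function are hypothetical helpers for illustration, not part of Easegress.

```go
package main

import "fmt"

// Member mirrors one entry of the clusterMembers list in the status dump.
type Member struct {
	ID      uint64
	Name    string
	PeerURL string
}

// UnnamedMembers returns the members whose name is empty.
func UnnamedMembers(members []Member) []Member {
	var out []Member
	for _, m := range members {
		if m.Name == "" {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Values copied from the clusterMembers dump in this comment.
	members := []Member{
		{ID: 1031933740400126356, Name: "", PeerURL: "http://easemesh-control-plane-2.easemesh-controlplane-hs.easemesh:2380"},
		{ID: 5797175598832725525, Name: "easemesh-control-plane-0", PeerURL: "http://easemesh-control-plane-0.easemesh-controlplane-hs.easemesh:2380"},
		{ID: 1664029883460182567, Name: "easemesh-control-plane-1", PeerURL: "http://easemesh-control-plane-1.easemesh-controlplane-hs.easemesh:2380"},
	}
	for _, m := range UnnamedMembers(members) {
		fmt.Printf("member %d (%s) has no name\n", m.ID, m.PeerURL)
	}
}
```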

Logs in leader:

cluster-join-urls [http://easemesh-control-plane-0.easemesh-controlplane-hs.easemesh:2380 http://easemesh-control-plane-1.easemesh-controlplane-hs.easemesh:2380 http://easemesh-control-plane-2.easemesh-controlplane-hs.easemesh:2380] changed to empty because it tries to join itself
2021-06-04T11:10:22.205Z        INFO    server/main.go:61       Easegress release: 1.0.0, repo: https://github.com/zhao-kun/easegress-1, commit: git-0122cf3
2021-06-04T11:10:22.205Z        INFO    storage/storage.go:250  /running_objects.yaml not exist
2021-06-04T11:10:22.205Z        INFO    cluster/cluster.go:382  client connect with endpoints: [http://easemesh-control-plane-2.easemesh-controlplane-hs.easemesh:2380 http://easemesh-control-plane-0.easemesh-controlplane-hs.easemesh:2380 http://easemesh-control-plane-1.easemesh-controlplane-hs.easemesh:2380]
2021-06-04T11:10:22.206Z        INFO    cluster/cluster.go:186  starting etcd cluster
2021-06-04T11:10:22.206Z        INFO    cluster/cluster.go:396  client is ready
2021-06-04T11:10:32.206Z        ERROR   storage/storage.go:151  pull config failed: context deadline exceeded
2021-06-04T11:10:32.207Z        ERROR   cluster/cluster.go:237  add self to cluster failed: context canceled
2021-06-04T11:10:32.207Z        INFO    cluster/config.go:126   etcd config: init-cluster:easemesh-control-plane-0=http://easemesh-control-plane-0.easemesh-controlplane-hs.easemesh:2380,easemesh-control-plane-1=http://easemesh-control-plane-1.easemesh-controlplane-hs.easemesh:2380 cluster-state:existing force-new-cluster:false
2021-06-04T11:10:42.207Z        ERROR   storage/storage.go:151  pull config failed: context canceled
2021-06-04T11:10:42.211Z        INFO    storage/storage.go:250  /running_objects.yaml not exist
2021-06-04T11:10:52.214Z        ERROR   storage/storage.go:151  pull config failed: context canceled
2021-06-04T11:11:02.215Z        ERROR   storage/storage.go:151  pull config failed: context canceled
2021-06-04T11:11:02.216Z        INFO    supervisor/supervisor.go:197    create system controller StatusSyncController
2021-06-04T11:11:02.216Z        ERROR   api/api.go:107  get cluster mutex /config/lock failed: lease is not ready
2021-06-04T11:11:02.217Z        ERROR   storage/storage.go:313  sync runtime failed: put status failed: lease is not ready
2021-06-04T11:11:02.228Z        INFO    api/api.go:113  api server running in 0.0.0.0:2381
2021-06-04T11:11:05Z    ERROR   storage/storage.go:313  sync runtime failed: put status failed: lease is not ready
2021-06-04T11:11:10.001Z        ERROR   storage/storage.go:313  sync runtime failed: put status failed: lease is not ready
2021-06-04T11:11:15.003Z        ERROR   storage/storage.go:313  sync runtime failed: put status failed: lease is not ready
2021-06-04T11:11:20Z    ERROR   storage/storage.go:313  sync runtime failed: put status failed: lease is not ready

Logs in follower:

2021-06-04T11:30:43.546Z        INFO    cluster/cluster.go:550  hard stop server
2021-06-04T11:30:43.546Z        ERROR   cluster/cluster.go:255  start server timeout(10m0s)
panic: start server timeout(10m0s)
goroutine 29 [running]:
github.com/megaease/easegress/pkg/cluster.(*cluster).getReady(0xc00032eb00, 0x0, 0x176d4a7)
        /home/zhaokun/work/zhao-kun/easegress-1/pkg/cluster/cluster.go:256 +0x5dc
github.com/megaease/easegress/pkg/cluster.(*cluster).run(0xc00032eb00)
        /home/zhaokun/work/zhao-kun/easegress-1/pkg/cluster/cluster.go:188 +0xa5
created by github.com/megaease/easegress/pkg/cluster.New
        /home/zhaokun/work/zhao-kun/easegress-1/pkg/cluster/cluster.go:158 +0x265

zhao-kun commented 3 years ago

For testing, we need to prepare PVs in advance. The following PV example is provided for your reference:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: easemesh-storage-pv-2
spec:
  capacity:
    storage: 4Gi
  # volumeMode field requires BlockVolume Alpha feature gate to be enabled.
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: easemesh-storage
  local:
    path: /volumes/easemesh-storage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: easemesh-storage-pv-3
spec:
  capacity:
    storage: 4Gi
  # volumeMode field requires BlockVolume Alpha feature gate to be enabled.
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: easemesh-storage
  local:
    path: /volumes/easemesh-storage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-3
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: easemesh-storage-pv-1
spec:
  capacity:
    storage: 4Gi
  # volumeMode field requires BlockVolume Alpha feature gate to be enabled.
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: easemesh-storage
  local:
    path: /volumes/easemesh-storage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-1
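
The three manifests above differ only in the PV name and the node hostname, so they can also be generated. The following is a rough sketch using Go's `text/template`; the `PV` type and `RenderPVs` function are hypothetical helpers, and the node names, path, and capacity are simply the values from the example above. Adjust them to your own cluster.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// pvTemplate reproduces the local PersistentVolume manifest from the example.
const pvTemplate = `apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{.Name}}
spec:
  capacity:
    storage: {{.Capacity}}
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: {{.StorageClass}}
  local:
    path: {{.Path}}
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - {{.Node}}
`

// PV holds the values that vary (or may vary) between manifests.
type PV struct {
	Name, Capacity, StorageClass, Path, Node string
}

// RenderPVs renders one manifest per PV, separated by YAML document markers.
func RenderPVs(pvs []PV) (string, error) {
	tmpl, err := template.New("pv").Parse(pvTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	for i, pv := range pvs {
		if i > 0 {
			sb.WriteString("---\n")
		}
		if err := tmpl.Execute(&sb, pv); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	var pvs []PV
	for i, node := range []string{"kube-1", "kube-2", "kube-3"} {
		pvs = append(pvs, PV{
			Name:         fmt.Sprintf("easemesh-storage-pv-%d", i+1),
			Capacity:     "4Gi",
			StorageClass: "easemesh-storage",
			Path:         "/volumes/easemesh-storage",
			Node:         node,
		})
	}
	out, err := RenderPVs(pvs)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

The output can be saved to a file and applied with kubectl before running the installer.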