vesoft-inc / nebula-operator

Operation utilities for Nebula Graph
https://vesoft-inc.github.io/nebula-operator
Apache License 2.0
79 stars 28 forks

how to set storage class for aliyun, and hope to support minikube for test and learning #21

Closed: wangqia0309 closed this issue 3 years ago

wangqia0309 commented 3 years ago

Please add more examples to config/samples/apps_v1alpha1_nebulacluster.yaml.

MegaByte875 commented 3 years ago

Hi, if you use ACK, you only need to change the storageClassName from 'gp2' to 'alicloud-disk-ssd'. Please read the instructions in Use dynamically provisioned disks for stateful applications. We will consider supporting minikube as a testing environment.
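
For reference, a minimal sketch of that one-line change, shown on an inline fragment; in an actual checkout you would edit config/samples/apps_v1alpha1_nebulacluster.yaml the same way (the fragment layout is an assumption, not the exact sample file):

```shell
# Swap the default (EKS) storage class for the Alibaba Cloud SSD class.
cat <<'EOF' | sed 's/storageClassName: gp2/storageClassName: alicloud-disk-ssd/'
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: gp2
EOF
```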

wey-gu commented 3 years ago

config/samples/apps_v1alpha1_nebulacluster.yaml please add more examples

By the way, if the reason you use minikube is for test/playground purposes, you could try https://github.com/wey-gu/nebula-operator-kind. It's a toy project that helps create a nebula-operator sample cluster in one line; please note it's not for production, of course.

wangqia0309 commented 3 years ago

@MegaByte875 @wey-gu I found an error when running helm install nebula-operator on aliyun:

Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

wey-gu commented 3 years ago

@MegaByte875 @wey-gu found one error when helm install nebula-operator to aliyun Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

You need to install the dependencies first. The error means the cert-manager CRDs cannot be recognized, i.e. cert-manager is not installed :-)

ref: https://github.com/vesoft-inc/nebula-operator/blob/master/doc/user/install_guide.md

  • RBAC enabled (optional)
  • CoreDNS >= 1.6.0
  • CertManager >= 1.2.0
  • OpenKruise >= 0.8.0
  • Helm >= 3.2.0
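
As a rough pre-flight sketch, the version prerequisites can be compared locally with sort -V. The 'have' values below are placeholders, not real probe results; in practice they would come from e.g. helm version --template '{{.Version}}':

```shell
# Hypothetical pre-flight check: prints OK when an installed version
# meets the install guide's stated minimum.
check_min() {
  name=$1; min=$2; have=${3#v}
  lowest=$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n1)
  if [ "$lowest" = "$min" ]; then echo "$name OK ($have >= $min)"
  else echo "$name too old ($have < $min)"; fi
}
check_min helm 3.2.0 v3.5.4
check_min cert-manager 1.2.0 v1.3.1
```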

wangqia0309 commented 3 years ago

nebula-operator is launched, but the storaged, metad and graphd processes can't run normally @MegaByte875 @wey-gu

NAME                                                             READY   STATUS    RESTARTS   AGE
nebula-graphd-0                                                  0/1     Running   0          6m16s
nebula-metad-0                                                   0/1     Running   0          6m17s
nebula-operator-controller-manager-deployment-74fb689875-8p854   2/2     Running   0          58m
nebula-operator-controller-manager-deployment-74fb689875-gkkpx   2/2     Running   0          58m
nebula-operator-scheduler-deployment-fc9c797c6-5nhjm             2/2     Running   0          58m
nebula-operator-scheduler-deployment-fc9c797c6-s5pgv             2/2     Running   0          58m
nebula-storaged-0                                                0/1     Running   0          6m17s
nebula-storaged-1                                                0/1     Running   0          6m17s
nebula-storaged-2                                                0/1     Running   0          6m17s

MegaByte875 commented 3 years ago

Please show me the output of "kubectl describe pod nebula-storaged-0" and "kubectl get pod nebula-storaged-0 -oyaml" @wangqia0309

wangqia0309 commented 3 years ago

please show me the output "kubectl describe pod nebula-storaged-0" and "kubectl get pod nebula-storaged-0 -oyaml" @wangqia0309

@MegaByte875 this is the describe output:

Name:         nebula-storaged-0
Namespace:    nebula
Priority:     0
Node:         cn-beijing.172.17.0.236/172.17.0.236
Start Time:   Fri, 11 Jun 2021 18:27:32 +0800
Labels:       app.kubernetes.io/cluster=nebula
              app.kubernetes.io/component=storaged
              app.kubernetes.io/managed-by=nebula-operator
              app.kubernetes.io/name=nebula-graph
              controller-revision-hash=nebula-storaged-675dfb4688
              statefulset.kubernetes.io/pod-name=nebula-storaged-0
Annotations:  kubernetes.io/psp: ack.privileged
              nebula-graph.io/cm-hash: 563a13ee319762c8
Status:       Running
IP:           172.22.0.2
IPs:
  IP:           172.22.0.2
Controlled By:  StatefulSet/nebula-storaged
Containers:
  storaged:
    Container ID:  docker://a55dc78665ba3938ef5079993b6bc4bb4cfc70833edbd8d6ef0635ba02dc0083
    Image:         registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
    Image ID:      docker-pullable://registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula@sha256:0756d3ba427debc62239805bb2136f009d2305ce06b220b75e61f158056d75fb
    Ports:         9779/TCP, 19779/TCP, 19780/TCP, 9778/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -ecx
      exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --minloglevel=1 --v=0 --daemonize=false
    State:          Running
      Started:      Fri, 11 Jun 2021 18:27:44 +0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        500m
      memory:     500Mi
    Readiness:    http-get http://:19779/status delay=20s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /usr/local/nebula/data from storaged (rw,path="data")
      /usr/local/nebula/etc from nebula-storaged (rw)
      /usr/local/nebula/logs from storaged (rw,path="logs")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4gnrq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  storaged:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storaged-nebula-storaged-0
    ReadOnly:   false
  nebula-storaged:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-storaged
    Optional:  false
  default-token-4gnrq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4gnrq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                        From     Message
  ----     ------     ----                       ----     -------
  Warning  Unhealthy  2m25s (x31858 over 3d16h)  kubelet  Readiness probe failed: Get http://172.22.0.2:19779/status: dial tcp 172.22.0.2:19779: connect: connection refused

and the yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ack.privileged
    nebula-graph.io/cm-hash: 563a13ee319762c8
  creationTimestamp: "2021-06-11T10:27:30Z"
  generateName: nebula-storaged-
  labels:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: storaged
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
    controller-revision-hash: nebula-storaged-675dfb4688
    statefulset.kubernetes.io/pod-name: nebula-storaged-0
  name: nebula-storaged-0
  namespace: nebula
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nebula-storaged
    uid: 71326412-f235-4fe4-a979-98fcb9bf42a2
  resourceVersion: "929981884"
  selfLink: /api/v1/namespaces/nebula/pods/nebula-storaged-0
  uid: ad0b6616-4dd4-4380-adf0-2253e85f9c98
spec:
  containers:
  - command:
    - /bin/bash
    - -ecx
    - exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
      --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local
      --minloglevel=1 --v=0 --daemonize=false
    image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
    imagePullPolicy: IfNotPresent
    name: storaged
    ports:
    - containerPort: 9779
      name: thrift
      protocol: TCP
    - containerPort: 19779
      name: http
      protocol: TCP
    - containerPort: 19780
      name: http2
      protocol: TCP
    - containerPort: 9778
      name: admin
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /status
        port: 19779
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 500Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/local/nebula/logs
      name: storaged
      subPath: logs
    - mountPath: /usr/local/nebula/data
      name: storaged
      subPath: data
    - mountPath: /usr/local/nebula/etc
      name: nebula-storaged
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4gnrq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nebula-storaged-0
  imagePullSecrets:
  - name: acr-credential-a0fa064cb4ce770d628e28389a5eff36
  - name: acr-credential-79873d0d479756dcb41f2157e7ef6512
  - name: acr-credential-24854a7970e1cadb8173632e77a2be46
  - name: acr-credential-64e9a936224ff365bbd88cdc91a39a86
  - name: acr-credential-df469bbe2cfaa576fab48b6f52d33a82
  nodeName: cn-beijing.172.17.0.236
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: nebula-storaged-headless
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/cluster: nebula
        app.kubernetes.io/component: storaged
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: storaged
    persistentVolumeClaim:
      claimName: storaged-nebula-storaged-0
  - configMap:
      defaultMode: 420
      items:
      - key: nebula-storaged.conf
        path: nebula-storaged.conf
      name: nebula-storaged
    name: nebula-storaged
  - name: default-token-4gnrq
    secret:
      defaultMode: 420
      secretName: default-token-4gnrq
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    message: 'containers with unready status: [storaged]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    message: 'containers with unready status: [storaged]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://a55dc78665ba3938ef5079993b6bc4bb4cfc70833edbd8d6ef0635ba02dc0083
    image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
    imageID: docker-pullable://registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula@sha256:0756d3ba427debc62239805bb2136f009d2305ce06b220b75e61f158056d75fb
    lastState: {}
    name: storaged
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-06-11T10:27:44Z"
  hostIP: 172.17.0.236
  phase: Running
  podIP: 172.22.0.2
  podIPs:
  - ip: 172.22.0.2
  qosClass: Burstable
  startTime: "2021-06-11T10:27:32Z"

veezhang commented 3 years ago

@wangqia0309 Hi, can you provide some logs?

wangqia0309 commented 3 years ago

@wangqia0309 Hi, Can you provide some log?

  • log for nebula-storaged-0
kubectl exec -it nebula-storaged-0 -- cat logs/nebula-storaged.INFO 
  • log for nebula-metad-0
kubectl exec -it nebula-metad-0 -- cat logs/nebula-metad.INFO

These services were not running, so there is no log in the container @veezhang

veezhang commented 3 years ago

@wangqia0309 Emm, I'll create a cluster on aliyun. Which version of Kubernetes are you using?

wangqia0309 commented 3 years ago

@wangqia0309 Emm, I'll create a cluster with aliyun. And which version of kubernetes?

Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-aliyun.1", GitCommit:"2cbb16c", GitTreeState:"", BuildDate:"2021-01-27T02:20:04Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

veezhang commented 3 years ago

@wangqia0309

Can you provide the following information?

veezhang commented 3 years ago

@wangqia0309 Maybe the storage you requested does not meet aliyun's requirements. If so, please modify your yaml definition.

See https://partners-intl.aliyun.com/help/doc-detail/25513.htm for details.
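
As an illustration, Alibaba Cloud disks have a minimum size per category (assumed to be 20 GiB here; confirm the exact figure in the linked doc), and a storageClaim requesting less cannot be provisioned, leaving the pod unready:

```shell
# Sketch of the constraint: reject storage requests below the assumed
# 20 GiB floor for alicloud-disk classes.
check_storage_request() {
  request_gi=$1
  min_gi=20
  if [ "$request_gi" -lt "$min_gi" ]; then
    echo "${request_gi}Gi is below the ${min_gi}Gi minimum; raise the request"
  else
    echo "${request_gi}Gi OK"
  fi
}
check_storage_request 10
check_storage_request 20
```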

veezhang commented 3 years ago

@wangqia0309 There is an example to create Nebula Cluster on aliyun. Hope it is useful to you.

Create a Kubernetes and wait for ready.

Setup cert-manager, openkruise and nebula-operator.

helm install cert-manager cert-manager --repo https://charts.jetstack.io \
  --namespace cert-manager --create-namespace --version v1.3.1 \
  --set installCRDs=true

helm install kruise https://github.com/openkruise/kruise/releases/download/v0.8.1/kruise-chart.tgz

helm install nebula-operator nebula-operator --repo https://vesoft-inc.github.io/nebula-operator/charts \
  --namespace nebula-operator-system --create-namespace --version 0.1.0 \
  --set image.kubeRBACProxy.image=kubesphere/kube-rbac-proxy:v0.8.0 \
  --set image.kubeScheduler.image=kubesphere/kube-scheduler:v1.18.8

Create a Nebula Cluster

cat <<EOF | kubectl apply -f -
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-graphd
    version: v2.0.0
    service:
      type: NodePort
      externalTrafficPolicy: Local
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-ssd
  metad:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-metad
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-ssd
  storaged:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 3
    image: vesoft/nebula-storaged
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-ssd
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  imagePullPolicy: IfNotPresent
EOF

Create a console to connect to the nebula cluster


cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nebula-console
spec:
  containers:
    - name: nebula-console
      image: vesoft/nebula-console:v2-nightly
      command:
      - sleep
      - "1000000"
EOF

Have fun


kubectl exec -it nebula-console -- nebula-console -u root -p a --addr nebula-graphd-svc --port 9669
2021/06/17 08:43:54 [INFO] connection pool is initialized successfully

Welcome to Nebula Graph!

(root@nebula) [(none)]> show hosts
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| Host                                                                   | Port | Status   | Leader count | Leader distribution  | Partition distribution |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-0.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-1.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-2.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "Total"                                                                |      |          | 0            |                      |                        |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
Got 4 rows (time spent 3315/4918 us)

Thu, 17 Jun 2021 08:44:03 UTC

(root@nebula) [(none)]>
wangqia0309 commented 3 years ago

terminate called after throwing an instance of 'std::system_error'
what(): Failed to resolve address for 'nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
*** Aborted at 1623927557 (unix time) try "date -d @1623927557" if you are using GNU date ***
PC: @ 0x7f0ee1fd4387 __GI_raise
*** SIGABRT (@0x1) received by PID 1 (TID 0x7f0ee2ec18c0) from PID 1; stack trace: ***
@ 0x1e5f9c1 (unknown)
@ 0x7f0ee237b62f (unknown)
@ 0x7f0ee1fd4387 __GI_raise
@ 0x7f0ee1fd5a77 __GI_abort
@ 0x107f647 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x2219b85 __cxxabiv1::__terminate()
@ 0x2219bd0 std::terminate()
@ 0x2219d03 __cxa_throw
@ 0x1063e8b (unknown)
@ 0x1d12292 folly::SocketAddress::getAddrInfo()
@ 0x1d122b3 folly::SocketAddress::setFromHostPort()
@ 0x19fe77e nebula::WebService::start()
@ 0x1080872 main
@ 0x7f0ee1fc0554 __libc_start_main
@ 0x1096b4d (unknown)

@veezhang I found the error log; I don't know why the address can't be resolved.

wangqia0309 commented 3 years ago

@veezhang It should be a DNS resolution problem. Our k8s has its own settings and requires a suffix of the form svc.gsvc.glx.local, but your image hardcodes --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local at startup, i.e. the svc.cluster.local suffix. How can this case be handled? Could that suffix be left unspecified, since our DNS automatically appends our own domain suffix?

veezhang commented 3 years ago

@wangqia0309 I've created PR #29.

After it is merged, please configure kubernetesClusterDomain with gsvc.glx.local.
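
To make the fix concrete, each component's advertised address is assembled from pod name, headless service, namespace, and the cluster domain; a sketch of that assembly (names taken from the logs above, custom domain from this thread):

```shell
# How the per-pod FQDN is built; kubernetesClusterDomain replaces the
# default 'cluster.local' tail so addresses resolve on clusters that
# use a custom DNS suffix.
fqdn() {
  pod=$1; svc=$2; ns=$3; domain=$4
  echo "${pod}.${svc}.${ns}.svc.${domain}"
}
fqdn nebula-storaged-0 nebula-storaged-headless nebula cluster.local
fqdn nebula-storaged-0 nebula-storaged-headless nebula gsvc.glx.local
```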

Notes:

wey-gu commented 3 years ago

The outcome of this thread is gold, @veezhang thanks! It could end up as a quite reusable experience/blog post on nebula-operator on top of aliyun. @QingZ11

Thanks @wangqia0309 for your time exploring and helping to improve nebula graph :-).

wangqia0309 commented 3 years ago

Thanks, all. This is the best experience I have had with an outside community. Hoping for an even better nebula.

veezhang commented 3 years ago

@wangqia0309 Thanks!