Closed: wangqia0309 closed this issue 3 years ago.
Hi, if you use ACK, you only need to change the storageClassName from 'gp2' to 'alicloud-disk-ssd'. Please read the instructions in "Use dynamically provisioned disks for stateful applications". We will consider adding a minikube test environment.
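For reference, a sketch of what the storageClaim section would look like on ACK (field names follow the NebulaCluster spec shown later in this thread; sizes are illustrative):

```yaml
# Sketch: storageClaim for ACK; the AWS class 'gp2' is replaced by 'alicloud-disk-ssd'
storageClaim:
  resources:
    requests:
      storage: 20Gi
  storageClassName: alicloud-disk-ssd
```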
Please add more examples to config/samples/apps_v1alpha1_nebulacluster.yaml.
By the way, if you are using minikube for test/playground purposes, you could try https://github.com/wey-gu/nebula-operator-kind. It's a toy project that helps create a nebula-operator sample cluster in one line; please note it's not for production, of course.
@MegaByte875 @wey-gu I found an error when running helm install for nebula-operator on aliyun:
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]
You need to install the dependencies first. Here the error means the cert-manager CRDs cannot be recognized, i.e. cert-manager is not installed :-)
ref: https://github.com/vesoft-inc/nebula-operator/blob/master/doc/user/install_guide.md
- RBAC enabled (optional)
- CoreDNS >= 1.6.0
- CertManager >= 1.2.0
- OpenKruise >= 0.8.0
- Helm >= 3.2.0
nebula-operator is launched, but the storaged, metad, and graphd processes can't run normally. @MegaByte875 @wey-gu
NAME                                                             READY   STATUS    RESTARTS   AGE
nebula-graphd-0                                                  0/1     Running   0          6m16s
nebula-metad-0                                                   0/1     Running   0          6m17s
nebula-operator-controller-manager-deployment-74fb689875-8p854   2/2     Running   0          58m
nebula-operator-controller-manager-deployment-74fb689875-gkkpx   2/2     Running   0          58m
nebula-operator-scheduler-deployment-fc9c797c6-5nhjm             2/2     Running   0          58m
nebula-operator-scheduler-deployment-fc9c797c6-s5pgv             2/2     Running   0          58m
nebula-storaged-0                                                0/1     Running   0          6m17s
nebula-storaged-1                                                0/1     Running   0          6m17s
nebula-storaged-2                                                0/1     Running   0          6m17s
Please show me the output of "kubectl describe pod nebula-storaged-0" and "kubectl get pod nebula-storaged-0 -oyaml" @wangqia0309
@MegaByte875 This is the describe output:
Name: nebula-storaged-0
Namespace: nebula
Priority: 0
Node: cn-beijing.172.17.0.236/172.17.0.236
Start Time: Fri, 11 Jun 2021 18:27:32 +0800
Labels: app.kubernetes.io/cluster=nebula
app.kubernetes.io/component=storaged
app.kubernetes.io/managed-by=nebula-operator
app.kubernetes.io/name=nebula-graph
controller-revision-hash=nebula-storaged-675dfb4688
statefulset.kubernetes.io/pod-name=nebula-storaged-0
Annotations: kubernetes.io/psp: ack.privileged
nebula-graph.io/cm-hash: 563a13ee319762c8
Status: Running
IP: 172.22.0.2
IPs:
IP: 172.22.0.2
Controlled By: StatefulSet/nebula-storaged
Containers:
storaged:
Container ID: docker://a55dc78665ba3938ef5079993b6bc4bb4cfc70833edbd8d6ef0635ba02dc0083
Image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
Image ID: docker-pullable://registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula@sha256:0756d3ba427debc62239805bb2136f009d2305ce06b220b75e61f158056d75fb
Ports: 9779/TCP, 19779/TCP, 19780/TCP, 9778/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/bin/bash
-ecx
exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --minloglevel=1 --v=0 --daemonize=false
State: Running
Started: Fri, 11 Jun 2021 18:27:44 +0800
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 500m
memory: 500Mi
Readiness: http-get http://:19779/status delay=20s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/usr/local/nebula/data from storaged (rw,path="data")
/usr/local/nebula/etc from nebula-storaged (rw)
/usr/local/nebula/logs from storaged (rw,path="logs")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4gnrq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
storaged:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: storaged-nebula-storaged-0
ReadOnly: false
nebula-storaged:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nebula-storaged
Optional: false
default-token-4gnrq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4gnrq
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 2m25s (x31858 over 3d16h) kubelet Readiness probe failed: Get http://172.22.0.2:19779/status: dial tcp 172.22.0.2:19779: connect: connection refused
And the yaml:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: ack.privileged
nebula-graph.io/cm-hash: 563a13ee319762c8
creationTimestamp: "2021-06-11T10:27:30Z"
generateName: nebula-storaged-
labels:
app.kubernetes.io/cluster: nebula
app.kubernetes.io/component: storaged
app.kubernetes.io/managed-by: nebula-operator
app.kubernetes.io/name: nebula-graph
controller-revision-hash: nebula-storaged-675dfb4688
statefulset.kubernetes.io/pod-name: nebula-storaged-0
name: nebula-storaged-0
namespace: nebula
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: nebula-storaged
uid: 71326412-f235-4fe4-a979-98fcb9bf42a2
resourceVersion: "929981884"
selfLink: /api/v1/namespaces/nebula/pods/nebula-storaged-0
uid: ad0b6616-4dd4-4380-adf0-2253e85f9c98
spec:
containers:
- command:
- /bin/bash
- -ecx
- exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
--meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559
--local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local
--minloglevel=1 --v=0 --daemonize=false
image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
imagePullPolicy: IfNotPresent
name: storaged
ports:
- containerPort: 9779
name: thrift
protocol: TCP
- containerPort: 19779
name: http
protocol: TCP
- containerPort: 19780
name: http2
protocol: TCP
- containerPort: 9778
name: admin
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /status
port: 19779
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
cpu: "1"
memory: 1Gi
requests:
cpu: 500m
memory: 500Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/local/nebula/logs
name: storaged
subPath: logs
- mountPath: /usr/local/nebula/data
name: storaged
subPath: data
- mountPath: /usr/local/nebula/etc
name: nebula-storaged
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-4gnrq
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: nebula-storaged-0
imagePullSecrets:
- name: acr-credential-a0fa064cb4ce770d628e28389a5eff36
- name: acr-credential-79873d0d479756dcb41f2157e7ef6512
- name: acr-credential-24854a7970e1cadb8173632e77a2be46
- name: acr-credential-64e9a936224ff365bbd88cdc91a39a86
- name: acr-credential-df469bbe2cfaa576fab48b6f52d33a82
nodeName: cn-beijing.172.17.0.236
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
subdomain: nebula-storaged-headless
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
topologySpreadConstraints:
- labelSelector:
matchLabels:
app.kubernetes.io/cluster: nebula
app.kubernetes.io/component: storaged
app.kubernetes.io/managed-by: nebula-operator
app.kubernetes.io/name: nebula-graph
maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
volumes:
- name: storaged
persistentVolumeClaim:
claimName: storaged-nebula-storaged-0
- configMap:
defaultMode: 420
items:
- key: nebula-storaged.conf
path: nebula-storaged.conf
name: nebula-storaged
name: nebula-storaged
- name: default-token-4gnrq
secret:
defaultMode: 420
secretName: default-token-4gnrq
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2021-06-11T10:27:32Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2021-06-11T10:27:32Z"
message: 'containers with unready status: [storaged]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2021-06-11T10:27:32Z"
message: 'containers with unready status: [storaged]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2021-06-11T10:27:32Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://a55dc78665ba3938ef5079993b6bc4bb4cfc70833edbd8d6ef0635ba02dc0083
image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
imageID: docker-pullable://registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula@sha256:0756d3ba427debc62239805bb2136f009d2305ce06b220b75e61f158056d75fb
lastState: {}
name: storaged
ready: false
restartCount: 0
started: true
state:
running:
startedAt: "2021-06-11T10:27:44Z"
hostIP: 172.17.0.236
phase: Running
podIP: 172.22.0.2
podIPs:
- ip: 172.22.0.2
qosClass: Burstable
startTime: "2021-06-11T10:27:32Z"
@wangqia0309 Hi, can you provide some logs?
- log for nebula-storaged-0
kubectl exec -it nebula-storaged-0 -- cat logs/nebula-storaged.INFO
- log for nebula-metad-0
kubectl exec -it nebula-metad-0 -- cat logs/nebula-metad.INFO
These services were not running; there was no container. @veezhang
@wangqia0309 Emm, I'll create a cluster on aliyun. Which version of kubernetes are you using?
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-aliyun.1", GitCommit:"2cbb16c", GitTreeState:"", BuildDate:"2021-01-27T02:20:04Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
@wangqia0309
Can you provide the following information?
- pv and pvc
kubectl get pv,pvc
kubectl describe pvc storaged-nebula-storaged-0
@wangqia0309 Maybe the storage you requested does not meet aliyun's requirements. If so, please modify your yaml definition.
See https://partners-intl.aliyun.com/help/doc-detail/25513.htm for details.
@wangqia0309 Here is an example of creating a Nebula Cluster on aliyun. Hope it is useful to you.
Install cert-manager, openkruise, and nebula-operator:
helm install cert-manager cert-manager --repo https://charts.jetstack.io \
--namespace cert-manager --create-namespace --version v1.3.1 \
--set installCRDs=true
helm install kruise https://github.com/openkruise/kruise/releases/download/v0.8.1/kruise-chart.tgz
helm install nebula-operator nebula-operator --repo https://vesoft-inc.github.io/nebula-operator/charts \
--namespace nebula-operator-system --create-namespace --version 0.1.0 \
--set image.kubeRBACProxy.image=kubesphere/kube-rbac-proxy:v0.8.0 \
--set image.kubeScheduler.image=kubesphere/kube-scheduler:v1.18.8
cat <<EOF | kubectl apply -f -
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
name: nebula
spec:
graphd:
resources:
requests:
cpu: "500m"
memory: "500Mi"
limits:
cpu: "1"
memory: "1Gi"
replicas: 1
image: vesoft/nebula-graphd
version: v2.0.0
service:
type: NodePort
externalTrafficPolicy: Local
storageClaim:
resources:
requests:
storage: 20Gi
storageClassName: alicloud-disk-ssd
metad:
resources:
requests:
cpu: "500m"
memory: "500Mi"
limits:
cpu: "1"
memory: "1Gi"
replicas: 1
image: vesoft/nebula-metad
version: v2.0.0
storageClaim:
resources:
requests:
storage: 20Gi
storageClassName: alicloud-disk-ssd
storaged:
resources:
requests:
cpu: "500m"
memory: "500Mi"
limits:
cpu: "1"
memory: "1Gi"
replicas: 3
image: vesoft/nebula-storaged
version: v2.0.0
storageClaim:
resources:
requests:
storage: 20Gi
storageClassName: alicloud-disk-ssd
reference:
name: statefulsets.apps
version: v1
schedulerName: default-scheduler
imagePullPolicy: IfNotPresent
EOF
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: nebula-console
spec:
containers:
- name: nebula-console
image: vesoft/nebula-console:v2-nightly
command:
- sleep
- "1000000"
EOF
kubectl exec -it nebula-console -- nebula-console -u root -p a --addr nebula-graphd-svc --port 9669
2021/06/17 08:43:54 [INFO] connection pool is initialized successfully
Welcome to Nebula Graph!
(root@nebula) [(none)]> show hosts
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-0.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-1.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-2.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "Total" | | | 0 | | |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
Got 4 rows (time spent 3315/4918 us)
Thu, 17 Jun 2021 08:44:03 UTC
(root@nebula) [(none)]>
terminate called after throwing an instance of 'std::system_error'
what(): Failed to resolve address for 'nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
*** Aborted at 1623927557 (unix time) try "date -d @1623927557" if you are using GNU date ***
PC: @ 0x7f0ee1fd4387 __GI_raise
*** SIGABRT (@0x1) received by PID 1 (TID 0x7f0ee2ec18c0) from PID 1; stack trace: ***
@ 0x1e5f9c1 (unknown)
@ 0x7f0ee237b62f (unknown)
@ 0x7f0ee1fd4387 __GI_raise
@ 0x7f0ee1fd5a77 __GI_abort
@ 0x107f647 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x2219b85 __cxxabiv1::__terminate()
@ 0x2219bd0 std::terminate()
@ 0x2219d03 __cxa_throw
@ 0x1063e8b (unknown)
@ 0x1d12292 folly::SocketAddress::getAddrInfo()
@ 0x1d122b3 folly::SocketAddress::setFromHostPort()
@ 0x19fe77e nebula::WebService::start()
@ 0x1080872 main
@ 0x7f0ee1fc0554 __libc_start_main
@ 0x1096b4d (unknown)
@veezhang I found the error log; I don't know why the address can't be resolved.
@veezhang It should be a DNS resolution issue. Our k8s cluster has its own settings and requires a suffix in the form svc.gsvc.glx.local, but your image hardcodes the suffix at startup, as in --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local. How can the svc.cluster.local case be handled? Could you avoid specifying this trailing part, since our DNS automatically appends our own domain suffix?
@wangqia0309 I've created PR #29.
After it is merged, please configure kubernetesClusterDomain with gsvc.glx.local.
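To make the fix concrete, here is a minimal sketch of how the pod FQDNs in the flags above are composed (the exact template is internal to nebula-operator; this only illustrates the role of the cluster-domain suffix):

```shell
# Illustrative only: how a storaged pod FQDN is assembled from its parts.
pod="nebula-storaged-0"
service="nebula-storaged-headless"
namespace="nebula"

# Default suffix, as hardcoded before the fix:
cluster_domain="cluster.local"
fqdn="${pod}.${service}.${namespace}.svc.${cluster_domain}"
echo "${fqdn}"   # nebula-storaged-0.nebula-storaged-headless.nebula.svc.cluster.local

# With the cluster domain overridden for a cluster like the one above:
cluster_domain="gsvc.glx.local"
fqdn="${pod}.${service}.${namespace}.svc.${cluster_domain}"
echo "${fqdn}"   # nebula-storaged-0.nebula-storaged-headless.nebula.svc.gsvc.glx.local
```

Once the operator builds addresses with the configured domain, the cluster's own DNS can resolve them without the svc.cluster.local assumption.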
Notes:
helm repo add nebula-operator https://vesoft-inc.github.io/nebula-operator/charts
helm repo update
_See Nebula Operator Install Guide for details._
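For a cluster with a custom DNS suffix like the one above, the override would presumably go into the chart values (key name per the discussion above; verify it against the chart's values.yaml once the PR is merged):

```yaml
# Sketch of a helm values override; assumes the chart exposes this key.
kubernetesClusterDomain: gsvc.glx.local
```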
The outcome of this thread is gold, @veezhang thanks! It could become a quite reusable experience/blog post on running nebula-operator on top of aliyun. @QingZ11
Thanks @wangqia0309 for your time exploring and helping to improve Nebula Graph :-).
Thanks all, this is the best experience I've had with an outside community. Hoping for an even better nebula.
@wangqia0309 Thanks!