Hi @jkryl @gila, Hope you guys are doing good today!
Approach: delete mayastor, the CRDs, and the namespace, then recreate/reinstall mayastor. I deleted everything including the CRDs and the namespace and tried to reinstall mayastor from scratch. While installing mayastor again I faced the below issue:
The mayastor daemonset pods are not running and keep going into a crash loop:
mayastor-czrdt 1/1 Running 1 23m
mayastor-brgrw 0/1 Error 5 23m
mayastor-brgrw 0/1 CrashLoopBackOff 5 23m
mayastor-czrdt 0/1 Error 1 24m
mayastor-czrdt 0/1 CrashLoopBackOff 1 24m
mayastor-czrdt 1/1 Running 2 24m
We can see the below logs for the daemonset pods:
[2021-06-11T07:22:43.668504167+00:00 INFO mayastor:main.rs:46] Starting Mayastor ..
[2021-06-11T07:22:43.668613480+00:00 INFO mayastor:main.rs:47] kernel io_uring support: no
[2021-06-11T07:22:43.668629087+00:00 INFO mayastor:main.rs:51] free_pages: 2048 nr_pages: 2048
thread 'main' panicked at 'Invalid Host Name: Custom { kind: Other, error: "failed to lookup address information: Name or service not known" }', mayastor/src/subsys/mbus/mod.rs:41:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
kubectl describe ds/mayastor-brgrw -n mayastor
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned mayastor/mayastor-brgrw to cpocwl02
Normal Pulling 30m kubelet Pulling image "busybox:latest"
Normal Pulled 30m kubelet Successfully pulled image "busybox:latest" in 3.98728808s
Normal Created 30m kubelet Created container message-bus-probe
Normal Started 30m kubelet Started container message-bus-probe
Normal Pulled 12m kubelet Successfully pulled image "mayadata/mayastor:v0.7.0" in 2m12.734778131s
Normal Pulled 11m kubelet Successfully pulled image "mayadata/mayastor:v0.7.0" in 3.069738739s
Normal Pulled 11m kubelet Successfully pulled image "mayadata/mayastor:v0.7.0" in 4.257629762s
Normal Started 10m (x4 over 12m) kubelet Started container mayastor
Normal Pulled 10m kubelet Successfully pulled image "mayadata/mayastor:v0.7.0" in 2.979018302s
Normal Pulling 9m28s (x5 over 14m) kubelet Pulling image "mayadata/mayastor:v0.7.0"
Normal Created 9m21s (x5 over 12m) kubelet Created container mayastor
Normal Pulled 9m21s kubelet Successfully pulled image "mayadata/mayastor:v0.7.0" in 6.804279623s
Warning BackOff 46s (x41 over 11m) kubelet Back-off restarting failed container
# kubectl logs -n mayastor pod/mayastor-brgrw
[2021-06-11T10:33:30.306704036+00:00 INFO mayastor:main.rs:46] Starting Mayastor ..
[2021-06-11T10:33:30.306872051+00:00 INFO mayastor:main.rs:47] kernel io_uring support: no
[2021-06-11T10:33:30.306909127+00:00 INFO mayastor:main.rs:51] free_pages: 2048 nr_pages: 2048
thread 'main' panicked at 'Invalid Host Name: Custom { kind: Other, error: "failed to lookup address information: Name or service not known" }', mayastor/src/subsys/mbus/mod.rs:41:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[2021-06-11T10:34:22.085561709+00:00 INFO mayastor:main.rs:46] Starting Mayastor ..
[2021-06-11T10:34:22.085666183+00:00 INFO mayastor:main.rs:47] kernel io_uring support: no
[2021-06-11T10:34:22.085686827+00:00 INFO mayastor:main.rs:51] free_pages: 1024 nr_pages: 1024
thread 'main' panicked at 'Invalid Host Name: Custom { kind: Other, error: "failed to lookup address information: Name or service not known" }', mayastor/src/subsys/mbus/mod.rs:41:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2. Before creating any pool on the nodes, it is picking up the previously created node pools (while reinstalling):
kubectl get msp -n mayastor
NAME                NODE   STATE     AGE
pool-on-node-wl01   wl01   pending   102d
pool-on-node-wl02   wl02   pending   102d
pool-on-node-wl03   wl03   pending   102d
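If these are stale MSP custom resources left over from the previous install, is it safe to simply delete them before creating new pools? A sketch of what I have in mind, using the pool names listed above:
kubectl -n mayastor delete msp pool-on-node-wl01 pool-on-node-wl02 pool-on-node-wl03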
Could you suggest anything here? Thanks
Hey @krishnakekan619, it seems like mayastor is not able to reach the nats service. The message has been improved recently but maybe you're still on an older version. Would you please be able to check the state of the nats deployment?
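For example, something along these lines should show whether the nats deployment, service and pod are healthy (a sketch; the resource names assume the default mayastor deploy yamls):
kubectl -n mayastor get deploy nats        # deployment up and available?
kubectl -n mayastor get svc nats           # service present with a cluster IP?
kubectl -n mayastor get pods -l app=nats   # pod running without restarts?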
Hi @jkryl @gila, thanks for your immediate response.
kubectl -n mayastor get pods --selector=app=nats
NAME READY STATUS RESTARTS AGE
nats-6fdd6dfb4f-58rtp 1/1 Running 3 5h40m
kubectl describe pod/nats-6fdd6dfb4f-58rtp -n mayastor
Name: nats-6fdd6dfb4f-58rtp
Namespace: mayastor
Priority: 0
Node: cpocwl01/xx.xx.xx.1
Start Time: Fri, 11 Jun 2021 13:59:26 +0800
Labels: app=nats
pod-template-hash=6fdd6dfb4f
Annotations: cni.projectcalico.org/podIP: 10.42.87.194/32
cni.projectcalico.org/podIPs: 10.42.87.194/32
Status: Running
IP: 10.42.87.194
IPs:
IP: 10.42.87.194
Controlled By: ReplicaSet/nats-6fdd6dfb4f
Containers:
nats:
Container ID: docker://5a3acc52d5093e394729b98157adc92b9b67067168ce594a103ab32f08f0015d
Image: nats:2.1-alpine3.11
Image ID: docker-pullable://nats@sha256:ebe6d1b23a177223608c68d8617049228b00ee54d4e758d2eca44238326b141b
Port: 4222/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 11 Jun 2021 15:38:32 +0800
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 11 Jun 2021 15:30:27 +0800
Finished: Fri, 11 Jun 2021 15:37:29 +0800
Ready: True
Restart Count: 3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-x47dq (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-x47dq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-x47dq
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
kubectl logs pod/nats-6fdd6dfb4f-58rtp -n mayastor
[1] 2021/06/11 07:38:32.192405 [INF] Starting nats-server version 2.1.8
[1] 2021/06/11 07:38:32.192440 [INF] Git commit [c0b574f]
[1] 2021/06/11 07:38:32.192747 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2021/06/11 07:38:32.192815 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2021/06/11 07:38:32.192831 [INF] Server id is NBZZ7GL3DVUCGAK6TRHJ6CMBF6A4IR5TUJVR5JV3XFJ5YOA2FW2QDJ
[1] 2021/06/11 07:38:32.192835 [INF] Server is ready
[1] 2021/06/11 07:38:32.193029 [INF] Listening for route connections on 0.0.0.0:6222
Any suggestions on this? Thanks
Could you please describe the nats service as well? Also, could you run a separate container and try to reach nats, e.g. nc -vz nats 4222? Alternatively, you could kubectl -n mayastor delete pod mayastor-xxxxxx and that should trigger the init-container to run again; the init-container will probe nats for you.
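For example, a throwaway pod could be used for the probe (a sketch only; the pod name and image are illustrative):
kubectl -n mayastor run nats-probe --rm -it --restart=Never --image=busybox -- nc -vz nats 4222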
Hi @jkryl @gila, thanks for your quick response.
I ran another pod and tried the netcat command and it is working. Below is the output:
[root@centos-01 /]# nc -vz 10.43.5.130 4222
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.43.5.130:4222.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds
Delete the mayastor pods:
kubectl delete pod/mayastor-4g9sv -n mayastor
pod "mayastor-4g9sv" deleted
kubectl delete pod/mayastor-ks4cd -n mayastor
pod "mayastor-ks4cd" deleted
Status of the daemonset pods:
kubectl get pods -n mayastor -w
NAME READY STATUS RESTARTS AGE
mayastor-5dr45 0/1 Init:0/1 0 105s
mayastor-5qf82 0/1 Init:0/1 0 81s
Events of the daemonset pods are still the same:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m1s default-scheduler Successfully assigned mayastor/mayastor-5dr45 to cpocwl01
Normal Pulled 2m1s kubelet Container image "busybox:latest" already present on machine
Normal Created 2m kubelet Created container message-bus-probe
Normal Started 2m kubelet Started container message-bus-probe
Could you please provide us any other pointers on this? Thanks
Can you get the logs from those pods now? It seems like they can't reach the nats service. It seems like the DNS service is not resolving nats?
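One way to confirm would be to resolve the service name from a temporary pod in the same namespace (a sketch; busybox:1.28 is suggested only because nslookup in newer busybox images can be unreliable):
kubectl -n mayastor run dns-check --rm -it --restart=Never --image=busybox:1.28 -- nslookup nats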
Hi @gila, the previously running daemonset pod logs are below:
# kubectl logs -n mayastor pod/mayastor-brgrw
[2021-06-11T10:33:30.306704036+00:00 INFO mayastor:main.rs:46] Starting Mayastor ..
[2021-06-11T10:33:30.306872051+00:00 INFO mayastor:main.rs:47] kernel io_uring support: no
[2021-06-11T10:33:30.306909127+00:00 INFO mayastor:main.rs:51] free_pages: 2048 nr_pages: 2048
thread 'main' panicked at 'Invalid Host Name: Custom { kind: Other, error: "failed to lookup address information: Name or service not known" }', mayastor/src/subsys/mbus/mod.rs:41:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
# kubectl logs -n mayastor pod/mayastor-czrdt
[2021-06-11T10:34:22.085561709+00:00 INFO mayastor:main.rs:46] Starting Mayastor ..
[2021-06-11T10:34:22.085666183+00:00 INFO mayastor:main.rs:47] kernel io_uring support: no
[2021-06-11T10:34:22.085686827+00:00 INFO mayastor:main.rs:51] free_pages: 1024 nr_pages: 1024
thread 'main' panicked at 'Invalid Host Name: Custom { kind: Other, error: "failed to lookup address information: Name or service not known" }', mayastor/src/subsys/mbus/mod.rs:41:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Currently available daemonset pod logs:
kubectl logs -n mayastor pod/mayastor-5dr45
Error from server (BadRequest): container "mayastor" in pod "mayastor-5dr45" is waiting to start: PodInitializing
kubectl logs -n mayastor pod/mayastor-5qf82
Error from server (BadRequest): container "mayastor" in pod "mayastor-5qf82" is waiting to start: PodInitializing
The nats logs are already given in the above comment; let me know if you want any other logs.
@gila FYR, the overall logs are attached in a text file: mayastor-logs-gila-1.txt
I need the logs from the init-container; you need to specify it: kubectl -n mayastor logs mayastor-5qf82 -c message-bus-probe
Hi @gila, below are the logs you asked for:
# kubectl -n mayastor logs mayastor-5qf82 -c message-bus-probe
Waiting for message bus...
nc: bad address 'nats'
Waiting for message bus...
nc: bad address 'nats'
nc: bad address 'nats'
Waiting for message bus...
# kubectl -n mayastor logs mayastor-5dr45 -c message-bus-probe
nc: bad address 'nats'
Waiting for message bus...
nc: bad address 'nats'
Waiting for message bus...
nc: bad address 'nats'
Waiting for message bus...
nc: bad address 'nats'
Waiting for message bus...
Hmm, this seems to be some kind of DNS issue if we can't access nats by name...
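It might also be worth checking the health of the cluster DNS pods themselves, something along these lines (a sketch; the label assumes a standard CoreDNS deployment):
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide   # are the DNS pods running, and on which nodes?
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50     # any resolution errors in the DNS logs?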
Hi @tiagolobocastro, @jkryl, I can access the nats service by its name, i.e. nats:
[root@centos /]# nc -vz 10.43.5.130 4222
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.43.5.130:4222.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
[root@centos /]# nc -vz nats 4222
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.43.5.130:4222.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
The resolv.conf of the other pod:
[root@centos /]# cat /etc/resolv.conf
nameserver 10.43.0.10
search mayastor.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Could you please suggest a workaround? The service is reachable from other pods and connectivity is working. Thanks,
@krishnakekan619 I've had a similar issue today, strangely enough. I had to restart the kube-router and coredns pods and then the services were able to resolve properly.
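A minimal sketch of that restart, assuming CoreDNS runs as a deployment and kube-router as a daemonset in kube-system (names may differ per distribution):
kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system rollout restart daemonset kube-router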
@gila We have restarted our dns-utils pods several times but the error still persists.
Describe the bug
We previously installed mayastor on our on-premise RKE cluster. After rebooting the worker nodes we faced an issue.
To Reproduce
Steps to reproduce the behavior: reboot the worker nodes where the storage blocks are present.
Expected behavior
After rebooting the worker nodes, mayastor should work smoothly and the MSPs should show as online.
Screenshots
After rebooting the worker nodes, the mayastor MSP status is showing as pending.
Also, only 1 of the daemonset pods is in ready status.
OS info (please complete the following information):
Additional context
We have checked the below things:
Do you suggest any workaround so that even if someone reboots the worker nodes in the future, mayastor will still work smoothly? @jkryl Thanks in advance.