Closed: max-grzanna closed this issue 2 years ago.
Kanto is not running yet; connected to #31. Test whether direct testing in the VM is possible. Uses Emissary-ingress, Hono, and InfluxDB (integrated into Hono).
Testing via the sandbox, in order to have a functioning Kanto version, allows testing with containers (#29).
At the moment there is still a problem with the containers not starting correctly. Some pods are in the Running state, but their readiness and liveness checks keep failing, for example for Emissary-ingress and the Hono components.
Wrong network settings with Vagrant and VirtualBox, such as the DNS configuration, could be the cause of these problems. The logs are not very informative so far.
Another reason could be missing resources, but using a more lightweight Debian distribution or maximizing the VirtualBox CPU cores and RAM does not seem to fix the problem.
We reached out to another contact person for further information on how to fix these issues. Another possibility is to use the Hono and Ditto sandboxes for the communication via their APIs.
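Since DNS inside the guest is one of the suspects, one cheap thing to try is VirtualBox's NAT DNS host-resolver mode, which is a known source of broken in-guest name resolution. This is a guess, not a confirmed fix, and the VM name below is a placeholder:

```shell
# Hypothetical check: enable VirtualBox's host DNS resolver for the NAT
# adapter, so the guest resolves names via the host instead of the NAT
# engine's DNS proxy. Run with the VM powered off; <vm-name> is a placeholder.
VBoxManage modifyvm "<vm-name>" --natdnshostresolver1 on

# The equivalent line in a Vagrantfile provider block would be:
#   vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
```

Whether this applies depends on the box using a NAT adapter for its primary interface; if the cluster traffic goes over a host-only or bridged adapter, this setting will not help.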
@max-grzanna please add the log output to the ticket.
The following are the log outputs for the Emissary-ingress pod and one of the Hono (device registry) pods: emissary-ingress-logs.txt hono-device-registry-logs.txt
Regarding the Hono device registry: to me the logs indicate a networking issue, based on the repeating exception:
io.netty.resolver.dns.DnsResolveContext$SearchDomainUnknownHostException: Failed to resolve 'kuksa-cloud-dispatch-router' and search domain query for configured domains failed as well: [kuksa.svc.cluster.local, svc.cluster.local, cluster.local]
But my guess is that you already got this far, too.
Are all the services correctly showing up in your cluster? What is the name of the dispatch router service in your scenario? Can you provide the output of 'kubectl get svc'? Maybe there is a configuration issue in the Helm chart where either a wrong service is configured or your entries for the device registry are ignored.
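For reference, in-cluster DNS resolution can be tested directly with a throwaway pod. The namespace "kuksa" below is inferred from the search domains in the exception (kuksa.svc.cluster.local), so treat it as an assumption:

```shell
# Spin up a temporary busybox pod and try to resolve the service name that
# the device registry fails to look up. The pod is deleted afterwards (--rm).
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.35 -n kuksa \
  -- nslookup kuksa-cloud-dispatch-router

# A healthy cluster DNS should answer with the service's ClusterIP;
# a timeout or SERVFAIL here points at CoreDNS or the node's upstream
# resolver rather than at the Hono configuration.
```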
Regarding the Ingress Logs:
I do not have a good explanation either, but again it seems like the container is not able to make proper network connections, e.g., when trying to connect to http://localhost:8004.
Just based on the logs the interesting part might be:
time="2022-07-05 18:02:32.0369" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2022-07-05 18:02:33.0392" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2022-07-05 18:02:34.0465" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2022-07-05 18:02:35.0481" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
2022-07-05 18:02:35 diagd 2.3.1 [P18TMainThread] WARNING: Scout: could not post report: HTTPSConnectionPool(host='metriton.datawire.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f882053aa90>: Failed to establish a new connection: [Errno -3] Try again'))
and later
2022-07-05 18:02:41 diagd 2.3.1 [P20TAEW] WARNING: Scout: could not post report: HTTPSConnectionPool(host='metriton.datawire.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f881ff25a00>: Failed to establish a new connection: [Errno -3] Try again'))
time="2022-07-05 18:03:14.0748" level=error msg="goroutine \":signal_handler:0\" exited with error: received signal terminated (triggering graceful shutdown)" func="github.com/datawire/dlib/dgroup.(*Group).goWorkerCtx.func1.1" file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:380" CMD=entrypoint PID=1 THREAD=":signal_handler:0"
Thank you for checking the logs and helping out. The services should be running properly, I think:
NAME                                          TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)                          AGE
kuksa-cloud-emissary-ingress-admin            ClusterIP     10.43.78.100   <none>       8877/TCP,8005/TCP                4m45s
kuksa-cloud-emissary-ingress-agent            ClusterIP     10.43.38.70    <none>       80/TCP                           4m45s
kuksa-cloud-artemis                           ClusterIP     10.43.35.167   <none>       5671/TCP                         4m45s
kuksa-cloud-dispatch-router-ext               NodePort      10.43.210.178  <none>       15671:30671/TCP,15672:30672/TCP  4m45s
kuksa-cloud-dispatch-router                   ClusterIP     10.43.250.73   <none>       5673/TCP                         4m45s
kuksa-cloud-adapter-amqp-vertx                NodePort      10.43.102.51   <none>       5672:32672/TCP,5671:32671/TCP    4m45s
kuksa-cloud-adapter-http-vertx                NodePort      10.43.55.165   <none>       8080:30080/TCP,8443:30443/TCP    4m45s
kuksa-cloud-adapter-mqtt-vertx                NodePort      10.43.129.5    <none>       1883:31883/TCP,8883:30883/TCP    4m44s
kuksa-cloud-service-auth                      ClusterIP     10.43.255.159  <none>       5671/TCP                         4m44s
kuksa-cloud-service-command-router            ClusterIP     10.43.21.90    <none>       5671/TCP                         4m44s
kuksa-cloud-service-device-registry-ext       NodePort      10.43.74.144   <none>       28080:31080/TCP,28443:31443/TCP  4m44s
kuksa-cloud-service-device-registry           ClusterIP     10.43.215.86   <none>       5671/TCP,8080/TCP,8443/TCP       4m44s
kuksa-cloud-service-device-registry-headless  ClusterIP     None           <none>       <none>                           4m44s
kuksa-cloud-influxdb                          ClusterIP     10.43.104.115  <none>       8086/TCP,8088/TCP                4m44s
kuksa-cloud-app-store                         ClusterIP     10.43.6.27     <none>       8080/TCP,8089/TCP                4m44s
kuksa-cloud-emissary-ingress                  LoadBalancer  10.43.18.231   10.0.2.15    3000:30347/TCP,38080:32644/TCP,8086:32224/TCP,48080:31065/TCP,1883:30967/TCP,18080:31788/TCP,58080:32502/TCP,5671:30204/TCP,28080:30617/TCP   4m45s
If I install the Helm chart locally on my Ubuntu machine, the cloud runs and these problems do not appear. But inside a VirtualBox VM, the pods do not start properly. So I guess the Helm charts are fine?
Networking seems to be the issue, as you said @eriksven. Maybe DNS?
At first view the services seem to be OK, and based on your description I would agree that the configuration in the Helm charts should not be the issue here.
Regarding the DNS idea, I am not sure, because the call to localhost in the ingress container seemed to fail as well. Did you manage to get any connectivity between containers in the cluster in the VirtualBox VM? Maybe the additional level of virtualization results in some misconfiguration of the network interfaces in relation to the cluster. Just a guess, since it worked with the cluster on your host system.
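One way to separate DNS problems from general pod-to-pod connectivity problems is to probe another service by its ClusterIP, bypassing name resolution entirely. This is only a sketch: the deployment name and the availability of `nc` in that image are assumptions, and the IP/port are taken from the `kubectl get svc` output above:

```shell
# Exec into a pod of the device registry deployment (name assumed) and try
# to reach the dispatch router's ClusterIP:port directly, without DNS.
kubectl -n kuksa exec deploy/kuksa-cloud-service-device-registry -- \
  nc -zv -w 3 10.43.250.73 5673

# If this succeeds while the DNS lookup of the same service fails, the pod
# network itself is fine and the problem is isolated to name resolution.
```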
I noticed that you moved this issue into the backlog. So you are focusing on other topics and are not completely blocked here now, right?
Thanks for your help and your comments @eriksven. We pushed the ticket back to the backlog because the problem was blocking us, and we thought about using the Hono and Ditto sandboxes for the moment. But yesterday I found the root of the problem, and I think we can continue working with the Vagrant VM setup.
The problem was the CoreDNS pod inside the kube-system namespace, which logged:
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
plugin/forward: no nameservers found
According to the Stack Overflow topic "Kubernetes CoreDNS in CrashLoopBackOff", a temporary workaround is to hardcode an upstream DNS server into the CoreDNS ConfigMap:

kubectl -n kube-system edit configmaps coredns -o yaml

In the Corefile, change the line

forward . /etc/resolv.conf

to a fixed nameserver, e.g.

forward . 172.16.232.1

(in my case I set 8.8.8.8 for the time being).
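The same edit can be sketched non-interactively, which is handy if the Vagrant provisioning should apply it automatically. This assumes the Corefile contains exactly one `forward . /etc/resolv.conf` line and that the CoreDNS deployment is named `coredns`, which can differ between distributions:

```shell
# Dump the ConfigMap, swap the forward target for a fixed upstream
# nameserver, and re-apply the result.
kubectl -n kube-system get configmap coredns -o yaml \
  | sed 's#forward . /etc/resolv.conf#forward . 8.8.8.8#' \
  | kubectl apply -f -

# Restart CoreDNS so it reloads the patched Corefile.
kubectl -n kube-system rollout restart deployment coredns
```

A fixed public resolver is only a stopgap; the cleaner fix is to ensure the node's /etc/resolv.conf contains working nameservers inside the VM.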
The connection between the two VMs seems to work, but it has only been tested with Hono at the moment.
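For a quick end-to-end check of the Hono path, a telemetry message can be published through the MQTT adapter's NodePort (1883:31883 in the service list above). The host address, tenant, device, and password below are placeholders that must match what was provisioned in the device registry:

```shell
# Publish one telemetry message via the Hono MQTT protocol adapter.
# <vm-ip> is the address of the VM running the cluster; credentials are
# placeholders for a registered device.
mosquitto_pub -h <vm-ip> -p 31883 \
  -u 'my-device@my-tenant' -P 'my-password' \
  -t telemetry -m '{"temperature": 5}'
```

If a northbound consumer is attached to the AMQP network, the message should show up there; otherwise the adapter rejects the connection, which at least confirms reachability.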
Understand how Eclipse Kanto works and what configuration options are available.