
Eclipse Kuksa Automotive Edge Extension

Establish connection between Kanto and cloud backend #32

Closed · max-grzanna closed this issue 2 years ago

max-grzanna commented 2 years ago

Understand how Eclipse Kanto works and what configuration options are available.

cb6569 commented 2 years ago

Kanto is not running yet; this is connected to #31. Test whether direct testing in the VM is possible. We use Emissary-ingress, Hono, and InfluxDB (integrated into the Hono setup).

Testing via the sandbox to have a functioning Kanto version; this allows testing with containers (#29).

max-grzanna commented 2 years ago

At the moment, there is still a problem with the containers not starting correctly. Some pods are in the running state, but their readiness and liveness checks keep failing, for example for Emissary-ingress and the Hono components.

(two screenshots attached)

Wrong network settings with Vagrant and VirtualBox, such as the DNS configuration, could be the cause of these problems. The logs are not very informative so far.

max-grzanna commented 2 years ago

Another reason could be insufficient resources, but using a more lightweight Debian distribution or maximizing the VirtualBox CPU cores and RAM does not seem to fix the problem.

cb6569 commented 2 years ago

We reached out to another contact person for further information on how to fix the issues. Another possibility is to use the Hono and Ditto sandboxes and communicate via their APIs.

bs-jokri commented 2 years ago

@max-grzanna please add the log output to the ticket.

max-grzanna commented 2 years ago

The following are the log outputs for the emissary-ingress pod and one of the Hono pods (device registry): emissary-ingress-logs.txt, hono-device-registry-logs.txt

eriksven commented 2 years ago

Regarding the Hono device registry: to me, the logs indicate a networking issue, based on the repeating exception:

io.netty.resolver.dns.DnsResolveContext$SearchDomainUnknownHostException: Failed to resolve 'kuksa-cloud-dispatch-router' and search domain query for configured domains failed as well: [kuksa.svc.cluster.local, svc.cluster.local, cluster.local]

But my guess is that you already got this far, too.

Are all the services correctly showing up in your cluster? What is the name of the dispatch router service in your scenario? Can you provide the output of 'kubectl get svc'? Maybe there is a configuration issue in the Helm chart where either a wrong service is configured or your entries are ignored for the device registry.
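
For reference, a minimal sketch of how the resolution could be checked directly inside the cluster (the kuksa namespace is only inferred from the search domains in the exception above, and busybox is just an arbitrary debug image with nslookup built in):

# namespace "kuksa" is an assumption based on the search domains above
kubectl get svc -n kuksa
# run a throwaway pod and try to resolve the failing service name
kubectl run dns-debug --rm -it --image=busybox:1.36 --restart=Never -n kuksa -- \
  nslookup kuksa-cloud-dispatch-router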

eriksven commented 2 years ago

Regarding the Ingress Logs:

I do not have a good explanation either, but it again seems like the container is not able to make proper network connections, e.g., when trying to connect to http://localhost:8004 .

Just based on the logs, the interesting part might be:

time="2022-07-05 18:02:32.0369" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh time="2022-07-05 18:02:33.0392" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh time="2022-07-05 18:02:34.0465" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh time="2022-07-05 18:02:35.0481" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp [::1]:8004: connect: connection refused" func=github.com/datawire/ambassador/v2/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh 2022-07-05 18:02:35 diagd 2.3.1 [P18TMainThread] WARNING: Scout: could not post report: HTTPSConnectionPool(host='metriton.datawire.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f882053aa90>: Failed to establish a new connection: [Errno -3] Try again')) and later

2022-07-05 18:02:41 diagd 2.3.1 [P20TAEW] WARNING: Scout: could not post report: HTTPSConnectionPool(host='metriton.datawire.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f881ff25a00>: Failed to establish a new connection: [Errno -3] Try again')) time="2022-07-05 18:03:14.0748" level=error msg="goroutine \":signal_handler:0\" exited with error: received signal terminated (triggering graceful shutdown)" func="github.com/datawire/dlib/dgroup.(*Group).goWorkerCtx.func1.1" file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:380" CMD=entrypoint PID=1 THREAD=":signal_handler:0"
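
If it helps, one way to reproduce the failing localhost call directly would be to exec into the emissary-ingress pod; the namespace and label selector below are assumptions, and curl may not be present in the image:

# assumption: namespace "kuksa" and the default emissary-ingress labels; adjust as needed
POD=$(kubectl -n kuksa get pods -l app.kubernetes.io/name=emissary-ingress \
      -o jsonpath='{.items[0].metadata.name}')
# curl may not exist in the image; wget or a python one-liner would work as well
kubectl -n kuksa exec -it "$POD" -- curl -v http://localhost:8004/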

max-grzanna commented 2 years ago

Thank you for checking the logs and helping out. The services should be running properly, I think:

NAME                                           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                                                       AGE
kuksa-cloud-emissary-ingress-admin             ClusterIP      10.43.78.100    <none>        8877/TCP,8005/TCP                                                                                                                             4m45s
kuksa-cloud-emissary-ingress-agent             ClusterIP      10.43.38.70     <none>        80/TCP                                                                                                                                        4m45s
kuksa-cloud-artemis                            ClusterIP      10.43.35.167    <none>        5671/TCP                                                                                                                                      4m45s
kuksa-cloud-dispatch-router-ext                NodePort       10.43.210.178   <none>        15671:30671/TCP,15672:30672/TCP                                                                                                               4m45s
kuksa-cloud-dispatch-router                    ClusterIP      10.43.250.73    <none>        5673/TCP                                                                                                                                      4m45s
kuksa-cloud-adapter-amqp-vertx                 NodePort       10.43.102.51    <none>        5672:32672/TCP,5671:32671/TCP                                                                                                                 4m45s
kuksa-cloud-adapter-http-vertx                 NodePort       10.43.55.165    <none>        8080:30080/TCP,8443:30443/TCP                                                                                                                 4m45s
kuksa-cloud-adapter-mqtt-vertx                 NodePort       10.43.129.5     <none>        1883:31883/TCP,8883:30883/TCP                                                                                                                 4m44s
kuksa-cloud-service-auth                       ClusterIP      10.43.255.159   <none>        5671/TCP                                                                                                                                      4m44s
kuksa-cloud-service-command-router             ClusterIP      10.43.21.90     <none>        5671/TCP                                                                                                                                      4m44s
kuksa-cloud-service-device-registry-ext        NodePort       10.43.74.144    <none>        28080:31080/TCP,28443:31443/TCP                                                                                                               4m44s
kuksa-cloud-service-device-registry            ClusterIP      10.43.215.86    <none>        5671/TCP,8080/TCP,8443/TCP                                                                                                                    4m44s
kuksa-cloud-service-device-registry-headless   ClusterIP      None            <none>        <none>                                                                                                                                        4m44s
kuksa-cloud-influxdb                           ClusterIP      10.43.104.115   <none>        8086/TCP,8088/TCP                                                                                                                             4m44s
kuksa-cloud-app-store                          ClusterIP      10.43.6.27      <none>        8080/TCP,8089/TCP                                                                                                                             4m44s
kuksa-cloud-emissary-ingress                   LoadBalancer   10.43.18.231    10.0.2.15     3000:30347/TCP,38080:32644/TCP,8086:32224/TCP,48080:31065/TCP,1883:30967/TCP,18080:31788/TCP,58080:32502/TCP,5671:30204/TCP,28080:30617/TCP   4m45s

If I install the Helm chart locally on my Ubuntu machine, the cloud runs and these problems do not appear. Inside a VirtualBox VM, however, the pods do not start properly. So I guess the Helm charts are fine?

Networking seems to be the issue, as you said @eriksven. Maybe DNS?
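
A rough sketch of what I plan to check next for the DNS theory (the k8s-app=kube-dns label is the usual CoreDNS label in k3s/kubeadm clusters, so treat it as an assumption):

# is CoreDNS itself healthy, and do its logs show forward/resolve errors?
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
# does a plain in-cluster lookup work at all?
kubectl run dns-debug --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local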

eriksven commented 2 years ago

At first glance, the services seem to be OK, and based on your description I would agree that the configuration in the Helm charts should not be the issue here.

Regarding the DNS idea, I am not sure, because the call to localhost in the Ingress container seemed to fail as well. Did you manage to get any connectivity between containers within the cluster in the VirtualBox VM? Maybe the additional level of virtualization results in some misconfiguration of the network interfaces in relation to the cluster. Just a guess, since it worked with the cluster on your host system.

I noticed that you moved this issue into the backlog. So you are focusing on other topics and are not completely blocked here now, right?

max-grzanna commented 2 years ago

Thanks for your help and your comments @eriksven. We pushed the ticket back to the backlog because the problem was blocking us, and we thought about using the Hono and Ditto sandboxes for the moment. But yesterday I found the root of the problem, and I think we can continue working with the Vagrant VM setup.

The problem was the CoreDNS pod inside the kube-system namespace:

[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
plugin/forward: no nameservers found

According to the Stack Overflow topic "Kubernetes CoreDNS in CrashLoopBackOff", a temporary solution is to hardcode the Google DNS server into the CoreDNS ConfigMap with the following command:

kubectl -n kube-system edit configmaps coredns -o yaml

Modify the section forward . /etc/resolv.conf to forward . 172.16.232.1 (in my case I set 8.8.8.8 for the time being).
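
Roughly, the workaround looks like this (the deployment name coredns and the rollout restart step are assumptions based on a default k3s setup; CoreDNS needs a restart for the edited ConfigMap to take effect):

kubectl -n kube-system edit configmaps coredns -o yaml
#   in the Corefile, replace "forward . /etc/resolv.conf" with e.g. "forward . 8.8.8.8"
kubectl -n kube-system rollout restart deployment coredns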

The connection between the two VMs seems to work, but it has only been tested with Hono at the moment.
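
For anyone retracing this, a minimal smoke test against the Hono HTTP adapter could look roughly like this (NodePort 30443 comes from the service list above; tenant, device, and password are placeholders that must already exist in the device registry):

# placeholders: my-tenant / my-device / my-password must be registered in Hono first
# -k because the adapter typically runs with a self-signed certificate in this setup
curl -ik -u 'my-device@my-tenant:my-password' \
  -H 'Content-Type: application/json' \
  -d '{"temp": 23}' \
  https://<vm-ip>:30443/telemetry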