stelucz opened this issue 11 months ago
Skupper is supported for the basic Kubernetes networking model.
In the log you provided, it appears as though Skupper routers in peer namespaces are attempting to establish mTLS links to the router in the target namespace. If there is a networking intermediary in between, it must be configured to pass the traffic through. I am not familiar with Cilium routers, but I presume there is a way to configure them appropriately.
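For illustration, a minimal sketch of what such a pass-through configuration could look like as a CiliumNetworkPolicy; the policy name and the skupper.io/component=router pod label are assumptions on my part, and the ports are the mTLS link ports mentioned below:

```yaml
# Hypothetical sketch: admit Skupper's inter-router and edge mTLS traffic
# to the router pods. Adjust the namespace and labels to your deployment.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-skupper-router   # assumed name
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      skupper.io/component: router   # assumed router pod label
  ingress:
  - toPorts:
    - ports:
      - port: "55671"   # inter-router (interior) mTLS listener
        protocol: TCP
      - port: "45671"   # edge mTLS listener
        protocol: TCP
```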
What does "basic Kubernetes networking model" mean?
I did not configure any peerings, as the Skupper router pod is not in a ready state.
By basic networking I meant normal pod-to-pod and pod-to-service communication within a namespace.
Ports 45671 and 55671 are the ports used by the Skupper routers to establish mTLS links. It appears that you have configured other sites to establish connections to the skupper-router in the namespace you provided.
Actually, I did not. These are all the commands I have run on a fresh cluster:
```
$ skupper init
Waiting for LoadBalancer IP or hostname...
Waiting for status...
Skupper status is not loaded yet.
Skupper is now installed in namespace 'default'. Use 'skupper status' to get more information.

$ skupper status
Status pending...
```
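With the status stuck at pending, a natural next step (an assumed troubleshooting sketch, not commands from the original exchange) is to inspect the router pod directly:

```
# Assumed commands; skupper.io/component=router is the conventional
# router pod label, adjust if your deployment labels differ.
kubectl -n default get pods -l skupper.io/component=router
kubectl -n default describe pod -l skupper.io/component=router
kubectl -n default logs deploy/skupper-router -c router
```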
Hi,
Would you be able to attach the output from the 'skupper debug dump' command?
If there is no other skupper deployment in your environment, it would be helpful to understand why there are connection attempts to the router on ports 45671 and 55671.
It's more likely that the environment is probing the open ports for some reason. There are connection attempts at both the interior and edge listening ports. Those can probably be ignored.
The issue is that Kubernetes isn't getting health probe responses on port
My guess is that Cilium is not a factor in this situation.
-Ted
skupper.tar.gz — @ajssmith dump attached. @ted-ross there are no limits or requests set for the Skupper router deployment:
```yaml
...
resources: {}
securityContext:
  runAsNonRoot: true
...
```
Based on the information I gathered, you are right, Ted. The Cilium agent (router) is probing services/endpoints, so that's the reason for these log messages.
Regarding the failed health probes, could they be connected to these error messages?
Looking at the logs, it would appear that the Skupper components are unable to resolve the Skupper local services:

```
dial tcp: lookup skupper-router-local.default.svc.cluster.local: no such host
```

I am not sure if this is related to the unsuccessful health checks, but it looks like the collection of Skupper pods is running OK except for the inability to resolve services.
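A quick way to confirm the resolution failure (my assumption, not a command from the thread) is to run the lookup from a throwaway pod:

```
# Assumed diagnostic: nslookup from a temporary busybox pod.
kubectl -n default run dns-test --rm -it --image=busybox --restart=Never -- \
  nslookup skupper-router-local.default.svc.cluster.local
```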
Thanks for pointing it out. Is the domain used in service hostnames configurable somehow? Our clusters do not have the cluster.local domain; instead, it is based on the actual cluster name.
IIUC, and if you are running CoreDNS as a deployment, you can configure the local domain in the kubelet with the --cluster-domain flag or by modifying the CoreDNS Corefile via its ConfigMap:
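A minimal sketch of the ConfigMap approach, assuming the standard CoreDNS Corefile layout; "my-cluster-domain" is a placeholder, not a value from this thread:

```yaml
# Hypothetical example: CoreDNS ConfigMap serving a custom cluster domain.
# Replace "my-cluster-domain" with your cluster's actual domain.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes my-cluster-domain in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
    }
```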
Thanks, but I meant whether I can tell Skupper to use a different domain than cluster.local.
Hi @ajssmith, I managed to include cluster.local in DNS. However, is it possible to tell Skupper to use the cluster name/its own domain instead of cluster.local?
Once I manually added cluster.local to DNS, Skupper started to work and I can expose services between clusters; however, a headless service is not working. The service and pods are created in the destination cluster, but connecting to the service from that cluster does not work:
```
$ curl ds-service/v1
curl: (7) Failed to connect to ds-service port 80 after 215 ms: Couldn't connect to server
$ curl ds-0.ds-service/v1
curl: (7) Failed to connect to ds-0.ds-service port 80 after 1 ms: Couldn't connect to server
$ host ds-0.ds-service
ds-0.ds-service.default.svc.deva-stehlluk-azure has address 10.2.1.239
$ host ds-service
ds-service.default.svc.deva-stehlluk-azure has address 10.2.1.205
ds-service.default.svc.deva-stehlluk-azure has address 10.2.1.229
ds-service.default.svc.deva-stehlluk-azure has address 10.2.1.239
```
It works fine locally in the source cluster, where the StatefulSet is deployed.
Definition of the StatefulSet and headless service:
```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: ds-service
  labels:
    app: ds
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: ds
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ds
spec:
  serviceName: "ds-service"
  replicas: 3
  selector:
    matchLabels:
      app: ds
  template:
    metadata:
      labels:
        app: ds
    spec:
      containers:
      - name: ds
        image: docker.io/cilium/starwars
```
Command to expose:

```
skupper expose statefulset ds --headless --port 80
```
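One way to verify the exposure afterwards (assumed commands, not from the original post):

```
# Assumed checks: confirm the service is exposed and present in the cluster.
skupper service status
kubectl get svc ds-service
```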
I followed the CockroachDB example and it works fine, or at least it looks so. So I have no clue why my example does not work.
Can you help me with troubleshooting, please? Thanks a lot.
I managed to get logs from the proxy container (skupper-router) in the destination cluster; this is the output:
```
2024-01-08 13:38:27.733984 +0000 SERVER (info) Container Name: ds-0-4dda95dd-0e93-4ff8-843f-65ab84de370d
2024-01-08 13:38:27.734076 +0000 ROUTER (info) Router started in Edge mode
2024-01-08 13:38:27.734083 +0000 ROUTER (info) Version: 2.5.0
2024-01-08 13:38:27.734424 +0000 ROUTER_CORE (info) Streaming link scrubber: Scan interval: 30 seconds, max free pool: 128 links
2024-01-08 13:38:27.734440 +0000 ROUTER_CORE (info) Core module enabled: streaming_link_scrubber
2024-01-08 13:38:27.734444 +0000 ROUTER_CORE (info) Core module present but disabled: mobile_sync
2024-01-08 13:38:27.734488 +0000 ROUTER_CORE (info) Stuck delivery detection: Scan interval: 30 seconds, Delivery age threshold: 10 seconds
2024-01-08 13:38:27.734498 +0000 ROUTER_CORE (info) Core module enabled: stuck_delivery_detection
2024-01-08 13:38:27.735020 +0000 ROUTER_CORE (info) Core module enabled: heartbeat_server
2024-01-08 13:38:27.735055 +0000 ROUTER_CORE (info) Core module enabled: heartbeat_edge
2024-01-08 13:38:27.735061 +0000 ROUTER_CORE (info) Core module enabled: address_lookup_client
2024-01-08 13:38:27.735172 +0000 ROUTER_CORE (info) Core module present but disabled: mesh_discovery_interior
2024-01-08 13:38:27.735202 +0000 ROUTER_CORE (info) Core module enabled: mesh_discovery_edge
2024-01-08 13:38:27.735315 +0000 ROUTER_CORE (info) Core module present but disabled: edge_addr_tracking
2024-01-08 13:38:27.735341 +0000 ROUTER_CORE (info) Core module present but disabled: core_test_hooks
2024-01-08 13:38:27.735352 +0000 ROUTER_CORE (info) Core module enabled: edge_router
2024-01-08 13:38:27.735925 +0000 ROUTER_CORE (info) Protocol adaptor registered: tcp_lite
2024-01-08 13:38:27.736061 +0000 TCP_ADAPTOR (info) Router buffer memory ceiling: 15.61 GiB (4114941 buffers)
2024-01-08 13:38:27.736078 +0000 ROUTER (info) Adaptor buffer memory ceiling: 16756043776 (1022707 buffers)
2024-01-08 13:38:27.736083 +0000 ROUTER_CORE (info) Protocol adaptor registered: tcp
2024-01-08 13:38:27.736096 +0000 ROUTER_CORE (info) Protocol adaptor registered: http2
2024-01-08 13:38:27.736172 +0000 ROUTER_CORE (info) Protocol adaptor registered: http/1.x
2024-01-08 13:38:27.736367 +0000 ROUTER_CORE (info) Protocol adaptor registered: amqp
2024-01-08 13:38:27.736829 +0000 ROUTER_CORE (info) Router Core thread running. 0/ds-0-4dda95dd-0e93-4ff8-843f-65ab84de370d
2024-01-08 13:38:27.740852 +0000 FLOW_LOG (info) Protocol logging started
2024-01-08 13:38:27.831191 +0000 AGENT (info) Activating management agent on $_management_internal
2024-01-08 13:38:27.832723 +0000 POLICY (info) Policy configured maxConnections: 65535, policyDir: '',access rules enabled: 'false', use hostname patterns: 'false'
2024-01-08 13:38:27.833697 +0000 POLICY (info) Policy fallback defaultVhost is defined: '$default'
2024-01-08 13:38:27.834209 +0000 CONN_MGR (info) Created SSL Profile with name skupper-internal
2024-01-08 13:38:27.835371 +0000 TCP_ADAPTOR (info) Configured TcpListener (legacy encap) for ds-service-0:80, 0.0.0.0:80
2024-01-08 13:38:27.836595 +0000 CONN_MGR (info) Configured Connector: skupper-router.default.svc.cluster.local:45671 proto=any, role=edge, sslProfile=skupper-internal
2024-01-08 13:38:27.836643 +0000 CONN_MGR (info) Inter-router data connections set to 0
2024-01-08 13:38:27.838386 +0000 CONN_MGR (info) Configured Listener: localhost:5672 proto=any, role=normal
2024-01-08 13:38:27.839252 +0000 SERVER (info) Operational, 4 Threads Running (process ID 1)
2024-01-08 13:38:27.839383 +0000 ROUTER (info) Process VmSize 187.11 MiB RSS 36.32 MiB (15.61 GiB available memory)
2024-01-08 13:38:27.845684 +0000 SERVER (info) Listening on localhost:5672
2024-01-08 13:38:28.004631 +0000 ROUTER_CORE (info) [C1] Connection Opened: dir=out host=skupper-router.default.svc.cluster.local:45671 encrypted=TLSv1.3 auth=EXTERNAL user=(null) container_id=default-skupper-router-679c4958b-wqrr9 props={:product="skupper-router", :version="2.5.0", :"qd.conn-id"=7296, :"qd.annotations-version"=2}
2024-01-08 13:38:28.004853 +0000 ROUTER_CORE (info) Edge connection (id=1) to interior established
2024-01-08 13:38:28.013093 +0000 ROUTER_CORE (info) EDGE_MESH - Mesh identifier sent to Interior: V_zgbTSMBcUapgb
2024-01-08 13:38:28.017509 +0000 FLOW_LOG (info) Event collector detected. Begin sending beacons.
2024-01-08 13:38:28.017685 +0000 ROUTER (error) Listener ingress:80: proactor listener error on 0.0.0.0:80: proton:io (Permission denied - listen on 0.0.0.0:80)
2024-01-08 13:38:28.019053 +0000 ROUTER (error) Listener ingress:80: proactor listener error on 0.0.0.0:80: proton:io (Permission denied - listen on 0.0.0.0:80)
2024-01-08 13:40:19.969166 +0000 ROUTER (error) Listener ingress:80: proactor listener error on 0.0.0.0:80: proton:io (Permission denied - listen on 0.0.0.0:80)
2024-01-08 13:40:19.969225 +0000 FLOW_LOG (info) LOG [9U1R4:991930] BEGIN END parent=9U1R4:0 logSeverity=3 logText=LOG_ROUTER: Listener ingress:80: proactor listener error on 0.0.0.0:80: proton:io (Permission denied - listen on 0.0.0.0:80) sourceFile=/build/src/adaptors/adaptor_listener.c sourceLine=168
2024-01-08 13:40:19.969237 +0000 FLOW_LOG (info) LOG [9U1R4:991931] BEGIN END parent=9U1R4:0 logSeverity=3 logText=LOG_ROUTER: Listener ingress:80: proactor listener error on 0.0.0.0:80: proton:io (Permission denied - listen on 0.0.0.0:80) sourceFile=/build/src/adaptors/adaptor_listener.c sourceLine=168
2024-01-08 13:40:19.969166 +0000 ROUTER (error) Listener ingress:80: proactor listener error on 0.0.0.0:80: proton:io (Permission denied - listen on 0.0.0.0:80)
```
These error messages repeat indefinitely.
The issue here is probably port 80: the router runs as non-root (runAsNonRoot: true, as shown earlier), so it presumably cannot bind a listener on a privileged port below 1024, hence the "Permission denied" in the log. I tried a different example using nginx on port 8080, and there the headless service in the destination cluster works fine.
Is this a known limitation of Skupper?
That is a current limitation when using the --headless option. I've raised an issue to track a fix for that: https://github.com/skupperproject/skupper/issues/1344
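A sketch of the workaround implied above, assuming the interim fix is simply to serve on an unprivileged port so the non-root proxy can bind its listener (my reading, not a maintainer-confirmed recipe):

```yaml
# Sketch: the same headless service moved to an unprivileged port (8080).
apiVersion: v1
kind: Service
metadata:
  name: ds-service
  labels:
    app: ds
spec:
  ports:
  - port: 8080   # >1024, so the non-root router listener can bind it
    name: web
  clusterIP: None
  selector:
    app: ds
```

exposed with `skupper expose statefulset ds --headless --port 8080` (the container must, of course, also listen on 8080).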
@grs Thanks a lot for quick reply! :)
Describe the bug
Skupper does not start on a Kubernetes cluster with Cilium.

Skupper router pod describe:
(the source addresses are the Cilium "routers" of the Kubernetes nodes)

How To Reproduce
Steps to reproduce the behavior:

Expected behavior
Skupper up and running.

Environment details

Additional context
Is Skupper even supported with Cilium or other CNIs? I have not found any information about that in the documentation.