k8spacket / k8spacket-helm-chart

k8spacket helm chart
Apache License 2.0

k8spacket service unreachable #3

Closed · bobertrublik closed this issue 2 years ago

bobertrublik commented 2 years ago

Hello,

I'm trying to access the metrics of the k8spacket pods. However, none of them are reachable from Grafana or from a curl pod running in the same namespace.

[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080
curl: (7) Failed to connect to k8spacket.k8spacket.svc.cluster.local port 8080: Connection timed out

The same happens for any other endpoint defined in k8spacket.go.

k8spacket commented 2 years ago

@bobertrublik This address is internal to the Kubernetes cluster, so it will not be reachable from your local machine (if I read the [ root@curl-test:/ ]$ prompt correctly). But now I see you mention a pod in the same namespace running curl. Check the following hints:

Check if k8spacket is running:

kubectl -n k8spacket get pods

Check if the k8spacket service exists:

kubectl -n k8spacket get svc

If anything looks wrong, please share the logs from one of the k8spacket pods here.

Additionally, please share a screenshot of the Node Graph API plugin configuration (https://grafana.com/grafana/plugins/hamedkarbasi93-nodegraphapi-datasource/) from your Grafana instance.
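
One more hint (just a sketch; <pod-ip> is a placeholder for one of the returned addresses): try curling the pods behind the Service directly, to check whether every backend answers or only some of them:

# list the pod IPs registered as endpoints of the k8spacket Service
kubectl -n k8spacket get endpoints k8spacket -o jsonpath='{.subsets[*].addresses[*].ip}'

# curl one pod IP directly, bypassing the Service; -m 5 caps the wait at 5 seconds
curl -m 5 http://<pod-ip>:8080/metrics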

bobertrublik commented 2 years ago

I started a pod with curl that runs next to the k8spacket pods in the same namespace, so it should have no problem accessing k8spacket.

Pods:

NAME              READY   STATUS    RESTARTS   AGE
curl-test         1/1     Running   0          2m5s
k8spacket-5fchh   1/1     Running   0          2m10s
k8spacket-627t9   1/1     Running   0          2m10s
k8spacket-stkv6   1/1     Running   0          2m10s

Service:

NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
k8spacket   ClusterIP   172.16.56.82   <none>        8080/TCP   2m29s

The logs until k8spacket starts capturing packets:

2022/08/11 12:51:05 Serving requests on port 8080
2022/08/11 12:51:05 Refreshing interfaces for capturing...
2022/08/11 12:51:05 Starting capture on interface "cali2e0637c950b"
2022/08/11 12:51:05 Starting capture on interface "cali1f3b59b0059"
2022/08/11 12:51:05 Starting capture on interface "calid16cb36db4b"
2022/08/11 12:51:05 Starting capture on interface "calic03726a14ca"
2022/08/11 12:51:05 Starting capture on interface "cali528d52fdb4c"
2022/08/11 12:51:05 Starting capture on interface "cali1223e6c6c31"
2022/08/11 12:51:05 Starting capture on interface "cali433d7080645"
2022/08/11 12:51:05 Starting capture on interface "calie2e9896354d"
2022/08/11 12:51:05 Starting capture on interface "cali64429976fc5"
2022/08/11 12:51:05 Starting capture on interface "cali4f2d01b045d"
2022/08/11 12:51:05 Starting capture on interface "cali00c20cd856b"
2022/08/11 12:51:05 Starting capture on interface "calic8587a62da4"
2022/08/11 12:51:05 Starting capture on interface "calibc9f0bd94b0"
2022/08/11 12:51:05 Starting capture on interface "cali52b87dc4ff7"
2022/08/11 12:51:05 Starting capture on interface "cali73a037b6acb"
2022/08/11 12:51:05 Starting capture on interface "cali85ccd12a9b9"
2022/08/11 12:51:05 Starting capture on interface "cali36b3aeb53ab"
2022/08/11 12:51:05 Starting capture on interface "tunl0"
2022/08/11 12:51:05 Starting capture on interface "calif64e29ba28d"
2022/08/11 12:51:05 Starting capture on interface "cali5c3ecdedc6f"
2022/08/11 12:51:05 Starting capture on interface "cali599a5c75980"
2022/08/11 12:51:05 Starting capture on interface "cali92216366d5a"
2022/08/11 12:51:05 Starting capture on interface "cali95a3d8a7835"
2022/08/11 12:51:05 Starting capture on interface "cali65896cb1ab0"
2022/08/11 12:51:05 Starting capture on interface "cali83581ffecf2"
bobertrublik commented 2 years ago

OK, if I run the curl command a few times in a row, cancelling it when nothing happens, it reaches the pod about 1 out of 5 times. Really strange.

[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080/metrics
^C
[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080/metrics
^C
[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080/metrics
curl: (7) Failed to connect to k8spacket.k8spacket.svc.cluster.local port 8080: Connection timed out
[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080/metrics
^C
[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080/metrics
^C
[ root@curl-test:/ ]$ curl http://k8spacket.k8spacket.svc.cluster.local:8080/metrics
# HELP go_build_info Build information about the main Go module.
# TYPE go_build_info gauge
go_build_info{checksum="",path="github.com/k8spacket",version="(devel)"} 1
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
k8spacket commented 2 years ago

@bobertrublik I'm wondering about this interface:

2022/08/11 12:51:05 Starting capture on interface "tunl0"

and I suppose it could be the problem.

In the Helm chart's values.yaml there is a property command: "ip address | grep @ | sed -E 's/.* (\\w+)@.*/\\1/' | tr '\\n' ',' | sed 's/.$//'"

Change it to:

command: "ip address | grep @if | sed -E 's/.* (\\w+)@if.*/\\1/' | tr '\\n' ',' | sed 's/.$//'"

And reinstall k8spacket.
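
If the chart was installed with Helm, reinstalling can be just an upgrade with the modified values.yaml. A sketch (the release name k8spacket and the chart reference k8spacket/k8spacket are assumptions, adjust them to your setup):

# re-deploy the DaemonSet with the changed interfaces command from values.yaml
helm upgrade k8spacket k8spacket/k8spacket -n k8spacket -f values.yaml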


Or you can change it directly in the DaemonSet as well:

kubectl -n k8spacket edit daemonsets.apps k8spacket

and then:

        - name: K8S_PACKET_TCP_LISTENER_INTERFACES_COMMAND
          value: ip address | grep @if | sed -E 's/.* (\w+)@if.*/\1/' | tr '\n' ',' | sed 's/.$//'
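
To illustrate what the change does (the sample lines below are illustrative, not taken from your cluster): in ip address output tunl0 shows up as tunl0@NONE, so grep @ matches it too, while the Calico veth ends look like caliXXXXXXXXXXX@ifNN. Filtering on @if keeps only the veth interfaces:

# sample `ip address` lines on a Calico node (illustrative):
#   3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 ...
#   7: cali2e0637c950b@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 ...
ip address | grep @if | sed -E 's/.* (\w+)@if.*/\1/' | tr '\n' ',' | sed 's/.$//'
# -> cali2e0637c950b,...   (tunl0 is no longer in the list)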
bobertrublik commented 2 years ago

I changed it, but it didn't get noticeably better :(

Environment:
      K8S_PACKET_NAME_LABEL_VALUE:                         k8spacket
      K8S_PACKET_HIDE_SRC_PORT:                            true
      K8S_PACKET_REVERSE_GEOIP2_DB_PATH:                   /home/k8spacket/GeoLite2-City.mmdb
      K8S_PACKET_REVERSE_WHOIS_REGEXP:                     (?:OrgName:|org-name:)\s*(.*)
      K8S_PACKET_TCP_ASSEMBLER_MAX_PAGES_PER_CONN:         50
      K8S_PACKET_TCP_ASSEMBLER_MAX_PAGES_TOTAL:            50
      K8S_PACKET_TCP_ASSEMBLER_FLUSHING_PERIOD:            10s
      K8S_PACKET_TCP_ASSEMBLER_FLUSHING_CLOSE_OLDER_THAN:  20s
      K8S_PACKET_TCP_LISTENER_INTERFACES_COMMAND:          ip address | grep @if | sed -E 's/.* (\w+)@if.*/\1/' | tr '\n' ',' | sed 's/.$//'
      K8S_PACKET_TCP_LISTENER_INTERFACES_REFRESH_PERIOD:   10s

Also, in Grafana, testing the datasource URL fails most of the time.

The logs:

2022/08/11 14:03:13 Serving requests on port 8080
2022/08/11 14:03:13 Refreshing interfaces for capturing...
2022/08/11 14:03:13 Starting capture on interface "cali7ea0577d3f1"
2022/08/11 14:03:13 Starting capture on interface "calie813d30afc5"
2022/08/11 14:03:13 Starting capture on interface "calicf49bb793df"
2022/08/11 14:03:13 Starting capture on interface "cali6e53a9a258f"
2022/08/11 14:03:13 Starting capture on interface "calif7fccfaa4cc"
2022/08/11 14:03:13 Starting capture on interface "cali434dd2a1f75"
2022/08/11 14:03:13 Starting capture on interface "cali3df33f30e41"
2022/08/11 14:03:13 Starting capture on interface "cali2543f5110bf"
2022/08/11 14:03:13 Starting capture on interface "cali2afdef92a43"
2022/08/11 14:03:13 Starting capture on interface "calif9c78f37416"
2022/08/11 14:03:13 Starting capture on interface "cali401e21c8aeb"
2022/08/11 14:03:13 Starting capture on interface "cali00f0a380d6a"
2022/08/11 14:03:13 Starting capture on interface "cali4026fc5579a"
2022/08/11 14:03:13 Starting capture on interface "caliae18b86a1f9"
2022/08/11 14:03:13 Starting capture on interface "cali0af45212d8c"
2022/08/11 14:03:13 Starting capture on interface "cali4fcdb447e21"
2022/08/11 14:03:13 Starting capture on interface "caliba02546de21"
2022/08/11 14:03:13 Starting capture on interface "cali3f610232439"
2022/08/11 14:03:13 Starting capture on interface "cali3b7522ed88b"
k8spacket commented 2 years ago

@bobertrublik Could you check CPU usage of k8spacket pods? (e.g., https://github.com/robscott/kube-capacity)

Additionally, I suggest trying to listen on just one network interface first and checking whether there is a performance issue.

For example (check the current interfaces first; in the example below I took one from your comment):

K8S_PACKET_TCP_LISTENER_INTERFACES_COMMAND:         echo cali7ea0577d3f1
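
One way to apply that quickly, without editing the manifest by hand (a sketch; the interface name is the one from your logs above):

# set the env var directly on the DaemonSet; the pods are rolled out again with the new value
kubectl -n k8spacket set env daemonset/k8spacket K8S_PACKET_TCP_LISTENER_INTERFACES_COMMAND='echo cali7ea0577d3f1'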
bobertrublik commented 2 years ago

Resource usage looks good:

k8spacket-9pft8                                                 250m (3%)      500m (6%)     1000Mi (1%)       1500Mi (2%)
k8spacket-bswbl                                                 250m (3%)      500m (6%)     1000Mi (1%)       1500Mi (2%)
k8spacket-kkflx                                                 250m (3%)      500m (6%)     1000Mi (1%)       1500Mi (2%)

Interestingly, after I set

K8S_PACKET_TCP_LISTENER_INTERFACES_COMMAND: echo caliceb690ae662 | tr -d '\n'

curl worked basically 99% of the time. How should I interpret this?

Bandwidth metrics:

[screenshot]
bobertrublik commented 2 years ago

I currently have the logs and metrics graphs working; I guess the metrics can be read even if the connection times out from time to time. The node graph still times out with the following error:

status:504
statusText:""
data:Object
message:""
error:""
response:""
config:Object
method:"GET"
url:"api/datasources/proxy/3/nodegraphds/api/graph/data?namespace=&include=&exclude=&stats-type=connection"
retry:0
headers:Object
hideFromInspector:false
message:"Query error: 504 "
k8spacket commented 2 years ago

@bobertrublik this

k8spacket-9pft8                                                 250m (3%)      500m (6%)     1000Mi (1%)       1500Mi (2%)
k8spacket-bswbl                                                 250m (3%)      500m (6%)     1000Mi (1%)       1500Mi (2%)
k8spacket-kkflx                                                 250m (3%)      500m (6%)     1000Mi (1%)       1500Mi (2%)

shows only the request and limit definitions. Please use the kube-capacity tool to see the current CPU and memory usage, or you can find it in the Grafana dashboards.

[screenshot]

It's hard to investigate the problem without seeing it. If there is a possibility to set up another k8s cluster and give me access, I could take a look and find a remedy.

bobertrublik commented 2 years ago

Right, sorry, here we go:

kube-capacity --pods -u | grep k8spacket

POD               CPU REQUESTS   CPU LIMITS   CPU UTIL   MEMORY REQUESTS   MEMORY LIMITS   MEMORY UTIL
k8spacket-w6dzp   250m (3%)      500m (6%)    19m (0%)   1000Mi (1%)       1500Mi (2%)     157Mi (0%)
k8spacket-f28pn   250m (3%)      500m (6%)    25m (0%)   1000Mi (1%)       1500Mi (2%)     113Mi (0%)
k8spacket-ts6hw   250m (3%)      500m (6%)    17m (0%)   1000Mi (1%)       1500Mi (2%)     132Mi (0%)

I'll try to investigate whether the problem is caused by Calico. Anyway, thank you very much for your help!

k8spacket commented 2 years ago

Last chance:

bobertrublik commented 2 years ago

Yes, the curls appear in the logs.

[screenshot]

Looking at the code, I curled the endpoint http://%s:8080/connections?%s and had the same recurring problem, with the call timing out most of the time.

I noticed that when the connection times out, the logs show a value of 0 for bytesReceived and bytesSent.

[logs attached]

bobertrublik commented 2 years ago

Hey @k8spacket, I think I found the issue. I noticed that my Calico interfaces have an MTU of 1440 because 60 bytes are reserved for the header (https://projectcalico.docs.tigera.io/networking/mtu#determine-mtu-size).

Meanwhile, the eth0 interface has an MTU of 1500 because the DaemonSet sets hostNetwork: true, and on the host network it apparently uses an MTU of 1500. Do you know if there is an application-side fix? Otherwise the issue can be closed :)
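
For reference, a quick way to compare the MTUs (a sketch; run it on a node or in a hostNetwork pod such as k8spacket, which sees the node's interfaces):

# print every interface name together with its MTU (field 5 follows the "mtu" keyword)
ip -o link show | awk '{print $2, $5}'
# eth0: 1500 versus cali*/tunl0: 1440 would confirm the mismatch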

k8spacket commented 2 years ago

@bobertrublik As far as I can see, there is an option to change the MTU of the Calico network interfaces. I set up my cluster with different MTUs for eth0 and the Calico interfaces, but still no luck reproducing your problem. Did you manage to resolve it somehow?
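
In case it helps, the Calico docs linked above describe how to change it; a sketch for an operator-managed install (the value 1440 is only an example, pick the MTU that matches your encapsulation; for a manifest-based install the veth_mtu field in the calico-config ConfigMap is the equivalent knob):

# operator-managed Calico: set the MTU on the default Installation resource
kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"mtu":1440}}}'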