crujzo closed this issue 2 years ago.
@crujzo it's bad form to ping so many people on a github issue like this. Please don't do that in the future.
Felix Prometheus metrics are not enabled by default, and are documented in the following locations:
Please read through those. If you're still having trouble, you'll need to update this issue to include the steps you took to enable Felix metrics so that I can help troubleshoot.
@caseydavenport - I have updated the problem statement with more information. Please have a look at it and let me know if any specific information is required.
PS: I agree with your point about tagging so many people, but I didn't know who the right person to tag would be. I will refrain from tagging so many members in the future :)
curl http://localhost:9091/metrics
curl: (7) Failed to connect to ::1: No route to host
It looks like your host resolves localhost to the IPv6 address ::1, but you don't have IPv6 enabled. Either fix that or use the node's IP instead.
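Both workarounds could be sketched like this, assuming Felix listens on its default Prometheus port, 9091. The helper names `felix_metrics_url` and `fetch_felix_metrics` are hypothetical and not part of Calico. `curl -4` forces an IPv4 connection, and `getent hosts localhost` shows which address the host actually resolves `localhost` to.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Calico): build the Felix metrics URL and
# fetch it over IPv4 so that "localhost" cannot resolve to ::1.

# Print the metrics URL for a host; defaults to 127.0.0.1 and to Felix's
# default Prometheus port, 9091 (an assumption if you changed it).
felix_metrics_url() {
  local host="${1:-127.0.0.1}"
  local port="${2:-9091}"
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Fetch Felix metrics, forcing curl onto IPv4 (-4).
# Example on a node: fetch_felix_metrics "$(hostname -i)"
fetch_felix_metrics() {
  curl -4 -sS "$(felix_metrics_url "$@")"
}

# To see how this host resolves localhost (an ::1 entry explains the error):
#   getent hosts localhost
```

Passing the node IP, as suggested above, sidesteps name resolution entirely.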
I agree with your point of tagging so many people but I was not aware who would be the right person to tag
We monitor GitHub issues proactively, so no need to tag anyone when raising an issue.
@caseydavenport @lwr20 - Thanks for your replies.
Apparently, it is not an IPv6 issue. I am able to get the kube-calico-metrics on port 9094 with localhost; below are my node status and the metrics output:
sh-4.2# calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 10.100.114.41 | node-to-node mesh | up | 19:40:32 | Established |
| 10.100.114.43 | node-to-node mesh | up | 19:40:33 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
sh-4.2# curl http://localhost:9091/metrics
curl: (7) Failed to connect to ::1: No route to host
sh-4.2#
sh-4.2# curl http://localhost:9094/metrics
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 12
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
go_gc_cycles_forced_gc_cycles_total 0
# HELP go_gc_cycles_total_gc_cycles_total Count of all completed GC cycles.
# TYPE go_gc_cycles_total_gc_cycles_total counter
go_gc_cycles_total_gc_cycles_total 12
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000147715
go_gc_duration_seconds{quantile="0.25"} 0.000192316
go_gc_duration_seconds{quantile="0.5"} 0.000221822
go_gc_duration_seconds{quantile="0.75"} 0.000270066
go_gc_duration_seconds{quantile="1"} 0.000780084
go_gc_duration_seconds_sum 0.003167649
go_gc_duration_seconds_count 12
..................
..............
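The output above contains only `go_*` runtime series, which any Go program using the Prometheus client exports, so on their own they don't show whether Felix is reporting. A quick check is sketched below, assuming Felix's own series carry the `felix_` name prefix (for example `felix_active_local_endpoints`). `has_felix_metrics` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the Prometheus text read on stdin
# contains at least one sample whose metric name starts with "felix_".
# "# HELP" / "# TYPE" lines start with "#", so the anchored pattern skips them.
has_felix_metrics() {
  grep -q '^felix_'
}

# Example:
#   curl -4 -sS http://127.0.0.1:9091/metrics | has_felix_metrics \
#     && echo "Felix metrics present" || echo "no Felix metrics"
```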
I also tried with the node IP; below is the output:
sh-4.2# hostname -i
10.100.114.42
sh-4.2#
sh-4.2# curl http://10.100.114.42:9091/metrics
curl: (7) Failed connect to 10.100.114.42:9091; Connection refused
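`Connection refused` on the node IP usually means nothing is listening on port 9091 at all. This points to Felix's metrics server being off rather than to an addressing problem. The sketch below confirms it on the node, assuming the iproute2 `ss` utility is available and prints its default column layout (local address in the 4th column). `port_listening` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` output on stdin and succeed if some
# listening socket's local address (4th column) ends in ":<port>".
port_listening() {
  local port="$1"
  awk -v p=":${port}" '$4 ~ (p "$") { found = 1 } END { exit !found }'
}

# Example, run on the node:
#   ss -ltn | port_listening 9091 && echo "9091 is listening" \
#     || echo "nothing listening on 9091"
```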
Below are a few other details:
sh-4.2# pwd
/etc/cni/net.d
sh-4.2# ls -lrth
drwxr-xr-x 2 root root 4.0K Jul 27 01:10 calico-tls
-rw-r--r-- 1 root root 887 Jul 27 01:10 10-calico.conflist
-rw------- 1 root root 2.6K Jul 27 01:10 calico-kubeconfig
10-calico.conflist
sh-4.2# cat 10-calico.conflist
{
"name": "k8s-pod-network",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "calico",
"log_level": "info",
"log_file_path": "/var/log/calico/cni/cni.log",
"etcd_endpoints": "https://10.100.114.81:2379,https://10.100.114.82:2379,https://10.100.114.83:2379",
"etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
"etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
"etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
"mtu": 0,
"ipam": {
"type": "calico-ipam"
},
"policy": {
"type": "k8s"
},
"kubernetes": {
"kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
}
},
{
"type": "portmap",
"snat": true,
"capabilities": {"portMappings": true}
},
{
"type": "bandwidth",
"capabilities": {"bandwidth": true}
}
]
}
Also, in the calico-node DaemonSet manifest, I have the following config for IPv6:
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
  value: "ACCEPT"
- name: FELIX_IPV6SUPPORT
  value: "false"
- name: FELIX_HEALTHENABLED
  value: "true"
Please help!
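None of the `FELIX_*` variables listed above turns on the Prometheus endpoint. Felix's `PrometheusMetricsEnabled` setting can be supplied through the `FELIX_PROMETHEUSMETRICSENABLED` environment variable. The sketch below checks the DaemonSet for it, assuming the DaemonSet is `calico-node` in `kube-system` and using `kubectl set env --list` to print its environment. `metrics_env_enabled` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl set env --list` output on stdin and
# succeed if FELIX_PROMETHEUSMETRICSENABLED is set to true.
metrics_env_enabled() {
  grep -qE '^FELIX_PROMETHEUSMETRICSENABLED="?[Tt]rue"?$'
}

# Example (assumes the calico-node DaemonSet lives in kube-system):
#   kubectl set env daemonset/calico-node -n kube-system --list \
#     | metrics_env_enabled || echo "Felix metrics are not enabled via env"
```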
am able to get the kube-calico-metrics on port 9094 with localhost,
Calico kube-controllers doesn't run with host networking, so you need to run this from within the kube-controllers pod, which will likely give a different result for localhost.
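Instead of exec-ing into the pod (whose image may not ship `curl`), you could port-forward to it from a machine with `kubectl` access. This is a minimal sketch, assuming the Deployment is named `calico-kube-controllers` in `kube-system` and serves metrics on 9094. `kube_controllers_metrics` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scrape calico-kube-controllers metrics from outside the
# pod by port-forwarding, since the pod does not use host networking.
# Assumes Deployment calico-kube-controllers serving metrics on port 9094.
kube_controllers_metrics() {
  local ns="${1:-kube-system}"
  # Forward local port 9094 to the pod's port 9094 in the background.
  kubectl port-forward -n "$ns" deploy/calico-kube-controllers 9094:9094 \
    >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2  # crude wait for the tunnel to come up
  local rc=0
  curl -4 -sS http://127.0.0.1:9094/metrics || rc=$?
  # Stop the port-forward and reap it.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$rc"
}
```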
as we can see NO felixconfigurations object is created as nothing no object related to felixconfiguration is given in the default manifest file calico with etcd datastore When I created felixconfiguration custom resource definition as it is created in case-1 with below yaml
I just realized this part of your post: Calico CustomResourceDefinitions aren't used in etcd mode. That's the entire point of etcd mode; it reads data from etcd, not from CRDs. So creating the felixconfiguration CRD is expected to have no effect. You need to use calicoctl to write a FelixConfiguration into etcd directly, or use environment variables.
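The two routes described above could look like the sketch below. It relies on several assumptions. calicoctl is pointed at the etcd datastore, and the `ETCD_*` values reuse the endpoints and certificate paths from the `10-calico.conflist` shown earlier, so adjust them for wherever you run calicoctl. A FelixConfiguration named `default` already exists in etcd. The DaemonSet is `calico-node` in `kube-system`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable Felix Prometheus metrics when Calico uses the etcd datastore.
# Assumed calicoctl connection settings (copied from 10-calico.conflist).
export DATASTORE_TYPE=etcdv3
export ETCD_ENDPOINTS="https://10.100.114.81:2379,https://10.100.114.82:2379,https://10.100.114.83:2379"
export ETCD_KEY_FILE=/etc/cni/net.d/calico-tls/etcd-key
export ETCD_CERT_FILE=/etc/cni/net.d/calico-tls/etcd-cert
export ETCD_CA_CERT_FILE=/etc/cni/net.d/calico-tls/etcd-ca

# Option 1: write the setting into etcd through calicoctl
# (assumes a FelixConfiguration named "default" already exists).
enable_metrics_via_calicoctl() {
  calicoctl patch felixconfiguration default \
    --patch '{"spec":{"prometheusMetricsEnabled": true}}'
}

# Option 2: set the env var on the calico-node pod template; assuming a
# RollingUpdate strategy, the DaemonSet then rolls out new calico-node pods.
enable_metrics_via_env() {
  kubectl set env daemonset/calico-node -n "${1:-kube-system}" \
    FELIX_PROMETHEUSMETRICSENABLED=true
}
```

Either option only changes where Felix reads the setting from. The `kubectl patch felixconfiguration` approach from the Typha/KDD install does not reach etcd.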
General summary: When Calico is installed with the etcd datastore using the manifest file calico with etcd datastore, I can view the calico-kube-controllers metrics published on port 9094, but Felix metrics are not published on port 9091.
In contrast, when we install Calico without the etcd datastore using the calico-typha manifest, I can view Felix metrics published on port 9091.
CASE-1
When we install Calico with calico-typha, the following objects get created, as defined in the manifest file:
and when we apply
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": true}}'
Felix metrics are published on each Calico node and can be seen at http://localhost:9091/metrics on any of them.
CASE-2
But when we install Calico with the calico with etcd datastore manifest, the objects that get created are:
As we can see, NO felixconfigurations object is created, because nothing related to felixconfiguration is defined in the default manifest file (calico with etcd datastore).
I then created the felixconfiguration CustomResourceDefinition, as it is created in case 1, with the YAML below,
along with the felixconfiguration YAML given below.
The felixconfiguration resource was created as shown below:
Even with the FelixConfiguration in place, Felix metrics are not published.
Please help with this.
Is this expected behaviour? How can I get Felix metrics with the etcd datastore?
@lwr20 @caseydavenport @doucol @sethmccombs @electricjesus @fasaxc @lmm @lxpollitt @matthewdupre @mazdakn @mgleung @mikestephen @neiljerram @ozdanborne @penkeysuresh @peterkellydev @song-jiang @tmjd
PLEASE HELP!