projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0
6.01k stars 1.34k forks source link

Felix metrics not getting publish when calico is setup with etcd datastore #6428

Closed crujzo closed 2 years ago

crujzo commented 2 years ago

General Summary: When calico is installed with etcd datastore with the following manifest file calico with etcd datastore I am able to view the published calico-kube-controllers metrics on port 9094 but Felix metrics are not getting published on port 9091

Whereas, when we install calico without etcd datastore with this manifest calico-typha I am able to view Felix metics published on port 9091

CASE-1

When we install calico with calico-typha following objects gets created as present in the manifest file

  1. configmap/calico-config
  2. customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org
  3. customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org
  4. customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org
  5. customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org
  6. customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org
  7. customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
  8. customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org
  9. customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org
  10. customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org
  11. customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org
  12. customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org
  13. customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org
  14. customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org
  15. customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org
  16. customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org
  17. customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org
  18. customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org
  19. clusterrole.rbac.authorization.k8s.io/calico-kube-controllers
  20. clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers
  21. clusterrole.rbac.authorization.k8s.io/calico-node
  22. clusterrolebinding.rbac.authorization.k8s.io/calico-node
  23. service/calico-typha
  24. deployment.apps/calico-typha
  25. poddisruptionbudget.policy/calico-typha
  26. daemonset.apps/calico-node
  27. serviceaccount/calico-node
  28. deployment.apps/calico-kube-controllers
  29. serviceaccount/calico-kube-controllers
  30. poddisruptionbudget.policy/calico-kube-controllers

and when we apply kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": true}}' metrics related to felix are getting published on each calico node and can be seen with http://localhost:9091/metrics on any calico node.

CASE-2

But, when we install calico with calico with etcd datastore the objects which get created are

  1. secret "calico-etcd-secrets"
  2. configmap "calico-config"
  3. clusterrole.rbac.authorization.k8s.io "calico-kube-controllers"
  4. clusterrolebinding.rbac.authorization.k8s.io "calico-kube-controllers"
  5. clusterrole.rbac.authorization.k8s.io "calico-node"
  6. clusterrolebinding.rbac.authorization.k8s.io "calico-node"
  7. daemonset.apps "calico-node"
  8. serviceaccount "calico-node"
  9. deployment.apps "calico-kube-controllers"
  10. serviceaccount "calico-kube-controllers"
  11. poddisruptionbudget.policy "calico-kube-controllers"

as we can see NO felixconfigurations object is created as nothing no object related to felixconfiguration is given in the default manifest file calico with etcd datastore

When I created felixconfiguration custom resource definition as it is created in case-1 with below yaml

apiVersion: apiextensions.k8s.io/v1 ##$$
kind: CustomResourceDefinition
metadata:
  name: felixconfigurations.crd.projectcalico.org
spec:
  group: crd.projectcalico.org
  names:
    kind: FelixConfiguration
    listKind: FelixConfigurationList
    plural: felixconfigurations
    singular: felixconfiguration
  scope: Cluster
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        description: Felix Configuration contains the configuration for Felix.
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: FelixConfigurationSpec contains the values of the Felix configuration.
            properties:
              allowIPIPPacketsFromWorkloads:
                description: 'AllowIPIPPacketsFromWorkloads controls whether Felix
                  will add a rule to drop IPIP encapsulated traffic from workloads
                  [Default: false]'
                type: boolean
              allowVXLANPacketsFromWorkloads:
                description: 'AllowVXLANPacketsFromWorkloads controls whether Felix
                  will add a rule to drop VXLAN encapsulated traffic from workloads
                  [Default: false]'
                type: boolean
              awsSrcDstCheck:
                description: 'Set source-destination-check on AWS EC2 instances. Accepted
                  value must be one of "DoNothing", "Enable" or "Disable". [Default:
                  DoNothing]'
                enum:
                - DoNothing
                - Enable
                - Disable
                type: string
              bpfConnectTimeLoadBalancingEnabled:
                description: 'BPFConnectTimeLoadBalancingEnabled when in BPF mode,
                  controls whether Felix installs the connection-time load balancer.  The
                  connect-time load balancer is required for the host to be able to
                  reach Kubernetes services and it improves the performance of pod-to-service
                  connections.  The only reason to disable it is for debugging purposes.  [Default:
                  true]'
                type: boolean
              bpfDataIfacePattern:
                description: BPFDataIfacePattern is a regular expression that controls
                  which interfaces Felix should attach BPF programs to in order to
                  catch traffic to/from the network.  This needs to match the interfaces
                  that Calico workload traffic flows over as well as any interfaces
                  that handle incoming traffic to nodeports and services from outside
                  the cluster.  It should not match the workload interfaces (usually
                  named cali...).
                type: string
              bpfDisableUnprivileged:
                description: 'BPFDisableUnprivileged, if enabled, Felix sets the kernel.unprivileged_bpf_disabled
                  sysctl to disable unprivileged use of BPF.  This ensures that unprivileged
                  users cannot access Calico''s BPF maps and cannot insert their own
                  BPF programs to interfere with Calico''s. [Default: true]'
                type: boolean
              bpfEnabled:
                description: 'BPFEnabled, if enabled Felix will use the BPF dataplane.
                  [Default: false]'
                type: boolean
              bpfEnforceRPF:
                description: 'BPFEnforceRPF enforce strict RPF on all interfaces with
                  BPF programs regardless of what is the per-interfaces or global
                  setting. Possible values are Disabled or Strict. [Default: Strict]'
                type: string
              bpfExtToServiceConnmark:
                description: 'BPFExtToServiceConnmark in BPF mode, control a 32bit
                  mark that is set on connections from an external client to a local
                  service. This mark allows us to control how packets of that connection
                  are routed within the host and how is routing intepreted by RPF
                  check. [Default: 0]'
                type: integer
              bpfExternalServiceMode:
                description: 'BPFExternalServiceMode in BPF mode, controls how connections
                  from outside the cluster to services (node ports and cluster IPs)
                  are forwarded to remote workloads.  If set to "Tunnel" then both
                  request and response traffic is tunneled to the remote node.  If
                  set to "DSR", the request traffic is tunneled but the response traffic
                  is sent directly from the remote node.  In "DSR" mode, the remote
                  node appears to use the IP of the ingress node; this requires a
                  permissive L2 network.  [Default: Tunnel]'
                type: string
              bpfKubeProxyEndpointSlicesEnabled:
                description: BPFKubeProxyEndpointSlicesEnabled in BPF mode, controls
                  whether Felix's embedded kube-proxy accepts EndpointSlices or not.
                type: boolean
              bpfKubeProxyIptablesCleanupEnabled:
                description: 'BPFKubeProxyIptablesCleanupEnabled, if enabled in BPF
                  mode, Felix will proactively clean up the upstream Kubernetes kube-proxy''s
                  iptables chains.  Should only be enabled if kube-proxy is not running.  [Default:
                  true]'
                type: boolean
              bpfKubeProxyMinSyncPeriod:
                description: 'BPFKubeProxyMinSyncPeriod, in BPF mode, controls the
                  minimum time between updates to the dataplane for Felix''s embedded
                  kube-proxy.  Lower values give reduced set-up latency.  Higher values
                  reduce Felix CPU usage by batching up more work.  [Default: 1s]'
                type: string
              bpfLogLevel:
                description: 'BPFLogLevel controls the log level of the BPF programs
                  when in BPF dataplane mode.  One of "Off", "Info", or "Debug".  The
                  logs are emitted to the BPF trace pipe, accessible with the command
                  `tc exec bpf debug`. [Default: Off].'
                type: string
              bpfMapSizeConntrack:
                description: 'BPFMapSizeConntrack sets the size for the conntrack
                  map.  This map must be large enough to hold an entry for each active
                  connection.  Warning: changing the size of the conntrack map can
                  cause disruption.'
                type: integer
              bpfMapSizeIPSets:
                description: BPFMapSizeIPSets sets the size for ipsets map.  The IP
                  sets map must be large enough to hold an entry for each endpoint
                  matched by every selector in the source/destination matches in network
                  policy.  Selectors such as "all()" can result in large numbers of
                  entries (one entry per endpoint in that case).
                type: integer
              bpfMapSizeNATAffinity:
                type: integer
              bpfMapSizeNATBackend:
                description: BPFMapSizeNATBackend sets the size for nat back end map.
                  This is the total number of endpoints. This is mostly more than
                  the size of the number of services.
                type: integer
              bpfMapSizeNATFrontend:
                description: BPFMapSizeNATFrontend sets the size for nat front end
                  map. FrontendMap should be large enough to hold an entry for each
                  nodeport, external IP and each port in each service.
                type: integer
              bpfMapSizeRoute:
                description: BPFMapSizeRoute sets the size for the routes map.  The
                  routes map should be large enough to hold one entry per workload
                  and a handful of entries per host (enough to cover its own IPs and
                  tunnel IPs).
                type: integer
              bpfPSNATPorts:
                anyOf:
                - type: integer
                - type: string
                description: 'BPFPSNATPorts sets the range from which we randomly
                  pick a port if there is a source port collision. This should be
                  within the ephemeral range as defined by RFC 6056 (1024–65535) and
                  preferably outside the  ephemeral ranges used by common operating
                  systems. Linux uses 32768–60999, while others mostly use the IANA
                  defined range 49152–65535. It is not necessarily a problem if this
                  range overlaps with the operating systems. Both ends of the range
                  are inclusive. [Default: 20000:29999]'
                pattern: ^.*
                x-kubernetes-int-or-string: true
              chainInsertMode:
                description: 'ChainInsertMode controls whether Felix hooks the kernel''s
                  top-level iptables chains by inserting a rule at the top of the
                  chain or by appending a rule at the bottom. insert is the safe default
                  since it prevents Calico''s rules from being bypassed. If you switch
                  to append mode, be sure that the other rules in the chains signal
                  acceptance by falling through to the Calico rules, otherwise the
                  Calico policy will be bypassed. [Default: insert]'
                type: string
              dataplaneDriver:
                description: DataplaneDriver filename of the external dataplane driver
                  to use.  Only used if UseInternalDataplaneDriver is set to false.
                type: string
              dataplaneWatchdogTimeout:
                description: 'DataplaneWatchdogTimeout is the readiness/liveness timeout
                  used for Felix''s (internal) dataplane driver. Increase this value
                  if you experience spurious non-ready or non-live events when Felix
                  is under heavy load. Decrease the value to get felix to report non-live
                  or non-ready more quickly. [Default: 90s]'
                type: string
              debugDisableLogDropping:
                type: boolean
              debugMemoryProfilePath:
                type: string
              debugSimulateCalcGraphHangAfter:
                type: string
              debugSimulateDataplaneHangAfter:
                type: string
              defaultEndpointToHostAction:
                description: 'DefaultEndpointToHostAction controls what happens to
                  traffic that goes from a workload endpoint to the host itself (after
                  the traffic hits the endpoint egress policy). By default Calico
                  blocks traffic from workload endpoints to the host itself with an
                  iptables "DROP" action. If you want to allow some or all traffic
                  from endpoint to host, set this parameter to RETURN or ACCEPT. Use
                  RETURN if you have your own rules in the iptables "INPUT" chain;
                  Calico will insert its rules at the top of that chain, then "RETURN"
                  packets to the "INPUT" chain once it has completed processing workload
                  endpoint egress policy. Use ACCEPT to unconditionally accept packets
                  from workloads after processing workload endpoint egress policy.
                  [Default: Drop]'
                type: string
              deviceRouteProtocol:
                description: This defines the route protocol added to programmed device
                  routes, by default this will be RTPROT_BOOT when left blank.
                type: integer
              deviceRouteSourceAddress:
                description: This is the IPv4 source address to use on programmed
                  device routes. By default the source address is left blank, leaving
                  the kernel to choose the source address used.
                type: string
              deviceRouteSourceAddressIPv6:
                description: This is the IPv6 source address to use on programmed
                  device routes. By default the source address is left blank, leaving
                  the kernel to choose the source address used.
                type: string
              disableConntrackInvalidCheck:
                type: boolean
              endpointReportingDelay:
                type: string
              endpointReportingEnabled:
                type: boolean
              externalNodesList:
                description: ExternalNodesCIDRList is a list of CIDR's of external-non-calico-nodes
                  which may source tunnel traffic and have the tunneled traffic be
                  accepted at calico nodes.
                items:
                  type: string
                type: array
              failsafeInboundHostPorts:
                description: 'FailsafeInboundHostPorts is a list of UDP/TCP ports
                  and CIDRs that Felix will allow incoming traffic to host endpoints
                  on irrespective of the security policy. This is useful to avoid
                  accidentally cutting off a host with incorrect configuration. For
                  back-compatibility, if the protocol is not specified, it defaults
                  to "tcp". If a CIDR is not specified, it will allow traffic from
                  all addresses. To disable all inbound host ports, use the value
                  none. The default value allows ssh access and DHCP. [Default: tcp:22,
                  udp:68, tcp:179, tcp:2379, tcp:2380, tcp:6443, tcp:6666, tcp:6667]'
                items:
                  description: ProtoPort is combination of protocol, port, and CIDR.
                    Protocol and port must be specified.
                  properties:
                    net:
                      type: string
                    port:
                      type: integer
                    protocol:
                      type: string
                  required:
                  - port
                  - protocol
                  type: object
                type: array
              failsafeOutboundHostPorts:
                description: 'FailsafeOutboundHostPorts is a list of UDP/TCP ports
                  and CIDRs that Felix will allow outgoing traffic from host endpoints
                  to irrespective of the security policy. This is useful to avoid
                  accidentally cutting off a host with incorrect configuration. For
                  back-compatibility, if the protocol is not specified, it defaults
                  to "tcp". If a CIDR is not specified, it will allow traffic from
                  all addresses. To disable all outbound host ports, use the value
                  none. The default value opens etcd''s standard ports to ensure that
                  Felix does not get cut off from etcd as well as allowing DHCP and
                  DNS. [Default: tcp:179, tcp:2379, tcp:2380, tcp:6443, tcp:6666,
                  tcp:6667, udp:53, udp:67]'
                items:
                  description: ProtoPort is combination of protocol, port, and CIDR.
                    Protocol and port must be specified.
                  properties:
                    net:
                      type: string
                    port:
                      type: integer
                    protocol:
                      type: string
                  required:
                  - port
                  - protocol
                  type: object
                type: array
              featureDetectOverride:
                description: FeatureDetectOverride is used to override the feature
                  detection. Values are specified in a comma separated list with no
                  spaces, example; "SNATFullyRandom=true,MASQFullyRandom=false,RestoreSupportsLock=".
                  "true" or "false" will force the feature, empty or omitted values
                  are auto-detected.
                type: string
              floatingIPs:
                default: Disabled
                description: FloatingIPs configures whether or not Felix will program
                  floating IP addresses.
                enum:
                - Enabled
                - Disabled
                type: string
              genericXDPEnabled:
                description: 'GenericXDPEnabled enables Generic XDP so network cards
                  that don''t support XDP offload or driver modes can use XDP. This
                  is not recommended since it doesn''t provide better performance
                  than iptables. [Default: false]'
                type: boolean
              healthEnabled:
                type: boolean
              healthHost:
                type: string
              healthPort:
                type: integer
              interfaceExclude:
                description: 'InterfaceExclude is a comma-separated list of interfaces
                  that Felix should exclude when monitoring for host endpoints. The
                  default value ensures that Felix ignores Kubernetes'' IPVS dummy
                  interface, which is used internally by kube-proxy. If you want to
                  exclude multiple interface names using a single value, the list
                  supports regular expressions. For regular expressions you must wrap
                  the value with ''/''. For example having values ''/^kube/,veth1''
                  will exclude all interfaces that begin with ''kube'' and also the
                  interface ''veth1''. [Default: kube-ipvs0]'
                type: string
              interfacePrefix:
                description: 'InterfacePrefix is the interface name prefix that identifies
                  workload endpoints and so distinguishes them from host endpoint
                  interfaces. Note: in environments other than bare metal, the orchestrators
                  configure this appropriately. For example our Kubernetes and Docker
                  integrations set the ''cali'' value, and our OpenStack integration
                  sets the ''tap'' value. [Default: cali]'
                type: string
              interfaceRefreshInterval:
                description: InterfaceRefreshInterval is the period at which Felix
                  rescans local interfaces to verify their state. The rescan can be
                  disabled by setting the interval to 0.
                type: string
              ipipEnabled:
                description: 'IPIPEnabled overrides whether Felix should configure
                  an IPIP interface on the host. Optional as Felix determines this
                  based on the existing IP pools. [Default: nil (unset)]'
                type: boolean
              ipipMTU:
                description: 'IPIPMTU is the MTU to set on the tunnel device. See
                  Configuring MTU [Default: 1440]'
                type: integer
              ipsetsRefreshInterval:
                description: 'IpsetsRefreshInterval is the period at which Felix re-checks
                  all iptables state to ensure that no other process has accidentally
                  broken Calico''s rules. Set to 0 to disable iptables refresh. [Default:
                  90s]'
                type: string
              iptablesBackend:
                description: IptablesBackend specifies which backend of iptables will
                  be used. The default is legacy.
                type: string
              iptablesFilterAllowAction:
                type: string
              iptablesLockFilePath:
                description: 'IptablesLockFilePath is the location of the iptables
                  lock file. You may need to change this if the lock file is not in
                  its standard location (for example if you have mapped it into Felix''s
                  container at a different path). [Default: /run/xtables.lock]'
                type: string
              iptablesLockProbeInterval:
                description: 'IptablesLockProbeInterval is the time that Felix will
                  wait between attempts to acquire the iptables lock if it is not
                  available. Lower values make Felix more responsive when the lock
                  is contended, but use more CPU. [Default: 50ms]'
                type: string
              iptablesLockTimeout:
                description: 'IptablesLockTimeout is the time that Felix will wait
                  for the iptables lock, or 0, to disable. To use this feature, Felix
                  must share the iptables lock file with all other processes that
                  also take the lock. When running Felix inside a container, this
                  requires the /run directory of the host to be mounted into the calico/node
                  or calico/felix container. [Default: 0s disabled]'
                type: string
              iptablesMangleAllowAction:
                type: string
              iptablesMarkMask:
                description: 'IptablesMarkMask is the mask that Felix selects its
                  IPTables Mark bits from. Should be a 32 bit hexadecimal number with
                  at least 8 bits set, none of which clash with any other mark bits
                  in use on the system. [Default: 0xff000000]'
                format: int32
                type: integer
              iptablesNATOutgoingInterfaceFilter:
                type: string
              iptablesPostWriteCheckInterval:
                description: 'IptablesPostWriteCheckInterval is the period after Felix
                  has done a write to the dataplane that it schedules an extra read
                  back in order to check the write was not clobbered by another process.
                  This should only occur if another application on the system doesn''t
                  respect the iptables lock. [Default: 1s]'
                type: string
              iptablesRefreshInterval:
                description: 'IptablesRefreshInterval is the period at which Felix
                  re-checks the IP sets in the dataplane to ensure that no other process
                  has accidentally broken Calico''s rules. Set to 0 to disable IP
                  sets refresh. Note: the default for this value is lower than the
                  other refresh intervals as a workaround for a Linux kernel bug that
                  was fixed in kernel version 4.11. If you are using v4.11 or greater
                  you may want to set this to, a higher value to reduce Felix CPU
                  usage. [Default: 10s]'
                type: string
              ipv6Support:
                description: IPv6Support controls whether Felix enables support for
                  IPv6 (if supported by the in-use dataplane).
                type: boolean
              kubeNodePortRanges:
                description: 'KubeNodePortRanges holds list of port ranges used for
                  service node ports. Only used if felix detects kube-proxy running
                  in ipvs mode. Felix uses these ranges to separate host and workload
                  traffic. [Default: 30000:32767].'
                items:
                  anyOf:
                  - type: integer
                  - type: string
                  pattern: ^.*
                  x-kubernetes-int-or-string: true
                type: array
              logDebugFilenameRegex:
                description: LogDebugFilenameRegex controls which source code files
                  have their Debug log output included in the logs. Only logs from
                  files with names that match the given regular expression are included.  The
                  filter only applies to Debug level logs.
                type: string
              logFilePath:
                description: 'LogFilePath is the full path to the Felix log. Set to
                  none to disable file logging. [Default: /var/log/calico/felix.log]'
                type: string
              logPrefix:
                description: 'LogPrefix is the log prefix that Felix uses when rendering
                  LOG rules. [Default: calico-packet]'
                type: string
              logSeverityFile:
                description: 'LogSeverityFile is the log severity above which logs
                  are sent to the log file. [Default: Info]'
                type: string
              logSeverityScreen:
                description: 'LogSeverityScreen is the log severity above which logs
                  are sent to the stdout. [Default: Info]'
                type: string
              logSeveritySys:
                description: 'LogSeveritySys is the log severity above which logs
                  are sent to the syslog. Set to None for no logging to syslog. [Default:
                  Info]'
                type: string
              maxIpsetSize:
                type: integer
              metadataAddr:
                description: 'MetadataAddr is the IP address or domain name of the
                  server that can answer VM queries for cloud-init metadata. In OpenStack,
                  this corresponds to the machine running nova-api (or in Ubuntu,
                  nova-api-metadata). A value of none (case insensitive) means that
                  Felix should not set up any NAT rule for the metadata path. [Default:
                  127.0.0.1]'
                type: string
              metadataPort:
                description: 'MetadataPort is the port of the metadata server. This,
                  combined with global.MetadataAddr (if not ''None''), is used to
                  set up a NAT rule, from 169.254.169.254:80 to MetadataAddr:MetadataPort.
                  In most cases this should not need to be changed [Default: 8775].'
                type: integer
              mtuIfacePattern:
                description: MTUIfacePattern is a regular expression that controls
                  which interfaces Felix should scan in order to calculate the host's
                  MTU. This should not match workload interfaces (usually named cali...).
                type: string
              natOutgoingAddress:
                description: NATOutgoingAddress specifies an address to use when performing
                  source NAT for traffic in a natOutgoing pool that is leaving the
                  network. By default the address used is an address on the interface
                  the traffic is leaving on (ie it uses the iptables MASQUERADE target)
                type: string
              natPortRange:
                anyOf:
                - type: integer
                - type: string
                description: NATPortRange specifies the range of ports that is used
                  for port mapping when doing outgoing NAT. When unset the default
                  behavior of the network stack is used.
                pattern: ^.*
                x-kubernetes-int-or-string: true
              netlinkTimeout:
                type: string
              openstackRegion:
                description: 'OpenstackRegion is the name of the region that a particular
                  Felix belongs to. In a multi-region Calico/OpenStack deployment,
                  this must be configured somehow for each Felix (here in the datamodel,
                  or in felix.cfg or the environment on each compute node), and must
                  match the [calico] openstack_region value configured in neutron.conf
                  on each node. [Default: Empty]'
                type: string
              policySyncPathPrefix:
                description: 'PolicySyncPathPrefix is used to by Felix to communicate
                  policy changes to external services, like Application layer policy.
                  [Default: Empty]'
                type: string
              prometheusGoMetricsEnabled:
                description: 'PrometheusGoMetricsEnabled disables Go runtime metrics
                  collection, which the Prometheus client does by default, when set
                  to false. This reduces the number of metrics reported, reducing
                  Prometheus load. [Default: true]'
                type: boolean
              prometheusMetricsEnabled:
                description: 'PrometheusMetricsEnabled enables the Prometheus metrics
                  server in Felix if set to true. [Default: false]'
                type: boolean
              prometheusMetricsHost:
                description: 'PrometheusMetricsHost is the host that the Prometheus
                  metrics server should bind to. [Default: empty]'
                type: string
              prometheusMetricsPort:
                description: 'PrometheusMetricsPort is the TCP port that the Prometheus
                  metrics server should bind to. [Default: 9091]'
                type: integer
              prometheusProcessMetricsEnabled:
                description: 'PrometheusProcessMetricsEnabled disables process metrics
                  collection, which the Prometheus client does by default, when set
                  to false. This reduces the number of metrics reported, reducing
                  Prometheus load. [Default: true]'
                type: boolean
              prometheusWireGuardMetricsEnabled:
                description: 'PrometheusWireGuardMetricsEnabled disables wireguard
                  metrics collection, which the Prometheus client does by default,
                  when set to false. This reduces the number of metrics reported,
                  reducing Prometheus load. [Default: true]'
                type: boolean
              removeExternalRoutes:
                description: Whether or not to remove device routes that have not
                  been programmed by Felix. Disabling this will allow external applications
                  to also add device routes. This is enabled by default which means
                  we will remove externally added routes.
                type: boolean
              reportingInterval:
                description: 'ReportingInterval is the interval at which Felix reports
                  its status into the datastore or 0 to disable. Must be non-zero
                  in OpenStack deployments. [Default: 30s]'
                type: string
              reportingTTL:
                description: 'ReportingTTL is the time-to-live setting for process-wide
                  status reports. [Default: 90s]'
                type: string
              routeRefreshInterval:
                description: 'RouteRefreshInterval is the period at which Felix re-checks
                  the routes in the dataplane to ensure that no other process has
                  accidentally broken Calico''s rules. Set to 0 to disable route refresh.
                  [Default: 90s]'
                type: string
              routeSource:
                description: 'RouteSource configures where Felix gets its routing
                  information. - WorkloadIPs: use workload endpoints to construct
                  routes. - CalicoIPAM: the default - use IPAM data to construct routes.'
                type: string
              routeTableRange:
                description: Deprecated in favor of RouteTableRanges. Calico programs
                  additional Linux route tables for various purposes. RouteTableRange
                  specifies the indices of the route tables that Calico should use.
                properties:
                  max:
                    type: integer
                  min:
                    type: integer
                required:
                - max
                - min
                type: object
              routeTableRanges:
                description: Calico programs additional Linux route tables for various
                  purposes. RouteTableRanges specifies a set of table index ranges
                  that Calico should use. Deprecates`RouteTableRange`, overrides `RouteTableRange`.
                items:
                  properties:
                    max:
                      type: integer
                    min:
                      type: integer
                  required:
                  - max
                  - min
                  type: object
                type: array
              serviceLoopPrevention:
                description: 'When service IP advertisement is enabled, prevent routing
                  loops to service IPs that are not in use, by dropping or rejecting
                  packets that do not get DNAT''d by kube-proxy. Unless set to "Disabled",
                  in which case such routing loops continue to be allowed. [Default:
                  Drop]'
                type: string
              sidecarAccelerationEnabled:
                description: 'SidecarAccelerationEnabled enables experimental sidecar
                  acceleration [Default: false]'
                type: boolean
              usageReportingEnabled:
                description: 'UsageReportingEnabled reports anonymous Calico version
                  number and cluster size to projectcalico.org. Logs warnings returned
                  by the usage server. For example, if a significant security vulnerability
                  has been discovered in the version of Calico being used. [Default:
                  true]'
                type: boolean
              usageReportingInitialDelay:
                description: 'UsageReportingInitialDelay controls the minimum delay
                  before Felix makes a report. [Default: 300s]'
                type: string
              usageReportingInterval:
                description: 'UsageReportingInterval controls the interval at which
                  Felix makes reports. [Default: 86400s]'
                type: string
              useInternalDataplaneDriver:
                description: UseInternalDataplaneDriver, if true, Felix will use its
                  internal dataplane programming logic.  If false, it will launch
                  an external dataplane driver and communicate with it over protobuf.
                type: boolean
              vxlanEnabled:
                description: 'VXLANEnabled overrides whether Felix should create the
                  VXLAN tunnel device for VXLAN networking. Optional as Felix determines
                  this based on the existing IP pools. [Default: nil (unset)]'
                type: boolean
              vxlanMTU:
                description: 'VXLANMTU is the MTU to set on the IPv4 VXLAN tunnel
                  device. See Configuring MTU [Default: 1410]'
                type: integer
              vxlanMTUV6:
                description: 'VXLANMTUV6 is the MTU to set on the IPv6 VXLAN tunnel
                  device. See Configuring MTU [Default: 1390]'
                type: integer
              vxlanPort:
                type: integer
              vxlanVNI:
                type: integer
              wireguardEnabled:
                description: 'WireguardEnabled controls whether Wireguard is enabled.
                  [Default: false]'
                type: boolean
              wireguardHostEncryptionEnabled:
                description: 'WireguardHostEncryptionEnabled controls whether Wireguard
                  host-to-host encryption is enabled. [Default: false]'
                type: boolean
              wireguardInterfaceName:
                description: 'WireguardInterfaceName specifies the name to use for
                  the Wireguard interface. [Default: wg.calico]'
                type: string
              wireguardKeepAlive:
                description: 'WireguardKeepAlive controls Wireguard PersistentKeepalive
                  option. Set 0 to disable. [Default: 0]'
                type: string
              wireguardListeningPort:
                description: 'WireguardListeningPort controls the listening port used
                  by Wireguard. [Default: 51820]'
                type: integer
              wireguardMTU:
                description: 'WireguardMTU controls the MTU on the Wireguard interface.
                  See Configuring MTU [Default: 1420]'
                type: integer
              wireguardRoutingRulePriority:
                description: 'WireguardRoutingRulePriority controls the priority value
                  to use for the Wireguard routing rule. [Default: 99]'
                type: integer
              workloadSourceSpoofing:
                description: WorkloadSourceSpoofing controls whether pods can use
                  the allowedSourcePrefixes annotation to send traffic with a source
                  IP address that is not theirs. This is disabled by default. When
                  set to "Any", pods can request any prefix.
                type: string
              xdpEnabled:
                description: 'XDPEnabled enables XDP acceleration for suitable untracked
                  incoming deny rules. [Default: true]'
                type: boolean
              xdpRefreshInterval:
                description: 'XDPRefreshInterval is the period at which Felix re-checks
                  all XDP state to ensure that no other process has accidentally broken
                  Calico''s BPF maps or attached programs. Set to 0 to disable XDP
                  refresh. [Default: 90s]'
                type: string
            type: object
        type: object
    served: true
    storage: true
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: []
  storedVersions: []

and felixconfiguration yaml given below:

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  ipv6Support: false
  ipipMTU: 1400
  chainInsertMode: Append
  prometheusMetricsEnabled: true

the felixconfiguration resource was created as given below:

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2022-07-26T04:49:16Z"
  name: default
  resourceVersion: "23415834"
  uid: e5ee2faa-62a1-44ff-83a8-3019023e9ba5
spec:
  bpfLogLevel: ""
  chainInsertMode: Append
  floatingIPs: Disabled
  ipipMTU: 1400
  ipv6Support: false
  logSeverityScreen: Info
  prometheusMetricsEnabled: true
  reportingInterval: 5s

Even with Felixconfiguration done the felix metrics are not getting published

curl http://localhost:9091/metrics
curl: (7) Failed to connect to ::1: No route to host

Please help with this.

Is it an expected behaviour? How can I get felix metrics with etcd datastore?

@lwr20 @caseydavenport @doucol @sethmccombs @electricjesus @fasaxc @lmm @lxpollitt @matthewdupre @mazdakn @mgleung @mikestephen @neiljerram @ozdanborne @penkeysuresh @peterkellydev @song-jiang @tmjd

PLEASE HELP!

caseydavenport commented 2 years ago

@crujzo it's bad form to ping so many people on a github issue like this. Please don't do that in the future.

caseydavenport commented 2 years ago

Felix Prometheus metrics are not enabled by default, and are documented in the following locations:

Please read through those. If you're still having trouble, you'll need to update this issue to include the steps you took to enable Felix metrics so that I can help troubleshoot.

crujzo commented 2 years ago

@caseydavenport - I have updated the problem statement with more information. Please have a look at it and let me know if any specific information is required.

PS: I agree with your point of tagging so many people but I was not aware who would be the right person to tag. I will abstain to tag so many members in future :)

lwr20 commented 2 years ago
curl http://localhost:9091/metrics
curl: (7) Failed to connect to ::1: No route to host

Looks like your host's definition of localhost is ipv6, but you don't have ipv6 enabled. Either fix that or use the node's IP instead.

caseydavenport commented 2 years ago

I agree with your point of tagging so many people but I was not aware who would be the right person to tag

We monitor GitHub issues proactively, so no need to tag anyone when raising an issue.

crujzo commented 2 years ago

@caseydavenport @lwr20 - Thanks for your replies.

Apparently, it is not the ipv6 issue. I am able to get the kube-calico-metrics on port 9094 with localhost, below is my node status and output of metrics

sh-4.2# calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 10.100.114.41 | node-to-node mesh | up    | 19:40:32 | Established |
| 10.100.114.43 | node-to-node mesh | up    | 19:40:33 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
sh-4.2# curl http://localhost:9091/metrics
curl: (7) Failed to connect to ::1: No route to host
sh-4.2#
sh-4.2# curl http://localhost:9094/metrics
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 12
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
go_gc_cycles_forced_gc_cycles_total 0
# HELP go_gc_cycles_total_gc_cycles_total Count of all completed GC cycles.
# TYPE go_gc_cycles_total_gc_cycles_total counter
go_gc_cycles_total_gc_cycles_total 12
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000147715
go_gc_duration_seconds{quantile="0.25"} 0.000192316
go_gc_duration_seconds{quantile="0.5"} 0.000221822
go_gc_duration_seconds{quantile="0.75"} 0.000270066
go_gc_duration_seconds{quantile="1"} 0.000780084
go_gc_duration_seconds_sum 0.003167649
go_gc_duration_seconds_count 12 
..................
..............

also I tried with node ip, below is the o/p of the same:

sh-4.2# hostname -i
10.100.114.42
sh-4.2#
sh-4.2# curl http://10.100.114.42:9091/metrics
curl: (7) Failed connect to 10.100.114.42:9091; Connection refused

below few other details:

sh-4.2# pwd
/etc/cni/net.d
sh-4.2# ls -lrth
drwxr-xr-x 2 root root 4.0K Jul 27 01:10 calico-tls
-rw-r--r-- 1 root root  887 Jul 27 01:10 10-calico.conflist
-rw------- 1 root root 2.6K Jul 27 01:10 calico-kubeconfig

10-calico.conflist

sh-4.2# cat 10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "etcd_endpoints": "https://10.100.114.81:2379,https://10.100.114.82:2379,https://10.100.114.83:2379",
      "etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
      "etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
      "etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
      "mtu": 0,
      "ipam": {
          "type": "calico-ipam"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}

Also in the calico-node DaemonSet manifest , I have the below config for ipv6

- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
   value: "ACCEPT"
- name: FELIX_IPV6SUPPORT
   value: "false"
- name: FELIX_HEALTHENABLED
   value: "true"

Please help!

caseydavenport commented 2 years ago

am able to get the kube-calico-metrics on port 9094 with localhost,

Calico kube-controllers doens't run with host networking, so you need to run this from within the kube-controllers pod, which will likely have a different result for localhost.

as we can see NO felixconfigurations object is created as nothing no object related to felixconfiguration is given in the default manifest file calico with etcd datastore When I created felixconfiguration custom resource definition as it is created in case-1 with below yaml

I just realized this part of your post - Calico CustomResourceDefinitions aren't used in etcd mode. That's the entire point of etcd mode - it reads data from etcd, and not from CRDs. So creating the felixconfiguration CRD is expected to have effect. You need to use calicoctl to write a FelixConfiguration into etcd directly, or use environment variables.