projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Calico potentially losing track of state intermittently? #8942

Closed henryzhao95 closed 3 weeks ago

henryzhao95 commented 2 months ago

Expected Behavior

We have Argo CD running in numerous Kubernetes clusters. This includes:

We have Calico NetworkPolicies in place to allow the ingress to these ports, for example:

  ingress:
    - action: Allow
      destination:
        ports:
          - 26379
          - 6379
      protocol: TCP
      source:
        namespaceSelector: name == 'argocd'
        selector: >-
          app.kubernetes.io/name in {'argocd-redis-ha',
          'argocd-redis-ha-haproxy', 'argocd-server', 'argocd-repo-server',
          'argocd-application-controller'}
  order: 150
  selector: app.kubernetes.io/name in {'argocd-redis-ha', 'argocd-redis-ha-haproxy'}
  types:
    - Ingress

And so we expect Argo to work, with nothing being denied. (We have a log & deny all rule at the end too.)
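To spell out what this rule should match, here is a rough Python sketch of the rule as a predicate. It is a simplified model for illustration only, not how Calico evaluates policy: the `Packet` type and `ingress_rule_allows` function are hypothetical, the namespace selector is reduced to a plain string comparison, and it assumes the Redis server pods carry the `argocd-redis-ha` app label. The point is that the rule is directional: it matches on the destination port, so a packet travelling from the Redis port back to a client's ephemeral port (40962 here, as an example) does not match it on its own.

```python
from dataclasses import dataclass

# Values copied from the policy above.
POLICY_SELECTOR = {"argocd-redis-ha", "argocd-redis-ha-haproxy"}
ALLOWED_SOURCES = {
    "argocd-redis-ha", "argocd-redis-ha-haproxy", "argocd-server",
    "argocd-repo-server", "argocd-application-controller",
}
ALLOWED_PORTS = {26379, 6379}


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a packet for this sketch."""
    src_namespace: str  # assumed to equal the namespace's `name` label
    src_app: str        # source pod's app.kubernetes.io/name label
    dst_app: str        # destination pod's app.kubernetes.io/name label
    dst_port: int
    protocol: str = "TCP"


def ingress_rule_allows(pkt: Packet) -> bool:
    """Return True if the single Allow rule above would match this packet."""
    return (
        pkt.dst_app in POLICY_SELECTOR          # policy applies to this pod
        and pkt.protocol == "TCP"
        and pkt.src_namespace == "argocd"
        and pkt.src_app in ALLOWED_SOURCES
        and pkt.dst_port in ALLOWED_PORTS
    )


# haproxy (ephemeral port) -> redis server on 26379
forward = Packet("argocd", "argocd-redis-ha-haproxy", "argocd-redis-ha", 26379)
# redis server (port 26379) -> haproxy on ephemeral port 40962
reverse = Packet("argocd", "argocd-redis-ha", "argocd-redis-ha-haproxy", 40962)

print(ingress_rule_allows(forward))  # True
print(ingress_rule_allows(reverse))  # False
```

In normal operation the reverse-direction packets never need to match the rule, because they belong to a connection that was already allowed in the forward direction.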

Current Behavior

Occasionally (roughly once a month per cluster), and not coinciding with new calico-node or Argo pods, we see a burst of 3 blocked Argo flows spaced roughly 100 seconds apart, e.g. 1 at 4:57:39 pm, 1 at 4:59:19 pm, 1 at 5:01:00 pm.

These blocked flows report the inverse of the flow we'd normally expect, e.g.

Blocked: argocd-redis-ha-server:26379 --> argocd-redis-ha-haproxy:40962
Expected flow: argocd-redis-ha-haproxy:40962 --> argocd-redis-ha-server:26379

Blocked: argocd-redis-ha-server:6379 --> argocd-redis-ha-proxy:51418
Expected flow: argocd-redis-ha-proxy:51418 --> argocd-redis-ha-server:6379

I don't see anything out of the ordinary in the Calico pod logs. My understanding of networking is weak, but it feels like Calico, which should be stateful, is potentially losing track of the state of the network flows. Is that possible? Or are there any other theories?

fasaxc commented 2 months ago

Yes, Calico is a stateful firewall; we track connections in the kernel's connection tracking ("conntrack") table. You can list all conntrack entries with conntrack -L, or watch for changes with conntrack -E.
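As a rough illustration of what "tracked" means here, the following Python sketch parses TCP/UDP lines in the usual conntrack -L text format and checks whether a given packet matches either direction of a tracked connection. The helper names (`parse_line`, `is_tracked`) and the sample IP addresses are made up for this example, and the sketch only models the lookup, not the kernel's actual state machine.

```python
import re
from typing import List, NamedTuple, Optional


class FlowTuple(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int


class Entry(NamedTuple):
    proto: str
    original: FlowTuple  # direction in which the connection was opened
    reply: FlowTuple     # expected return direction


_KV = re.compile(r"\b(src|dst|sport|dport)=(\S+)")


def _build(pairs) -> Optional[FlowTuple]:
    d = dict(pairs)
    if set(d) != {"src", "dst", "sport", "dport"}:
        return None
    return FlowTuple(d["src"], d["dst"], int(d["sport"]), int(d["dport"]))


def parse_line(line: str) -> Optional[Entry]:
    """Parse one TCP/UDP line of `conntrack -L` output; None if unrecognised."""
    fields = line.split()
    kv = _KV.findall(line)
    if not fields or len(kv) < 8:
        return None  # e.g. ICMP entries have no sport/dport
    original, reply = _build(kv[:4]), _build(kv[4:8])
    if original is None or reply is None:
        return None
    return Entry(fields[0], original, reply)


def is_tracked(entries: List[Entry], pkt: FlowTuple) -> bool:
    """True if pkt matches the original or reply direction of any entry."""
    return any(pkt == e.original or pkt == e.reply for e in entries)


# Sample line with hypothetical pod IPs: haproxy 10.0.1.5 -> redis 10.0.2.7
sample = (
    "tcp      6 431999 ESTABLISHED src=10.0.1.5 dst=10.0.2.7 sport=40962 "
    "dport=26379 src=10.0.2.7 dst=10.0.1.5 sport=26379 dport=40962 "
    "[ASSURED] mark=0 use=1"
)
table = [e for e in [parse_line(sample)] if e is not None]

# A return packet from the redis server to haproxy's ephemeral port.
reply_pkt = FlowTuple("10.0.2.7", "10.0.1.5", 26379, 40962)
print(is_tracked(table, reply_pkt))  # True: matches the reply direction
print(is_tracked([], reply_pkt))     # False: no entry for this flow
```

While the entry exists, the return packet is recognised as part of the allowed connection. Once the entry is gone, the same packet no longer matches anything in the table.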

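The fact that the denied packets are in the reverse direction suggests that the connection was previously being tracked but its conntrack entry was cleaned up, so the return packets no longer match a known connection and are evaluated against policy as if they were starting a new flow. This could be for a few reasons: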

sridhartigera commented 3 weeks ago

@henryzhao95 Closing this issue. Reach out if you have more information.