kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] TCP DNS Traffic Blocked Despite Security Group Rule Allowing Egress to DNS Service #3998

Open wfnuser opened 2 months ago

wfnuser commented 2 months ago

Kube-OVN Version

v1.12.8

Kubernetes Version

Server Version: v1.26.9

Operation-system/Kernel Version

"Ubuntu 22.04.2 LTS"

Description

We have encountered an issue in our Kubernetes cluster managed by Kube-OVN where a security group (SG) rule is configured to allow egress traffic from a specific pod to the DNS service at the ClusterIP 10.96.0.10. According to our configuration, this rule should permit all traffic to the DNS service; however, we are observing unexpected behavior that differs between protocols, as described below.

Our sg looks like:

apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  creationTimestamp: "2024-05-07T09:01:09Z"
  generation: 30
  name: user-8281-sg
  resourceVersion: "645915870"
  uid: cfeb4eea-18fb-4d65-9b89-8befd946dd3e
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.96.0.10
    remoteType: address
  - ipVersion: ipv4
    policy: deny
    priority: 31
    protocol: all
    remoteAddress: 10.0.0.0/8
    remoteType: address
  - ipVersion: ipv4
    policy: allow
    priority: 200
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address

And if we add one more rule for the pod IP (10.16.41.31) behind the DNS service (10.96.0.10):

  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.16.41.31
    remoteType: address

then DNS works again.

I think the real problem is not specific to DNS. If there are pod IPs behind a service IP and you only set allow rules for the service IP, it simply does not work; you have to set allow rules for the pod IPs too.
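For anyone hitting the same thing, a quick way to find which backend pod IPs need the extra allow rules (assuming the standard kube-dns Service in kube-system fronts the CoreDNS pods; adjust the names for your cluster):

# List the pod IPs behind the DNS Service; each of them needs its own allow
# rule (or must be covered by an allowed CIDR) for the workaround above.
kubectl -n kube-system get endpoints kube-dns -o wide

# The ClusterIP that the SG already allows:
kubectl -n kube-system get svc kube-dns

In our case the endpoint list returned 10.16.41.31, which is why the extra rule above fixes DNS.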

Steps To Reproduce

Create an SG like the following:

apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  creationTimestamp: "2024-05-07T09:01:09Z"
  generation: 30
  name: user-8281-sg
  resourceVersion: "645915870"
  uid: cfeb4eea-18fb-4d65-9b89-8befd946dd3e
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.96.0.10 # DNS service ClusterIP
    remoteType: address
  - ipVersion: ipv4
    policy: deny
    priority: 31
    protocol: all
    remoteAddress: 10.0.0.0/8
    remoteType: address
  - ipVersion: ipv4
    policy: allow
    priority: 200
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address

Bind it to some pod, as sketched below.
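For completeness, this is roughly how we attach the SG to a test pod. As far as I understand the Kube-OVN security-group annotations (ovn.kubernetes.io/port_security and ovn.kubernetes.io/security_groups), something like the following should work; the pod name and image are just placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: sg-test                                      # placeholder name
  annotations:
    ovn.kubernetes.io/port_security: "true"          # enable port security so SG rules apply
    ovn.kubernetes.io/security_groups: user-8281-sg  # the SG from the spec above
spec:
  containers:
  - name: test
    image: nicolaka/netshoot                         # any image with dig/curl for testing
    command: ["sleep", "infinity"]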


It's very interesting that you can ping, and even dig rds-3r4ybkarqxwg-pxc.user-1993.svc.cluster.local srv, successfully (presumably because ping and dig use ICMP/UDP, while the blocked traffic is TCP).

Current Behavior

Cannot access DNS without also adding the backend pod IP to the SG.

Expected Behavior

Can access DNS with only the service IP allowed in the SG.

wfnuser commented 2 months ago

These are the logs for the ACL rules:

from-lport  2270 (inport == @ovn.sg.user.8281.sg && ip4 && ip4.dst == 10.100.27.20) allow-related log(severity=info)
from-lport  2270 (inport == @ovn.sg.user.8281.sg && ip4 && ip4.dst == 10.16.61.90) allow-related log(severity=info)

And the command I run is:

curl 10.100.27.20

As you can see, from the ACL's perspective only the TCP SYN packet's dst IP is 10.100.27.20, which is the cluster IP; the dst IP of all subsequent TCP packets is somehow converted to 10.16.61.90, which is the pod IP.
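If it helps with debugging, that DNAT should be visible on the OVN load balancer. Assuming the kubectl-ko plugin is installed, something like the following should show the ClusterIP 10.100.27.20 with 10.16.61.90 listed as its backend:

# Dump the load balancers in the OVN northbound DB; the VIP entries list the
# backend pod IPs that Service traffic gets NATed to by the switch LB.
kubectl ko nbctl lb-list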

github-actions[bot] commented 2 weeks ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

wfnuser commented 2 weeks ago

Any update on this issue?

bobz965 commented 2 weeks ago

Any update on this issue?

Sorry, too busy to fix this.

bobz965 commented 2 weeks ago

Are you using the default VPC ovn-cluster?

How about setting ENABLE_LB to false?

wfnuser commented 2 weeks ago

Are you using the default VPC ovn-cluster?

How about setting ENABLE_LB to false?

Yep, default VPC.

We have found another way to work around it. Haha. I'm just commenting to remind you that there is an issue; maybe you can check it out when you have time. It seems GitHub will automatically close this issue if I don't.

bobz965 commented 2 weeks ago

In my opinion:

When the LB is enabled, the VIP can be NATed to its backend pod IP by the switch load balancer, so the traffic gets blocked: after the DNAT the destination no longer matches the allow rule for the service IP and instead hits the deny rule for 10.0.0.0/8.

If you disable the LB, the traffic to the VIP will go through the node and be NATed by IPVS instead.
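For reference, a sketch of how one might check (and, if needed, flip) that setting. I'm assuming the switch load balancer is governed by the kube-ovn-controller --enable-lb argument; please verify the flag name against the installed version before changing anything:

# Check whether the OVN switch load balancer is currently enabled.
kubectl -n kube-system get deployment kube-ovn-controller \
  -o jsonpath='{.spec.template.spec.containers[0].args}' | tr ',' '\n' | grep enable-lb

# Assumption: setting --enable-lb=false in the deployment args disables the switch LB,
# so Service VIP traffic is then NATed by kube-proxy (IPVS) on the node instead.
kubectl -n kube-system edit deployment kube-ovn-controller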