harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] Neuvector cannot learn network policy for VM traffic when Harvester VM network type is bridge #6284

mingshuoqiu opened this issue 2 months ago (status: Open)

mingshuoqiu commented 2 months ago

**Describe the bug**
VM traffic can be learned when the VM's NIC is in masquerade mode, but cannot be learned when the NIC is in bridge mode.

**To Reproduce**
Steps to reproduce the behavior:

  1. Execute an egress test and an ingress test for the Harvester VM (a minimal sketch of both tests follows the results below):
     Egress test: the VM accesses google.com
     Ingress test: a Harvester node accesses a specific service in the VM (e.g., a Node.js service in the VM)
  2. Observe the results:

### For a bridge type network:

Egress test: NeuVector cannot learn a network policy and cannot create a conversation -- failed

Ingress test: NeuVector cannot learn a network policy and cannot create a conversation -- failed

### For a masquerade type network:

Egress test: NeuVector can learn a network policy and create a conversation -- pass. There are two network rules: one from the VM group to external, and another from workload:ip to the VM group (the workload IP is the VM's internal IP, such as 10.0.2.2). Not sure whether the second rule is necessary; need to confirm with the developers.

Ingress test: NeuVector cannot learn a network policy, though a conversation can be generated (the action is open) -- failed (NeuVector should have learned a network policy from nodes -> VM group)
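
A minimal sketch of the two tests, assuming the VM runs a Node.js service (the VM IP and port 3000 are placeholders, not from the report):

    # Egress test: run inside the VM guest
    curl -I https://google.com

    # Ingress test: run from a Harvester node against the service in the VM
    # <vm-ip> and port 3000 are assumed values
    curl http://<vm-ip>:3000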


**Expected behavior**
VM traffic should be learned by NeuVector whether the VM's NIC is in masquerade mode or in bridge mode.

innobead commented 2 months ago

cc @starbops @rrajendran17

mingshuoqiu commented 2 months ago

Summary of the debugging from the NV team:

## Debug steps for all types of VMs:
1. Enter the VM's network namespace via the `nsenter -t <pid> -n` command and check the interfaces
2. Run `tcpdump -i <interface> -nnvvSe` in the VM's network namespace to start monitoring packets on a specific interface
3. Log in to the VM console and send egress or ingress traffic
4. After step 3, check the packets in the VM's network namespace (a sketch of these steps follows)
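
A minimal sketch of these steps from a Harvester node, assuming a single virt-launcher process on the node (the `pgrep` pattern and the interface name are assumptions):

    # Find the virt-launcher PID of the VM (assumes only one VM on this node)
    PID=$(pgrep -f virt-launcher | head -n1)

    # Step 1: list the interfaces inside the VM pod's network namespace
    nsenter -t "$PID" -n ip -br link show

    # Step 2: capture packets on a specific interface (e.g. eth0, k6t-eth0, tap0)
    nsenter -t "$PID" -n tcpdump -i eth0 -nnvvSe

    # Steps 3-4: generate egress/ingress traffic from the VM console and watch
    # which interface carries it and which source MAC it uses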

### Result for a masquerade type VM (its network is the management Network):
1. NeuVector can learn the network policy
2. There are eth0/k6t-eth0/tap0 interfaces in the VM pod; traffic goes in/out on the eth0 interface.
In the output of `tcpdump`, the MAC address used for the in/out traffic is eth0's MAC address

### Result for a bridge type VM (its network is the management Network):
1. NeuVector can NOT learn the network policy
2. There are eth0/eth0-nic/k6t-eth0/tap0 interfaces in the VM pod
3. In the output of `tcpdump`, traffic goes in/out on the eth0-nic interface.
But the MAC address used for the in/out traffic is NOT eth0-nic's MAC address, nor is it the MAC address of the eth0/k6t-eth0/tap0 interfaces.
The MAC address used for the in/out traffic is enp1s0's MAC address (this interface is displayed in the VM console).
(We think this may be the root cause of why NeuVector cannot learn the network policy.)

### Result for a bridge type VM (its network is a VM Network):
1. NeuVector can NOT learn the network policy
2. There are eth0/37a8eec1ce1-nic/k6t-37a8eec1ce1/tap37a8eec1ce1/pod37a8eec1ce1 interfaces in the VM pod
3. In the output of `tcpdump`, traffic goes in/out on the 37a8eec1ce1-nic interface.
But the MAC address used for the in/out traffic is NOT 37a8eec1ce1-nic's MAC address; it is pod37a8eec1ce1's MAC address.

mingshuoqiu commented 2 months ago

Just confirmed that there is no ARP cache at all when all of the VM's NICs are created in bridge mode, even though we can see ARP requests/replies from the pod network. So we cannot use the ARP cache (`ip neigh`) to find the MAC address of the VM's NIC, the way we do on a VM with masquerade mode NICs. (screenshot: k6t-bridge-nic)

We need the FDB entries of the netns the VM belongs to in order to find the source MAC address of a particular traffic flow from/to the VM.
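
For contrast, a hedged sketch of the two lookups inside the VM pod's network namespace (`<pid>` is the virt-launcher PID; `k6t-eth0` as the in-pod bridge name for the management network is an assumption):

    # Masquerade mode: the guest NIC's MAC/IP shows up in the ARP cache
    nsenter -t <pid> -n ip neigh

    # Bridge mode: the ARP cache stays empty; the MAC only appears in the
    # bridge's forwarding database, and only after traffic has flowed
    nsenter -t <pid> -n bridge fdb show br k6t-eth0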

mingshuoqiu commented 2 months ago

To find the MAC addresses matching the bridge mode NICs on the VM from the VM traffic, I did the following experiment (see the FDB sketch after this list):

  1. Create 2 NICs in bridge mode. The first one connects to the mgmt network, the other connects to the VM network. (screenshot: 1_different_bridge_mode)
  2. Get the real IP/MAC addresses of the 2 NICs on the VM. (screenshot: 2_different_NICs_addr)
  3. On the mgmt node, use `ip netns exec <netns> ip link show` to find the interfaces created by multus-cni for the pod network. (screenshot: 3_NICs_on_the_pod_network_of_the_VM) It should match the topology in the Harvester Network Deep Dive: https://docs.harvesterhci.io/assets/images/topology-92ab59d983544bad738764a2105c9a06.png
  4. There is no FDB entry for the VM's NIC when no traffic has been pumped, since the bridge has not learned from any traffic yet. (screenshot: 4_no_fdb_when_no_traffic)
  5. A simple ping from both NICs on the VM makes the bridge learn the MAC addresses between the k6t- interfaces and the VM's NICs. (screenshot: 5_fdb_entry_after_pump_traffic)

Then we can use this information for NeuVector to map the pod's NICs 1-to-1 to the VM's NICs.
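
A hedged before/after check for steps 4 and 5, assuming the pod netns is visible to `ip netns` and the in-pod bridge is `k6t-eth0` (both names are assumptions):

    # Before any traffic: only permanent entries for the bridge ports
    ip netns exec <pod-netns> bridge fdb show br k6t-eth0

    # From the VM console: ping -c1 <gateway-ip>

    # After traffic: a dynamic entry appears that maps the guest NIC's MAC to
    # the tap port, giving the 1-to-1 pod-NIC -> VM-NIC mapping
    ip netns exec <pod-netns> bridge fdb show br k6t-eth0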

rrajendran17 commented 2 months ago

1. When you say "Neuvector cannot learn network policy for VM traffic", are you checking the output of a particular command? Can you post the command you are checking?
2. Do we have the output of `bridge fdb show` and `bridge vlan show` from the time of the issue?
3. Is there any VLAN-tagged traffic configured for the VLAN VM networks?

mingshuoqiu commented 2 months ago
  1. They use NeuVector and trigger a connection between the VM and an external network. NeuVector draws a line on the UI to mark the source and destination, but the source in bridge mode is always incorrect.
  2. We will have the `bridge vlan show` output, which has the port information of the pod network. And `bridge fdb show` will also show permanent entries for the mcast address and port addresses.
  3. Not in this test case.
rrajendran17 commented 2 months ago

@mingshuoqiu Trying to get a better understanding of the issue:

1. Is communication/ping to the external environment working, with only a discrepancy in the source address shown in the UI? Or is the ping to external failing?

2. My understanding is that there will be a veth pair connected to the VM interface (created by the bridge), and we need to look at the source MAC learned on the veth interface corresponding to the VM interface in the bridge/VLAN network via `bridge fdb show`. Please correct me if I am wrong here. I do not understand in which cases we need to check bridge entries under the k6t- and tap interfaces.

mingshuoqiu commented 2 months ago
  1. Yes, ping/curl between the VM and the external network is OK, but the source IP cannot be shown correctly because the UI cannot tell it from the MAC address.
  2. The veth pair in bridge mode cannot be used to identify the source of a traffic flow. And NeuVector, running in a pod, has no way to know the NIC MAC address of a specific VM. So they need a method to get the network interface information from the VM.
mingshuoqiu commented 1 month ago

NV team, could you share some feedback on what we suggest? Does the `bridge fdb show` command help solve this problem or not?

esther-suse commented 1 month ago

If there is traffic to/from the VM, we can see the MAC address in `bridge fdb show`. But this MAC address entry is not permanent.

When a POD is deployed, the enforcer goes through all of the POD's interfaces and gets the MAC address as well as the IP address of each interface. We create a socket and bind it to the interface to sniff packets; based on a packet's MAC address we decide whether the packet is to/from the POD. The FDB is time-sensitive and packet-triggered, so it is unreliable to use the FDB to get the MAC address of an interface. Is it possible for the Harvester team to attach the real MAC address to the interface? Thanks.
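
To illustrate the enforcer's MAC-based attribution, a hedged `tcpdump` equivalent of that bound-socket filter (`<iface>` and `<mac>` are placeholders; the enforcer uses a raw socket, not tcpdump):

    # Only frames whose Ethernet SRC or DST equals <mac> are attributed to the
    # POD; in bridge mode the guest's MAC never matches the scanned POD
    # interface's address, so the traffic cannot be attributed
    tcpdump -i <iface> -e ether host <mac>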

mingshuoqiu commented 1 month ago

@rrajendran17 any better idea?

mingshuoqiu commented 1 month ago

> If there is traffic to/from the VM, we can see the MAC address in `bridge fdb show`. But this MAC address entry is not permanent.
>
> When a POD is deployed, the enforcer goes through all of the POD's interfaces and gets the MAC address as well as the IP address of each interface. We create a socket and bind it to the interface to sniff packets; based on a packet's MAC address we decide whether the packet is to/from the POD. The FDB is time-sensitive and packet-triggered, so it is unreliable to use the FDB to get the MAC address of an interface. Is it possible for the Harvester team to attach the real MAC address to the interface? Thanks.

I think the FDB entry has a life cycle that should be long enough for NV to record the mapping while traffic flows. The mapping could persist until the user does a VM migration. Or could you point out a better option that you'd like Harvester to offer?
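
For reference on that life cycle, a hedged way to check the bridge's FDB ageing time inside the pod netns (`k6t-eth0` and the netns name are assumptions; the value is in 1/100 s units, typically 30000 = 300 s):

    # Dynamic FDB entries age out after this interval without traffic
    ip netns exec <pod-netns> cat /sys/class/net/k6t-eth0/bridge/ageing_time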

gfsuse commented 1 month ago

We need to have a correct MAC address attached to the POD interface(s); this MAC address should be used as the SRC/DST Ethernet MAC address. We rely on this MAC address to decide whether a packet is for/from this POD, and we only monitor a POD's interfaces for a period of time after the POD is brought up or the NeuVector enforcer is first deployed. In Harvester's bridge mode case, the MAC is only available once traffic is generated, which would require infinite monitoring of every POD (resource consuming), and it is only visible through `bridge fdb show`, so it is not feasible for us to constantly monitor the bridge FDB to search for a MAC. The best way is to have the correct MAC address attached to one of the POD's interfaces, thanks.

mingshuoqiu commented 1 month ago

@starbops do we have the VM's NIC MAC address available from the VMI or other resources?

ibrokethecloud commented 1 month ago

The Multus annotation on the launcher pod records the MAC address, for example:

    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "k8s-pod-network",
          "ips": [
              "10.52.2.87"
          ],
          "default": true,
          "dns": {}
      },{
          "name": "default/workload",
          "interface": "pod37a8eec1ce1",
          "mac": "c2:f0:c8:37:e4:f4",
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks: '[{"name":"workload","namespace":"default","mac":"c2:f0:c8:37:e4:f4","interface":"pod37a8eec1ce1"}]'
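
A hedged one-liner to pull those per-interface MACs from the launcher pod's annotation (`<launcher-pod>` and the namespace are placeholders; `jq` is assumed to be available):

    kubectl get pod <launcher-pod> -n default \
      -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}' \
      | jq -r '.[] | "\(.interface // .name)\t\(.mac // "n/a")"'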
ibrokethecloud commented 1 month ago

The associated VirtualMachine and VirtualMachineInstance objects will also contain info about the MAC addresses.

gfsuse commented 1 month ago

With a Multus bridge-type network, we can figure out the MAC address even through the `ip address` command; one of the DOWN interfaces has the MAC address. But the other bridge-type case just uses the default management network, and there the MAC address is not available.

mingshuoqiu commented 1 month ago

@ibrokethecloud the case George mentioned is a NIC created by the following method:

(screenshot: network_bridge)

Can we have the MAC address information for this case?

rrajendran17 commented 1 month ago

@gfsuse @mingshuoqiu The traffic from the VM will use the MAC address configured on the interface in the VM guest OS (e.g., the enp1s0 interface MAC). The MAC address of an interface in the VM pod is copied to a specific interface of the guest, depending on the network type (masquerade, bridge):

Case 1: VM in the mgmt network with type masquerade -- the eth0 MAC from the VM pod is copied to the enp1s0 interface of the VM guest OS.

Case 2: VM in the mgmt network with type bridge -- the eth0 MAC from the VM pod is copied to the enp1s0 interface of the VM guest OS.

Case 3: VM in a VLAN VM network with a single NIC -- the pod interface MAC from the VM pod is copied to the enp1s0 interface of the VM guest OS.

Case 4: VM in the mgmt network (nic-1) and in a VLAN VM network (nic-2) -- the eth0 MAC from the VM pod is copied to the enp1s0 interface of the VM guest OS, and the pod interface MAC is copied to the enp2s0 interface.

Note: the number of pod interfaces on a VM pod will equal the number of bridge-type VLAN VM networks created for the VM.

@gfsuse When you scan for MACs on the VM pod after it is deployed, can you scan the eth0 interface plus all pod interfaces created on the VM pod? This way, when traffic comes from a VM, you can map it to a particular pod (for both mgmt and bridge type interfaces). One drawback is that if the VM interface is bridge type, this method only gives you MAC addresses.

If you also want the IP addresses of the VM's (secondary) interfaces, you could use the following commands:
a. `kubectl get vmis`
b. `kubectl get vmi -o yaml`
Step b will give you output in which you can parse the "interfaces" section, which gives details of both MAC and IP for all the interfaces present in that VM pod.

Example output of step b (I created two interfaces in the VM, one in mgmt and the other in bridge): (screenshot of the "interfaces" section)

I feel the second method, `kubectl get vmi -o yaml`, is the easiest way to get all interface information (MAC + IP) from a VM pod, which you could later use to map traffic to a particular pod when you receive it (see the sketch below).
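
A hedged jsonpath variant of step b that prints each interface's name, MAC, and IP from the VMI status (`<vmi-name>` and the namespace are placeholders; the fields follow KubeVirt's `status.interfaces` schema):

    kubectl get vmi <vmi-name> -n default -o \
      jsonpath='{range .status.interfaces[*]}{.name}{"\t"}{.mac}{"\t"}{.ipAddress}{"\n"}{end}'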

mingshuoqiu commented 2 weeks ago

@gfsuse @esther-suse any update on the bridge mode on the management network?

gfsuse commented 1 week ago

Currently we don't have a functional API to get Kubernetes VMI resources in our enforcer; we need to explore the Kubernetes API to see how to get VMI resources. The enforcer can only scan pods/containers through the runtime on each worker node. The best way is still for Harvester to reflect the VM's interface/MAC addresses correctly on the corresponding pod.

Faker523 commented 5 days ago

> Currently we don't have a functional API to get Kubernetes VMI resources in our enforcer; we need to explore the Kubernetes API to see how to get VMI resources.

Do you have a plan to achieve this?