hellt / vrnetlab

Make VM-based Network OSes run in Containerlab
https://containerlab.dev
MIT License

Pass-through/transparent management interfaces #268

Closed vista- closed 2 weeks ago

vista- commented 1 month ago

This PR adds pass-through as a mode for management interfaces. The default mode for management interfaces remains host-forwarded (previously the only mode supported for management interfaces).

The pass-through mode functions by simply creating yet another tc mirred interface for the NOS VM management NIC instead of binding it to a user-mode, host-forwarded one.

The main advantages of transparently passing management traffic are:

The downside of this approach is that it is no longer possible to pass traffic directly from the vrnetlab container, which means you cannot install/update packages, download files via curl, etc. in the container. Pre-defined exception traffic originating from outside the container (e.g. to the QEMU serial port listening on port 5000) can be selectively directed to the container.

With this change, three NOSes default to pass-through/transparent management:

All other NOS types should not be impacted by this change.

The management interface mode can be overridden by passing the envvar CLAB_MGMT_PASSTHROUGH (true/false).
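As a rough illustration of how the override could be resolved (this is a sketch, not the code in this PR): the helper name `resolve_mgmt_passthrough` is hypothetical, and falling back to the NOS default for an unset or unrecognised value is an assumption.

```python
import os


def resolve_mgmt_passthrough(default, environ=None):
    """Return the effective management mode (True = pass-through).

    Hypothetical helper: the CLAB_MGMT_PASSTHROUGH envvar, when set to
    "true" or "false" (case-insensitive), overrides the NOS default.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CLAB_MGMT_PASSTHROUGH")
    if raw is None:
        return default
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    # Assumption: an unrecognised value leaves the NOS default untouched.
    return default


if __name__ == "__main__":
    # A NOS that defaults to host-forwarded, overridden by the envvar.
    print(resolve_mgmt_passthrough(False, {"CLAB_MGMT_PASSTHROUGH": "true"}))
```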

[!TIP] For developers:

The __init__ constructor of vrnetlab.VM now has an additional parameter mgmt_passthrough, which defaults to False. Setting this to True creates a tc mirred interface for the management/first NIC instead of the host-forwarded type. This can be overridden by the envvar described above, and the resolved value can be accessed through the self.mgmt_nic_passthrough attribute of vrnetlab.VM.

Two additional convenience attributes are also available for automatically generating startup configs:

  • self.mgmt_address_ipv4 contains the management IP of the node in CIDR format. This is either 10.0.0.15/24 for host-forwarded mode, or the actual management IP for pass-through mode.
  • self.mgmt_gw_ipv4 contains the management default gateway for the node. 10.0.0.2 for host-forwarded mode, or the actual management gateway IP for pass-through mode.
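A minimal sketch of how these two attributes might feed a startup config. The function `render_mgmt_config` and the generic `ip address` / `ip route` lines are hypothetical placeholders, not the syntax of any particular NOS; only the attribute semantics (CIDR address, gateway IP) come from the description above.

```python
import ipaddress


def render_mgmt_config(mgmt_address_ipv4, mgmt_gw_ipv4):
    """Render placeholder config lines from the two convenience values.

    mgmt_address_ipv4 is expected in CIDR format (e.g. "10.0.0.15/24"),
    mgmt_gw_ipv4 as a plain IPv4 address (e.g. "10.0.0.2").
    """
    iface = ipaddress.ip_interface(mgmt_address_ipv4)  # splits address and prefix
    gateway = ipaddress.ip_address(mgmt_gw_ipv4)       # validates the gateway
    lines = [
        # Hypothetical, vendor-neutral syntax; adapt to the target NOS.
        f"ip address {iface.ip} {iface.netmask}",
        f"ip route 0.0.0.0 0.0.0.0 {gateway}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Values used in host-forwarded mode, as described above.
    print(render_mgmt_config("10.0.0.15/24", "10.0.0.2"))
```

In pass-through mode the same call would receive the node's actual management IP and gateway, so the generated config needs no mode-specific branching.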

To add new exception traffic that should be directed towards the container instead of the VM, you need to add a new tc filter rule. An example rule is the following, for the QEMU serial port listening on TCP ports 5000-5007.

tc filter add dev eth0 ingress prio 1 protocol ip flower ip_proto tcp dst_port 5000-5007 action pass

The exception traffic filters should be added first (lower priority numbers), before the eth0 mirred redirect rule, which should be the last.
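The ordering can be sketched as follows. This only builds the command strings and does not run them. The helper `build_mgmt_tc_filters` and the tap device name `tap0` are hypothetical, and the exact form of the final catch-all redirect rule is an assumption; only the exception rule matches the example above.

```python
def build_mgmt_tc_filters(exceptions, dev="eth0", tap="tap0"):
    """Return tc commands with exception filters first, redirect last.

    exceptions: list of (ip_proto, dst_port) pairs, e.g. ("tcp", "5000-5007").
    Assumes a clsact/ingress qdisc already exists on `dev`.
    """
    cmds = []
    prio = 1
    for ip_proto, dst_port in exceptions:
        # Lower prio numbers are evaluated first: matching traffic is
        # passed to the container stack and never reaches the redirect.
        cmds.append(
            f"tc filter add dev {dev} ingress prio {prio} protocol ip "
            f"flower ip_proto {ip_proto} dst_port {dst_port} action pass"
        )
        prio += 1
    # Assumed form of the catch-all rule: everything else goes to the VM tap.
    cmds.append(
        f"tc filter add dev {dev} ingress prio {prio} "
        f"flower action mirred egress redirect dev {tap}"
    )
    return cmds


if __name__ == "__main__":
    for cmd in build_mgmt_tc_filters([("tcp", "5000-5007")]):
        print(cmd)
```

Deriving the redirect rule's priority from the number of exceptions keeps it last even when new exceptions are added later.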

NOS Implementations

SR OS

The work is done in #272

plajjan commented 1 month ago

Very cool!

As an anecdote, in the early days, probably in the first year after I wrote vrnetlab, I presented to a few NTT guys and they were curious if they could bridge into the mgmt interfaces instead of doing NAT. At the time, the whole networking story in docker looked quite different and doing what we do now in containerlab wasn't easy, which is why vr-xcon was built in the first place. We also did a lot of work around SnabbSwitch (now Snabb - https://github.com/snabbco/snabb) and I had these ideas on using it to apply flexible forwarding rules, like take port 5000 (serial console) and send to the container while the rest could be forwarded transparently to the VM. We never had time / it was never important enough to warrant the work. But here we are, a lot later, and this PR is looking really good, very cool :)

I can't help but wonder if we can make the tc rules more specific and only redirect specific traffic, and that way, we could still have some ports be forwarded to the container so we can reach the serial port!? WDYT?

vista- commented 1 month ago

Thanks for the review @plajjan!

I pushed two commits to address your comments, which should work in theory, but they're still untested at the moment. I'm not sure if I can run tests today, but I'll try testing the changes over the weekend.

vista- commented 1 month ago

Over the weekend I tried several tc filters for creating the port 5000 exception, but sadly, none of the approaches worked and I couldn't see a TCP SYNACK being sent by the container network stack:

I also set a static neighbor entry for the container to be able to look up a destination MAC; this would be solved by yet another tc filter rule mirroring ARP responses inbound to the container -- this didn't help either, sadly. The rule I tried for that is the following:

along with setting sysctl net.ipv4.conf.eth0.arp_accept=1 and setting the eth0 promisc mode to on.

If you have any suggestions, please let me know! At this point, there's still a way to get the serial console working, which is still to invoke telnet within the container, like so:

docker exec -it <container_name> telnet 127.0.0.1 5000

However, it would be great if we could find a working solution that allows remote endpoints to also connect to the serial console of a given vrnetlab-based node.

vista- commented 1 month ago

Correction to the post above: if you clone the management MAC of the VM, and mirror both ARP requests and responses incoming to the management interface of the VM, the serial port workaround now works! I'm marking the PR as ready for review.

michelredondo commented 1 month ago

Looks great! Could you also please add get_mgmt_address_ipv6/get_mgmt_gw_ipv6 so it's v4/v6 ready? Thanks

Based on your ideas I have added support for SROS in https://github.com/hellt/vrnetlab/pull/272. In this case I'm using tc to rewrite traffic from the VM so it can also access the container (in clab SROS we use a TFTP server to download the license and save config).

vista- commented 1 month ago

@michelredondo I added IPv6 support to the management address/gw helper functions. Do note that v4/v6 has been combined into the same function returning a tuple.

hellt commented 2 weeks ago

@vista- I am changing the base for this PR to be the transparent-mgmt-intfs-dev branch that I just created. I want to merge your PR into this base branch and then target #272 to this dev branch as well.

In other words, your branch becomes the base for other systems to test/align/implement this method.

michelredondo commented 1 week ago

One important thing to consider is that docker implements POSTROUTING MASQUERADE rules for mgmt. prefixes:

Chain POSTROUTING (policy ACCEPT 1025 packets, 256K bytes)
 pkts bytes target     prot opt in     out     source               destination
  172 12925 MASQUERADE  all  --  *      !br-inline-mgmt  100.103.2.0/24       0.0.0.0/0
    6   392 MASQUERADE  all  --  *      !br-91cef2fdceb8  172.16.172.0/24      0.0.0.0/0
    0     0 MASQUERADE  all  --  *      !br-8671008f2987  192.168.121.0/24     0.0.0.0/0
  16M 1216M MASQUERADE  all  --  *      !br-104b3e59e170  172.80.80.0/24       0.0.0.0/0

So all traffic leaving the node will be source-NATed to the same IP address, which defeats the whole purpose of transparency. A workaround is to insert a rule that just accepts the traffic before the MASQUERADE rule kicks in.
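One possible shape for such a rule, as a sketch only: the helper `masquerade_bypass_rule` is hypothetical, and matching on the whole management prefix as source is an assumption (a narrower match, e.g. adding a destination, may be preferable so that other traffic is still NATed). The command is built but not executed.

```python
import ipaddress


def masquerade_bypass_rule(mgmt_prefix):
    """Build an iptables command that ACCEPTs traffic from mgmt_prefix
    at the top of nat/POSTROUTING, ahead of Docker's MASQUERADE rules.
    """
    net = ipaddress.ip_network(mgmt_prefix)  # validates the prefix
    return [
        "iptables", "-t", "nat",
        "-I", "POSTROUTING", "1",  # insert as the first rule in the chain
        "-s", str(net),
        "-j", "ACCEPT",            # stop traversal before MASQUERADE
    ]


if __name__ == "__main__":
    # Prefix taken from the listing above purely as an example.
    print(" ".join(masquerade_bypass_rule("172.80.80.0/24")))
```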