linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0

IPsec xfrm MTU discovery related testing (3 hosts) #920

Open pevik opened 2 years ago

pevik commented 2 years ago

Based on a report from Jiri Bohac: the ipv6ready/USGv6 test suites often find that IPsec is broken by two types of MTU-related bugs with a wrongly calculated packet size.

Testing should be:
1) The sender (SUT) sends a big packet.
2) It gets back an ICMPv6 error: destination unreachable / packet too big (PTB) / MTU=1280.
3) The sender (SUT) updates the route cache PMTU for the receiver and sends further packets with a smaller size (correctly set TCP MSS) or fragmented (or in IPsec tunnel mode).

Both IPsec transport mode and IPsec tunnel mode need to be tested.
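A minimal sketch of how step 3 could be checked on the sender, assuming iproute2 prints the cached PMTU as `mtu N` in the output of `ip -6 route get`. The helper names `pmtu_from_route` and `expected_mss` are hypothetical and not part of LTP. The MSS arithmetic covers plain TCP over IPv6 only; with ESP the usable MSS is smaller and depends on mode and algorithms, which is not computed here.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of LTP.

# pmtu_from_route TEXT: print the value following the "mtu" keyword in
# one line of `ip -6 route get` output; print nothing if it is absent.
pmtu_from_route()
{
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "mtu") { print $(i + 1); exit }
    }'
}

# expected_mss PMTU: plain TCP over IPv6 without IPsec overhead, i.e.
# PMTU - 40 (IPv6 header) - 20 (TCP header without options).
expected_mss()
{
    echo $(( $1 - 40 - 20 ))
}
```

On the sender this could be used as `pmtu_from_route "$(ip -6 route get 2001:db8:ffff:1::1)"` after the big packet has been sent; the value should then match the MTU announced in the PTB.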

An ICMPv6-PTB can be obtained with a 3-host setup (sender, receiver and a router in the middle; the router has MTU 1280 set on its sending interface, so the router's kernel sends the ICMPv6-PTB). tst_net.sh supports more than 2 links, but real hosts are needed (i.e. using SSH; the netns-based setup, which is much easier to set up, will likely not work).

It should also be possible to generate the ICMPv6-PTB somehow; then only two hosts would be needed and it would work on a netns-based setup. Not sure if this catches all possible bugs.

NOTE: existing IPsec tests use ping, which works (sometimes only TCP is broken; ping uses ICMP, not TCP). LTP also has MTU-based tests (if-mtu-change.sh), but they lack IPsec.

The test topology for 3 SUT is described in https://www.ipv6ready.org/docs/Phase2_IPsec_Interoperability_Latest.pdf in chapter 2, "For End-Node vs. End-Node Transport/Tunnel Mode Test":

There are two network links connected with a router (REF_ROUTER1). The network prefixes on these two links are:

PF0=2001:0db8:ffff:0000::/64
PF1=2001:0db8:ffff:0001::/64

The machine where the problem is observed is TGT_HOST1. TGT_HOST2 is a random machine on the other link.

TGT_HOST1_Link0=PF0::1
< connected by means of Link0=PF0 to > 
REF_ROUTER1_Link0=PF0::f
REF_ROUTER1_Link1=PF1::f
< connected by means of Link1=PF1 to >
TGT_HOST2_Link1=PF1::1

The bug can be reproduced with three KVM VMs and two Linux bridges:

Network setup on the KVM host:

U=jbohac
ip l a l0 type bridge 
ip l s up l0
ip a a 2001:0db8:ffff:0000::100/64 dev l0

ip l a l1 type bridge 
ip l s up l1
ip a a 2001:0db8:ffff:0001::100/64 dev l1

ip tuntap add mode tap user $U dev h1l0
ip link set h1l0 master l0
ip link set up h1l0

ip tuntap add mode tap user $U dev h2l1
ip link set h2l1 master l1
ip link set up h2l1

ip tuntap add mode tap user $U dev r1l0
ip link set r1l0 master l0
ip link set up r1l0

ip tuntap add mode tap user $U dev r1l1
ip link set r1l1 master l1
ip link set up r1l1
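The four tap stanzas above differ only in the tap name and the bridge, so they could be generated by a loop. A small sketch follows; `tap_cmds` and `all_tap_cmds` are hypothetical helper names, and the functions only print the commands so they can be reviewed before being run as root.

```sh
#!/bin/sh
# Sketch: print (not run) the commands that create one tap device and
# attach it to a bridge. tap_cmds is a hypothetical helper name.
tap_cmds()
{
    tap="$1"; bridge="$2"; user="$3"
    echo "ip tuntap add mode tap user $user dev $tap"
    echo "ip link set $tap master $bridge"
    echo "ip link set up $tap"
}

# all_tap_cmds USER: emit the commands for the four tap:bridge pairs
# used in the setup above.
all_tap_cmds()
{
    for pair in h1l0:l0 h2l1:l1 r1l0:l0 r1l1:l1; do
        tap_cmds "${pair%%:*}" "${pair##*:}" "$1"
    done
}
```

For example, `all_tap_cmds jbohac | sh -ex` on the KVM host would run the same twelve commands as listed above.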

VMs are started with:

qemu-kvm -nic tap,ifname=h1l0,script=no,downscript=no,mac=02:00:00:00:01:00,model=rtl8139 -m 1024 -nographic h1.img
qemu-kvm -nic tap,ifname=h2l1,script=no,downscript=no,mac=02:00:00:00:02:00,model=rtl8139 -m 1024 -nographic h2.img
qemu-kvm \
        -nic tap,ifname=r1l0,script=no,downscript=no,mac=02:00:00:00:03:00,model=rtl8139 \
        -nic tap,ifname=r1l1,script=no,downscript=no,mac=02:00:00:00:03:01,model=rtl8139 \
        -m 1024 -nographic r1.img

Configuration on router (r1):

ip -6 addr add 2001:db8:ffff:0::f/64 dev eth0
ip -6 addr add 2001:db8:ffff:1::f/64 dev eth1
sysctl net.ipv6.conf.all.forwarding=1

Configuration on h1:

ip -6 addr add 2001:db8:ffff:0::1/64 dev eth0

ip x p f
ip x s f
ip -6 r f c
KEY=0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
ip xfrm state add src 2001:db8:ffff:1::1/64 dst 2001:db8:ffff:0::1/64 proto esp spi 2 reqid 2 mode tunnel auth sha256 $KEY enc aes $KEY
ip xfrm state add src 2001:db8:ffff:0::1/64 dst 2001:db8:ffff:1::1/64 proto esp spi 1 reqid 1 mode tunnel auth sha256 $KEY enc aes $KEY
ip xfrm policy add src 2001:db8:ffff:0::/64 dst 2001:db8:ffff:1::/64 dir out tmpl src 2001:db8:ffff:0::1/64 dst 2001:db8:ffff:1::1/64 proto esp reqid 1 mode tunnel
ip xfrm policy add dst 2001:db8:ffff:0::/64 src 2001:db8:ffff:1::/64 dir in tmpl src 2001:db8:ffff:1::1/64 dst 2001:db8:ffff:0::1/64 proto esp reqid 2 mode tunnel

Configuration on h2:

ip -6 addr add 2001:db8:ffff:1::1/64 dev eth0

ip x s f
ip x p f
KEY=0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
ip xfrm state add src 2001:db8:ffff:1::1/64 dst 2001:db8:ffff:0::1/64 proto esp spi 2 reqid 2 mode tunnel auth sha256 $KEY enc aes $KEY
ip xfrm state add src 2001:db8:ffff:0::1/64 dst 2001:db8:ffff:1::1/64 proto esp spi 1 reqid 1 mode tunnel auth sha256 $KEY enc aes $KEY
ip xfrm policy add src 2001:db8:ffff:0::/64 dst 2001:db8:ffff:1::/64 dir in tmpl src 2001:db8:ffff:0::1/64 dst 2001:db8:ffff:1::1/64 proto esp reqid 1 mode tunnel
ip xfrm policy add dst 2001:db8:ffff:0::/64 src 2001:db8:ffff:1::/64 dir out tmpl src 2001:db8:ffff:1::1/64 dst 2001:db8:ffff:0::1/64 proto esp reqid 2 mode tunnel

Verify the setup: on h1: ping6 -s 1300 2001:db8:ffff:1::1

You should see replies and can observe ESP-encrypted packets with tcpdump on r1.

Reproduce the bug: on r1: ip l s mtu 1300 dev eth1

ping from h1 will now stop working; you will see "ping: sendmsg: Invalid argument" instead of replies. From now on not even small packets pass from h1 to h2: ping6 -s 1 2001:db8:ffff:1::1 fails as well. The problem is fixed by flushing the routing cache on h1: ip -6 route flush cache.
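The reproduction steps could be turned into a pass/fail check on h1 roughly as follows. This is only a sketch: `probe` and `check_small_after_big` are hypothetical helpers, the destination defaults to the h2 address used above, and the ping binary is taken from `PING` so it can be swapped out.

```sh
#!/bin/sh
# Sketch for h1; helper names are hypothetical.
# PING and DST can be overridden; defaults follow the commands above.
: "${PING:=ping6}"
: "${DST:=2001:db8:ffff:1::1}"

# probe SIZE: send three echo requests with SIZE bytes of payload.
probe()
{
    $PING -c 3 -s "$1" "$DST" >/dev/null 2>&1
}

# Run after the MTU on r1 has been lowered.
check_small_after_big()
{
    # The large probe is expected to provoke the Packet Too Big from
    # r1; its own result is ignored.
    probe 1300 || true
    if probe 1; then
        echo "PASS: small packets still reach $DST"
    else
        echo "FAIL: small packets are dropped, PMTU handling looks broken"
        return 1
    fi
}
```

On an affected kernel, calling `check_small_after_big` after `ip l s mtu 1300 dev eth1` on r1 would be expected to report FAIL until the routing cache on h1 is flushed.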

coolgw commented 11 months ago
                                          Control Link
            +-------------------------------------------------------------------------+
            |                                                                         |
    +-------+-------+                                                         +-------+-------+
    |               +--Leth0(xx::0::1/64)-- Test Link 0 --Reth0(xx::0::2/64)--+               |
    |  Local Host   +                            :                            +  Remote Host  |
    |               +--Leth1(xx::1::1/64)-- Test Link 1 --Reth1(xx::1::2/64)--+               |
    +---------------+                                                         +---------------+

I am trying to use the SSH setup in LTP as shown above. I would like to set up Remote Host as the router, Local Host Leth0 as Host1 and Leth1 as Host2. TP1 (traffic path 1) is Leth0->Reth0->Reth1->Leth1 (ping xx::1::1 from Local Host).

But from the Local Host's point of view, if you ping xx::1::1, the traffic takes TP2 (traffic path 2), directly from Leth0 to Leth1.

How can I force TP1 instead of TP2 in this LTP setup?

Correct me if I misunderstood anything; any comments are welcome :).

coolgw commented 11 months ago
                                          Control Link
        +-------------------------------------------------------------------------+
        |                                                                         |
+-------+-------+                                                         +-------+-------+
|               +--Leth0(xx::0::1/64)-- Test Link 0 --Reth0(xx::0::2/64)--+               |
|  Local Host   +                                                         +  Remote Host  |
|               +                                                         +               |
+---------------+                                                         +---------------+

+------------------Remote Host---------------------------+                                       
|                        +-----  NS LTP------+           |                                       
|  Veth1(xx::1::1/64)--  + Veth2(xx::1::2/64)+           |                                       
|                        +-------------------+           |                                                 
+--------------------------------------------------------+                                                 

I found another solution: combine the SSH scenario with a network namespace in the LTP setup. Create the network namespace LTP within Remote Host, with Veth2 inside NS LTP. Veth1 (in the global namespace) is connected to Veth2 (in the LTP namespace). The traffic path is Leth0->Reth0->Veth1->Veth2. BTW: Remote Host needs IP forwarding enabled.

pevik commented 11 months ago

In the original talk Jiri Bohac recommended not using network namespaces. In this case it might be easier to start something from scratch (not using tst_net.sh). Maybe use the C API and take inspiration from lib/tst_net*.c?

coolgw commented 11 months ago

Network Design

                                          Control Link
        +-----------------------------------------------------------------------------------+
        |                                                                                   |
+-------+-------+                                                                   +-------+-------+
|               +--Leth0(fd00:2:1:1::2/64)-- Test Link 0 --Reth0(fd00:2:1:1::1/64)--+               |
|  Local Host   +                                                                   +  Remote Host  |
|               +                                                                   +               |
+---------------+                                                                   +---------------+

+------------------Remote Host----------------------------------------------------------+                                       
|                                    +----------------  NS LTP--------------+           |                                       
|  ltp_ns_veth2(fd00:1:1:1::2/64)--  + ltp_ns_veth1(fd00:1:1:1::1/64)       +           |                                       
|                                    +--------------------------------------+           |                                                 
+---------------------------------------------------------------------------------------+                                                 

Combine the SSH scenario with a network namespace in the LTP setup. Create the network namespace ltp_ns on Remote Host, with ltp_ns_veth1 inside ltp_ns. ltp_ns_veth2 (in the global namespace) is connected to ltp_ns_veth1 (in the ltp_ns namespace). The ping traffic path is ltp_ns_veth1->ltp_ns_veth2->Reth0->Leth0.
Change the MTU of Reth0 to check whether PTB handling works correctly.

NOTE: Remote Host needs IP forwarding enabled.

Example for IPsec configuration:

Remote Host:192.168.10.253

sysctl -w net.ipv6.conf.all.forwarding=1

ip -6 addr add fd00:1:1:1::2/64 dev ltp_ns_veth2
ip -6 addr add fd00:2:1:1::1/64 dev eth0 # set up the Reth0 IP address; replace the name to match your setup

ip netns exec ltp_ns ip -6 addr add fd00:1:1:1::1/64 dev ltp_ns_veth1
ip netns exec ltp_ns ip -6 route add default via fd00:1:1:1::2
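The addressing commands for Remote Host assume that the ltp_ns namespace and the veth pair already exist. A sketch of how they could be created beforehand is given here, with interface names taken from the diagram; `netns_setup_cmds` is a hypothetical helper that only prints the commands.

```sh
#!/bin/sh
# Sketch: print (not run) commands creating the namespace and the veth
# pair. Interface names follow the diagram; netns_setup_cmds is a
# hypothetical helper name.
netns_setup_cmds()
{
    ns="${1:-ltp_ns}"
    echo "ip netns add $ns"
    # ltp_ns_veth2 stays in the global namespace, ltp_ns_veth1 moves
    # into the new namespace.
    echo "ip link add ltp_ns_veth2 type veth peer name ltp_ns_veth1"
    echo "ip link set ltp_ns_veth1 netns $ns"
    echo "ip link set ltp_ns_veth2 up"
    echo "ip netns exec $ns ip link set ltp_ns_veth1 up"
    echo "ip netns exec $ns ip link set lo up"
}
```

Running `netns_setup_cmds | sh -ex` as root on Remote Host before assigning the addresses should give the layout shown in the diagram.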

Local Host: 192.168.10.254

ip -6 addr add fd00:2:1:1::2/64 dev eth1 # set up Leth0; replace the name to match your setup
ip -6 route add fd00:1:1:1::/64 via fd00:2:1:1::1 # this route is required, otherwise the default route with a link-local (fe80::xx) next hop is matched

Run the following script on Remote Host (it sets up the IPsec tunnel):

./ipsec.sh fd00:1:1:1::1 fd00:2:1:1::2 fd00:1:1:1::/64 fd00:1:1:1::1 fd00:2:1:1::/64 fd00:2:1:1::2
#!/bin/sh
# manual-ipsec.sh

# Check parameters
if [ -z "$6" ]; then
    echo "usage: $0 <local_ip> <remote_ip> <new_local_net> <new_local_ip> <new_remote_net> <new_remote_ip>"
    echo "creates an ipsec tunnel between two machines"
    exit 1
fi

SRC="$1"
DST="$2"
LOCAL="$3"
LOCAL_IP="$4"
REMOTE="$5"
REMOTE_IP="$6"

# Generate reqid/SPI and AEAD key (20 bytes: 128-bit AES key + 32-bit salt)
ID=0x$(dd if=/dev/urandom count=4 bs=1 2> /dev/null | xxd -p -c 8)
KEY=0x$(dd if=/dev/urandom count=20 bs=1 2> /dev/null | xxd -p -c 40)

ip netns exec ltp_ns  ip xfrm state flush
ip netns exec ltp_ns  ip xfrm policy flush
ip netns exec ltp_ns  ip xfrm state add src $SRC dst $DST proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
ip netns exec ltp_ns  ip xfrm state add src $DST dst $SRC proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
ip netns exec ltp_ns  ip xfrm policy add src $LOCAL dst $REMOTE dir out tmpl src $SRC dst $DST proto esp reqid $ID mode tunnel
ip netns exec ltp_ns  ip xfrm policy add src $REMOTE dst $LOCAL dir in tmpl src $DST dst $SRC proto esp reqid $ID mode tunnel
ip netns exec ltp_ns  ip xfrm policy add src $REMOTE dst $LOCAL dir fwd tmpl src $DST dst $SRC proto esp reqid $ID mode tunnel

# Login to the peer machine and execute the relevant commands
ssh root@192.168.10.254 /bin/bash << EOF
     ip xfrm state flush &&  ip xfrm policy flush
     ip xfrm state add src $SRC dst $DST proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
     ip xfrm state add src $DST dst $SRC proto esp spi $ID reqid $ID mode tunnel aead 'rfc4106(gcm(aes))' $KEY 128
     ip xfrm policy add src $REMOTE dst $LOCAL dir out tmpl src $DST dst $SRC proto esp reqid $ID mode tunnel
     ip xfrm policy add src $LOCAL dst $REMOTE dir in tmpl src $SRC dst $DST proto esp reqid $ID mode tunnel
EOF
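
The script relies on xxd to produce the random reqid and key, and xxd is not installed on every system. A possible alternative using only od and tr is sketched here; `rand_hex` is a hypothetical helper name.

```sh
#!/bin/sh
# rand_hex N: print N random bytes from /dev/urandom as 2*N lowercase
# hex digits. Hypothetical helper for systems without xxd.
rand_hex()
{
    od -An -tx1 -N "$1" /dev/urandom | tr -d ' \n'
}
```

With this helper the two assignments in the script could be written as `ID=0x$(rand_hex 4)` and `KEY=0x$(rand_hex 20)`.
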
coolgw commented 11 months ago

In the original talk Jiri Bohac recommended not using network namespaces. In this case it might be easier to start something from scratch (not using tst_net.sh). Maybe use the C API and take inspiration from lib/tst_net*.c?

Thanks for the info. I will check lib/tst_net*.c and try to find a solution that does not use any network namespaces.

With the latest detailed design (see my latest comment), traffic goes from the namespace host out through the real interface Reth0, and the PTB is generated on Reth0 (output). So my understanding (maybe wrong) is that it already behaves like the real scenario, since the namespace only acts as a terminal host.

coolgw commented 11 months ago

@pevik I have gone through the tst_net*.c files and made the following short summary. I guess you mean that we may need to create some new C API to support the 3-host scenario, correct?
lib/tst_net_iface_prefix.c: uses an AF_NETLINK socket to get the interface
lib/tst_net_ip_prefix.c, lib/tst_net_vars.c: set IP-related environment variables
lib/tst_net.c: utility functions (getting info from a socket fd etc.)

coolgw commented 11 months ago

@pevik @metan-ucw If we cannot use the namespace solution, then I think we have to extend the current two-machine SSH network implementation (testcases/network/stress/README) to the following topology in LTP. Could you help review this proposal?


                                       Control Link
        +-------------------------------------+------------------------------------+
        |                                     |                                    | 
+-------+-------+                     +-------+-------+                    +-------+-------+
+               +---- Test2 Link 0 ---|               +---- Test Link 0 ---+               |
+ Remote Host2  +         :           |  Local Host   +          :         +  Remote Host  |
+               +---- Test2 Link N ---|               +---- Test Link n ---+               |
+-------+-------+                     +---------------+                    +---------------+

Existing environment variables: RHOST, LHOST_HWADDRS, RHOST_HWADDRS

New environment variables to support Remote Host2:
RHOST2 // IP address of Remote Host2
LHOST2_HWADDRS // HW address list of Test2 Link on the Local Host side
RHOST2_HWADDRS // HW address list of Test2 Link on the Remote Host2 side
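
A test using the extended topology would probably want to verify up front that all six variables are set. A minimal sketch follows; `check_three_host_env` is a hypothetical helper, and RHOST2, LHOST2_HWADDRS and RHOST2_HWADDRS are only the names proposed in this comment, not existing LTP variables.

```sh
#!/bin/sh
# Sketch: RHOST2, LHOST2_HWADDRS and RHOST2_HWADDRS are the names
# proposed above; check_three_host_env is a hypothetical helper.
check_three_host_env()
{
    missing=""
    for var in RHOST LHOST_HWADDRS RHOST_HWADDRS \
               RHOST2 LHOST2_HWADDRS RHOST2_HWADDRS; do
        # Indirect lookup of the variable named in $var.
        eval "val=\${$var:-}"
        [ -n "$val" ] || missing="$missing $var"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```
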

coolgw commented 10 months ago

First test kicked off in the openQA environment: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17661

coolgw commented 1 month ago

Test case merged and running in the openQA environment (the openQA test case will cover this feature instead of an LTP test case).