Open uablrek opened 4 years ago
@nickolaev Is there a way to trigger a retest? The first failure was my fault; the access to the VIP address seemed to hang. When I added more printouts, an unrelated test failed.
I am not aware of any way to trigger tests when you do not have admin rights, except for pushing to the PR. I guess this is due to security limitations.
Ok thanks. Pushing works for me. I will squash all commits when it works anyway.
There is some problem that I don't see when I run this locally. Access through the load-balancer does not work. Access to the application-servers directly (via the bridge-domain) works. The LB seems to be configured correctly, and I am currently troubleshooting access to the application-servers via a GRE tunnel, which is the only difference I can find between direct and LB access to the application-servers.
It seems like traffic over GRE tunnels does not work in the CI environment. I added a test with a GRE tunnel without the load-balancer and it fails;
==== Check GRE access
Cmd [ip tunnel add foo4 mode gre remote 10.60.1.4]
Cmd [ip addr add 10.70.0.5/32 dev foo4]
Cmd [ip link set up dev foo4]
Cmd [ip ro add 10.2.2.3/32 dev foo4]
Cmd [ping -c1 -W1 10.2.2.3]
PING 10.2.2.3 (10.2.2.3) 56(84) bytes of data.
--- 10.2.2.3 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
command terminated with exit code 1
The same test works in my environment.
Any ideas why GRE traffic wouldn't work? Is GRE already used in the CI environment, so that it becomes tunnel-in-tunnel (which normally does not work)?
Can it be that the ICMP echo goes via GRE, i.e. has a GRE header, but the ICMP echo reply (pong) does not? Sometimes the network has stateful firewalls in place that do not permit this.
The tests are running in a kind environment on top of a single-host Docker, meaning that whatever happens is on the same host. I can't tell whether there is a firewall in the CircleCI environment, but I doubt it can influence this test.
Can you please rebase it and see how it goes? We are using VPP v3 now.
I got a new, very weird fault; if the VIP address, e.g. 10.2.2.22, is accessed with TCP from the K8s node where the load-balancer NSE process is running, then the source address gets mangled. When accessed from another machine everything works!
On the K8s node where the load-balancer NSE POD is running;
$ ip ro add 10.2.2.0/24 via 11.0.2.3
$ nc -s 192.168.1.4 10.2.2.22 5001 < /dev/null
Tcpdump from within the load-balancer NSE POD;
$ tcpdump -eni vpp1host
12:40:12.688326 0a:71:16:a0:75:88 > 02:fe:70:2b:c7:88, ethertype IPv4 (0x0800), length 74: 192.168.1.4.41749 > 10.2.2.22.5001: Flags [S], seq 1542741591, win 64240, options [mss 1460,sackOK,TS val 1249805913 ecr 0,nop,wscale 7], length 0
12:40:13.711290 0a:71:16:a0:75:88 > 02:fe:70:2b:c7:88, ethertype IPv4 (0x0800), length 74: 192.168.1.4.41749 > 10.2.2.22.5001: Flags [S], seq 1542741591, win 64240, options [mss 1460,sackOK,TS val 1249806936 ecr 0,nop,wscale 7], length 0
Same stream on the receiving application-server;
$ tcpdump -ni gre0
12:40:12.690678 IP 173.58.1.4.41749 > 10.2.2.22.5001: Flags [S], seq 1542741591, win 64240, options [mss 1460,sackOK,TS val 1249805913 ecr 0,nop,wscale 7], length 0
12:40:13.718245 IP 169.59.1.4.41749 > 10.2.2.22.5001: Flags [S], seq 1542741591, win 64240, options [mss 1460,sackOK,TS val 1249806936 ecr 0,nop,wscale 7], length 0
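The mismatch between the two captures can be checked mechanically. The following is only a minimal sketch: the helper name `source_addr` and the regular expression are illustrative assumptions, and the two sample lines are abbreviated copies of the tcpdump output above.

```python
import re

# Matches "src_ip.src_port > dst_ip.dst_port:" in a tcpdump IPv4 line.
# Hypothetical helper for illustration, not part of the PR.
FLOW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def source_addr(line):
    """Return the IPv4 source address of a tcpdump line, or None."""
    m = FLOW_RE.search(line)
    return m.group(1) if m else None

# Abbreviated lines from the captures above (options stripped).
LB_LINE = ("12:40:12.688326 0a:71:16:a0:75:88 > 02:fe:70:2b:c7:88, "
           "ethertype IPv4 (0x0800), length 74: "
           "192.168.1.4.41749 > 10.2.2.22.5001: Flags [S]")
APP_LINE = ("12:40:12.690678 IP 173.58.1.4.41749 > 10.2.2.22.5001: "
            "Flags [S]")

if __name__ == "__main__":
    sent = source_addr(LB_LINE)
    received = source_addr(APP_LINE)
    print("source seen at the LB:        ", sent)
    print("source seen at the app server:", received)
    print("source mangled:", sent != received)
```

Only the source address is compared here; the port, sequence number and timestamps are identical in both captures, which is what identifies them as the same stream.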
On another machine (external or another K8s node);
$ ip ro add 10.2.2.0/24 via 192.168.1.4
$ nc 10.2.2.22 5001 < /dev/null
application-server-79cf4f5f66-mv9q4
$ nc 10.2.2.22 5001 < /dev/null
application-server-79cf4f5f66-mv9q4
$ nc 10.2.2.22 5001 < /dev/null
application-server-79cf4f5f66-qpc8w
$ nc 10.2.2.22 5001 < /dev/null
application-server-79cf4f5f66-g6l5g
I squashed the commits. And BTW tcpdump must be installed;
apt update; apt install -y tcpdump # On the lb POD
apk add tcpdump # On the application-servers
Tests load-balancing from the NSE-pod.
Fixes #82
The VIP address is accessed 30 times, the lb-data is collected, and the frequency of targets is computed. Example of test printout;
If only one target is found there is no load-balancing and the test fails.
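The pass/fail rule described above can be sketched as follows. This is not the test code from the PR (which is not shown in this thread), only an illustration of the same idea; the function names `target_frequency` and `is_load_balanced` are hypothetical, and the sample replies reuse the pod names from the `nc` output earlier in the conversation.

```python
from collections import Counter

def target_frequency(replies):
    """Count how often each target name appears in the collected replies."""
    return Counter(r.strip() for r in replies if r.strip())

def is_load_balanced(replies):
    """Return True when more than one distinct target answered."""
    return len(target_frequency(replies)) > 1

if __name__ == "__main__":
    # Replies as collected from repeated accesses to the VIP address.
    sample = [
        "application-server-79cf4f5f66-mv9q4",
        "application-server-79cf4f5f66-mv9q4",
        "application-server-79cf4f5f66-qpc8w",
        "application-server-79cf4f5f66-g6l5g",
    ]
    for name, count in sorted(target_frequency(sample).items()):
        print(f"{count:3d} {name}")
    print("load-balanced:", is_load_balanced(sample))
```

The check deliberately asks only for more than one target and says nothing about how even the distribution is, which matches the failure condition stated above.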