Thanks for reporting this. I am looking into it.
I managed to reproduce it. The setup seems similar to a regular mesh (in terms of iptables), though I only see ISTIO_OUTPUT, so I'm not sure it's really an easy port. mirrord doesn't recognize it in its mesh detection (there's no sidecar, the pod is only annotated) - maybe we can detect it from the iptables. Can we trust the annotation as a way to know?
ambient.istio.io/redirection: enabled
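For reference, a quick way to check whether a given pod carries that annotation (a sketch; the pod name is just an example from this thread):
# prints "enabled" if the pod is enrolled in ambient redirection
kubectl get pod productpage-v1-7d9fb6b899-cjgd2 -o jsonpath="{.metadata.annotations['ambient\.istio\.io/redirection']}"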
tbh I am not sure we can fix it as fast as we usually do (very busy period), so it'd be good to see if other users need it, which would help us prioritize it.
Hey, @DmitryDodzin from our team tried to reproduce it but couldn't. I tried to re-reproduce it as well, but it seems like I'm running into other issues (just setting up a sample). Do you mind re-testing and seeing if it was somehow fixed since you last tried?
Thanks!
No problem, I will try to repro today.
I am having some issues with istio ambient 1.22.1 due to a partial implementation of the PROXY protocol (https://github.com/istio/ztunnel/pull/850) - I created https://github.com/istio/ztunnel/issues/1124 and will see if I can repro on an earlier version.
OK, I verified that the issue is still present in 1.21.0.
I followed the guide in https://istio.io/latest/docs/ambient/getting-started/ to get set up.
Afterwards, I used mirrord to steal traffic from productpage-v1. When I connect to the HTTPRoute, I expect this traffic to be stolen and routed to my mirrord process. Instead, it appears the traffic is not stolen, and I can see the request arrive at productpage-v1.
Not sure if this is helpful or not since I am out of my depth, but FWIW in ambient mode I can see mirrord obtaining a ztunnel connection in the logs:
2024-06-07T19:25:52.095358Z INFO xds{id=2}: ztunnel::xds::client: received response type_url="type.googleapis.com/istio.workload.Address" size=1 removes=0
2024-06-07T19:26:04.505215Z INFO ztunnel::inpod::statemanager: pod WorkloadUid("fe8e746c-7ca9-402e-9f61-7a97b4f4ba2d") received netns, starting proxy
2024-06-07T19:26:04.515067Z INFO ztunnel::proxy::inbound: listener established address=:15008 component="inbound" transparent=true
2024-06-07T19:26:04.525406Z INFO ztunnel::proxy::inbound_passthrough: listener established address=:15006 component="inbound plaintext" transparent=true
2024-06-07T19:26:04.535869Z INFO ztunnel::proxy::outbound: listener established address=:15001 component="outbound" transparent=true
2024-06-07T19:26:04.546294Z INFO ztunnel::proxy::socks5: listener established address=127.0.0.1:15080 component="socks5"
2024-06-07T19:26:05.134103Z INFO xds{id=2}: ztunnel::xds::client: received response type_url="type.googleapis.com/istio.workload.Address" size=1 removes=0
Thanks! Can you share more information about the cluster? Is it GKE/EKS/AKS/local? If so, what version/flavor?
I think I managed to reproduce it. We'll take it internally. Thank you. @DmitryDodzin I created a machine, installed kind, followed the guide, and then ran
mirrord exec -f mirrord.json --steal -t deployment/productpage-v1 -- python3 -m http.server 9080
and requests weren't stolen. I'll give you the machine details.
A progress update - we invested about a week investigating it with no progress, and decided to put it in the backlog. I will look into it again now, but please let us know if it hits you, so we know whether we need to prioritize it further.
Appreciate the update @aviramha - we've run into some other unrelated issues with Istio Ambient so we've reverted to sidecar deployment for now and mirrord is working great again :)
There's a decision we'd like to change upstream in istio to make it easier for us - https://github.com/istio/istio/issues/52309. Would appreciate your support @danielloader @feynmanliang :)
Some notes - if I change istio to use -j RETURN, we see a SYN going to our redirected port, but it doesn't get to the agent.
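(To make that concrete, this is roughly the change I mean - a sketch assuming the ISTIO_OUTPUT nat chain from the dump below, where the 4th rule is the REDIRECT to 15001; the rule index is my assumption:)
# swap istio's final outbound REDIRECT for a plain RETURN (rule number assumed)
iptables -t nat -R ISTIO_OUTPUT 4 ! -d 127.0.0.1/32 -p tcp -m mark ! --mark 0x539/0xfff -j RETURN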
Investigating further, I tried the following iptables setup:
# Generated by iptables-save v1.8.9 on Wed Jul 24 17:14:53 2024
*raw
:PREROUTING ACCEPT [1833:10910497]
:OUTPUT ACCEPT [1794:5507218]
COMMIT
# Completed on Wed Jul 24 17:14:53 2024
# Generated by iptables-save v1.8.9 on Wed Jul 24 17:14:53 2024
*filter
:INPUT ACCEPT [1854:10913076]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1812:5511092]
:MIRRORD_INPUT_RkDdj - [0:0]
-A INPUT -j MIRRORD_INPUT_RkDdj
-A MIRRORD_INPUT_RkDdj -p tcp -m connmark --mark 0x1 -j REJECT --reject-with tcp-reset
-A MIRRORD_INPUT_RkDdj -j RETURN
COMMIT
# Completed on Wed Jul 24 17:14:53 2024
# Generated by iptables-save v1.8.9 on Wed Jul 24 17:14:53 2024
*nat
:PREROUTING ACCEPT [1:60]
:INPUT ACCEPT [1:60]
:OUTPUT ACCEPT [34:2808]
:POSTROUTING ACCEPT [40:3168]
:ISTIO_OUTPUT - [0:0]
:MIRRORD_INPUT_RyGkm - [0:0]
:MIRRORD_OUTPUT_cKLus - [0:0]
-A PREROUTING -p tcp -m tcp --sport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A PREROUTING -p tcp -m tcp --dport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A PREROUTING -p tcp -m tcp --sport 37673 -j ACCEPT
-A PREROUTING -p tcp -m tcp --dport 37673 -j ACCEPT
-A PREROUTING -j MIRRORD_INPUT_RyGkm
-A PREROUTING -p tcp -m tcp --dport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A OUTPUT -p tcp -m tcp --sport 37673 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 37673 -j ACCEPT
-A OUTPUT -j ISTIO_OUTPUT
-A OUTPUT -j MIRRORD_OUTPUT_cKLus
-A ISTIO_OUTPUT -d 169.254.7.127/32 -p tcp -m tcp -j RETURN
-A ISTIO_OUTPUT -p tcp -m mark --mark 0x111/0xfff -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -p tcp -m mark ! --mark 0x539/0xfff -j REDIRECT --to-ports 15001
-A MIRRORD_INPUT_RyGkm -p tcp -m tcp --dport 9080 -j REDIRECT --to-ports 37673
-A MIRRORD_INPUT_RyGkm -j RETURN
-A MIRRORD_OUTPUT_cKLus ! -s 10.244.2.9/32 -p tcp -m owner --gid-owner 34649 -j RETURN
-A MIRRORD_OUTPUT_cKLus -o lo -p tcp -m tcp --dport 9080 -j REDIRECT --to-ports 37673
-A MIRRORD_OUTPUT_cKLus -j RETURN
COMMIT
# Completed on Wed Jul 24 17:14:53 2024
# Generated by iptables-save v1.8.9 on Wed Jul 24 17:14:53 2024
*mangle
:PREROUTING ACCEPT [1829:10910238]
:INPUT ACCEPT [1854:10913076]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1812:5511092]
:POSTROUTING ACCEPT [1812:5511092]
:ISTIO_OUTPUT - [0:0]
:ISTIO_PRERT - [0:0]
-A PREROUTING -p tcp -m tcp --sport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A PREROUTING -p tcp -m tcp --dport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A PREROUTING -p tcp -m tcp --sport 37673 -j ACCEPT
-A PREROUTING -p tcp -m tcp --dport 37673 -j ACCEPT
-A PREROUTING -j ISTIO_PRERT
-A OUTPUT -p tcp -m tcp --sport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A OUTPUT -p tcp -m tcp --dport 37673 -j CONNMARK --set-xmark 0x0/0xffffffff
-A OUTPUT -p tcp -m tcp --sport 37673 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 37673 -j ACCEPT
-A OUTPUT -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -m connmark --mark 0x111/0xfff -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A ISTIO_PRERT -m mark --mark 0x539/0xfff -j CONNMARK --set-xmark 0x111/0xfff
-A ISTIO_PRERT -s 169.254.7.127/32 -p tcp -m tcp -j RETURN
-A ISTIO_PRERT ! -d 127.0.0.1/32 -i lo -p tcp -j RETURN
-A ISTIO_PRERT -p tcp -m tcp --dport 15008 -m mark ! --mark 0x539/0xfff -j TPROXY --on-port 15008 --on-ip 0.0.0.0 --tproxy-mark 0x111/0xfff
-A ISTIO_PRERT -p tcp -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN
-A ISTIO_PRERT ! -d 127.0.0.1/32 -p tcp -m mark ! --mark 0x539/0xfff -j TPROXY --on-port 15006 --on-ip 0.0.0.0 --tproxy-mark 0x111/0xfff
COMMIT
# Completed on Wed Jul 24 17:14:53 2024
The attempt here is to make the redirect port bypass everything; alas, I'm still seeing weird behavior.
Using conntrack -E while sending a request:
root@productpage-v1-7d9fb6b899-cjgd2:/# conntrack -E
[NEW] tcp 6 120 SYN_SENT src=10.244.1.7 dst=10.244.2.9 sport=37177 dport=15008 [UNREPLIED] src=10.244.2.9 dst=10.244.1.7 sport=15008 dport=37177
[UPDATE] tcp 6 60 SYN_RECV src=10.244.1.7 dst=10.244.2.9 sport=37177 dport=15008 src=10.244.2.9 dst=10.244.1.7 sport=15008 dport=37177
[UPDATE] tcp 6 432000 ESTABLISHED src=10.244.1.7 dst=10.244.2.9 sport=37177 dport=15008 src=10.244.2.9 dst=10.244.1.7 sport=15008 dport=37177 [ASSURED]
[NEW] tcp 6 120 SYN_SENT src=10.244.1.7 dst=10.244.2.9 sport=52185 dport=9080 [UNREPLIED] src=127.0.0.1 dst=10.244.1.7 sport=37673 dport=52185
tcpdump -nvvei lo tcp
17:27:43.723014 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 8913, offset 0, flags [DF], proto TCP (6), length 60)
10.244.1.7.51945 > 127.0.0.1.37673: Flags [S], cksum 0x8b2a (incorrect -> 0xaed5), seq 2205104564, win 65495, options [mss 65495,sackOK,TS val 1288154164 ecr 0,nop,wscale 7], length 0
17:27:44.733180 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 8914, offset 0, flags [DF], proto TCP (6), length 60)
10.244.1.7.51945 > 127.0.0.1.37673: Flags [S], cksum 0x8b2a (incorrect -> 0xaae2), seq 2205104564, win 65495, options [mss 65495,sackOK,TS val 1288155175 ecr 0,nop,wscale 7], length 0
17:27:46.749438 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 8915, offset 0, flags [DF], proto TCP (6), length 60)
10.244.1.7.51945 > 127.0.0.1.37673: Flags [S], cksum 0x8b2a (incorrect -> 0xa302), seq 2205104564, win 65495, options [mss 65495,sackOK,TS val 1288157191 ecr 0,nop,wscale 7], length 0
17:27:50.845192 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 8916, offset 0, flags [DF], proto TCP (6), length 60)
10.244.1.7.51945 > 127.0.0.1.37673: Flags [S], cksum 0x8b2a (incorrect -> 0x9302), seq 2205104564, win 65495, options [mss 65495,sackOK,TS val 1288161287 ecr 0,nop,wscale 7], length 0
tcpdump when using curl 127.0.0.1:9080 (and it works)
17:55:01.772495 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 58256, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.55540 > 127.0.0.1.37673: Flags [S], cksum 0xfe30 (incorrect -> 0xf505), seq 1027527383, win 65495, options [mss 65495,sackOK,TS val 2377470226 ecr 0,nop,wscale 7], length 0
17:55:01.772514 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.9080 > 127.0.0.1.55540: Flags [S.], cksum 0xfe30 (incorrect -> 0x159e), seq 3993141833, ack 1027527384, win 65483, options [mss 65495,sackOK,TS val 2377470226 ecr 2377470226,nop,wscale 7], length 0
17:55:01.772528 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 58257, offset 0, flags [DF], proto TCP (6), length 52)
127.0.0.1.55540 > 127.0.0.1.37673: Flags [.], cksum 0xfe28 (incorrect -> 0xcca8), seq 1027527384, ack 3993141834, win 512, options [nop,nop,TS val 2377470226 ecr 2377470226], length 0
The response from istio is fairly sane, so I'm not sure what I can add to the issue.
Ultimately the ambient service mesh has to be the outer encapsulation of the rules I guess. Else it would be trivial to leak.
Given that, can you implement the requirement to put it before or after istio in the chain as mentioned?
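(Concretely, I imagine something like reordering the jumps - a sketch reusing the chain names from the dump above:)
# move the mirrord jump ahead of istio's in the nat OUTPUT chain
iptables -t nat -D OUTPUT -j MIRRORD_OUTPUT_cKLus
iptables -t nat -I OUTPUT 1 -j MIRRORD_OUTPUT_cKLus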
Yeah, that's how I debugged so far - still having quirky results.
When I added
iptables -t nat -I POSTROUTING -m tcp -p tcp --dport 37673 -j SNAT --to-source 127.0.0.1
(I tried all the other chains/tables; none had any effect), tcpdump had this result:
18:03:45.483230 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 28095, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.51753 > 127.0.0.1.37673: Flags [S], cksum 0xfe30 (incorrect -> 0x42ce), seq 3168861570, win 65495, options [mss 65495,sackOK,TS val 1290315924 ecr 0,nop,wscale 7], length 0
18:03:46.493265 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 28096, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.51753 > 127.0.0.1.37673: Flags [S], cksum 0xfe30 (incorrect -> 0x3edb), seq 3168861570, win 65495, options [mss 65495,sackOK,TS val 1290316935 ecr 0,nop,wscale 7], length 0
18:03:48.513237 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 28097, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.51753 > 127.0.0.1.37673: Flags [S], cksum 0xfe30 (incorrect -> 0x36f7), seq 3168861570, win 65495, options [mss 65495,sackOK,TS val 1290318955 ecr 0,nop,wscale 7], length 0
18:03:52.765431 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 28098, offset 0, flags [DF], proto TCP (6), length 60)
127.0.0.1.51753 > 127.0.0.1.37673: Flags [S], cksum 0xfe30 (incorrect -> 0x265b), seq 3168861570, win 65495, options [mss 65495,sackOK,TS val 1290323207 ecr 0,nop,wscale 7], length 0
but still, no SYN-ACK...
Leaving this for now.
Thanks for trying, I guess fundamentally we're trying to fight the design of the service mesh - in a pinch I'll just have to disable the ambient mesh for the pod via label if I need to steal but ultimately this situation will likely mean I'm only going to use mirror mode from now on.
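(The opt-out I mean is roughly the following - I believe labeling the pod itself with istio.io/dataplane-mode=none excludes it from ambient; treat the exact label value as my recollection:)
# exclude a single pod from the ambient mesh (pod name is an example)
kubectl label pod productpage-v1-7d9fb6b899-cjgd2 istio.io/dataplane-mode=none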
We aren't giving up, just deferring it for now. It's not a contradiction of the mesh, tbh - it's just some weird Linux magic happening (ambient uses TPROXY, which is probably the least documented Linux feature).
Would this be less terrible with nftables or is iptables not the actual pain here?
Nope.
My current guess is that the kernel has no route back to the IP that sent the SYN to the redirected port; we might need to add a connmark so the ip rule that routes to ztunnel will handle it.
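(An untested sketch of that idea - it assumes ztunnel's in-pod setup routes packets carrying fwmark 0x111/0xfff through a table that delivers them locally, consistent with the marks in the dump above:)
# tag connections arriving at the mirrord redirect port with ztunnel's mark
iptables -t mangle -A PREROUTING -p tcp --dport 37673 -j CONNMARK --set-xmark 0x111/0xfff
# restore the mark onto reply packets so the ztunnel ip rule routes them
iptables -t mangle -A OUTPUT -m connmark --mark 0x111/0xfff -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff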
A diagram to capture the situation as I see it now.
jfyi, the fact that I said we're deferring it doesn't mean I don't think about it - often these kinds of bugs are best handled while doing something else, and then I get a eureka moment ;)
I had a great idea that didn't yield any important info (but did yield some info). I added -j NFLOG rules to the raw table (both PREROUTING and OUTPUT) to see whether the traffic comes back from the agent - and no. I think the kernel doesn't even try to send the SYN-ACK, for some reason :\
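(For reference, the logging rules were roughly the following - the group number matches the counters below, and the captured packets can be watched with tcpdump -i nflog:10 if libpcap was built with nflog support:)
iptables -t raw -A PREROUTING -j NFLOG --nflog-group 10
iptables -t raw -A OUTPUT -j NFLOG --nflog-group 10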
Chain PREROUTING (policy ACCEPT 939 packets, 7327K bytes)
pkts bytes target prot opt in out source destination
939 7327K NFLOG 0 -- * * 0.0.0.0/0 0.0.0.0/0 nflog-group 10

Chain OUTPUT (policy ACCEPT 915 packets, 3693K bytes)
pkts bytes target prot opt in out source destination
915 3693K NFLOG 0 -- * * 0.0.0.0/0 0.0.0.0/0 nflog-group 10
I have reproduced the issue without mirrord, to be able to debug it further. Run this in two different terminals:
kubectl debug -it $POD_NAME --profile=netadmin --image=ghcr.io/metalbear-co/mirrord:3.111.0 -- sh
then
apt install netcat-openbsd curl
In one terminal, start a listener:
nc -k -l -p 10200
and in the other, add the redirect:
iptables -t nat -I OUTPUT -m tcp -p tcp --dport 9080 -j REDIRECT --to-ports 10200
Now
curl 127.0.0.1:9080
can be seen in nc, while
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | grep -o "<title>.*</title>"
is stuck forever.
I found a workaround - https://sysctl-explorer.net/net/ipv4/route_localnet/
echo 1 > /proc/sys/net/ipv4/conf/all/route_localnet
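(If we do go this way, a slightly narrower variant might be scoping the sysctl to the pod's primary interface instead of all; the interface name is an assumption:)
# accept loopback-sourced/destined routes on the pod's interface only (assumes eth0)
sysctl -w net.ipv4.conf.eth0.route_localnet=1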
I am not sure that's the path we want to go down (we'd still need to adapt the iptables rules to work with the new chains, but that should be easy). I'm looking to see if we can do something else instead of needing that change (which requires root).
Okay, I can't find any worthy workaround - let's set a requirement for ambient mesh to use privileged.
I don't think it's unreasonable to expect privileged access in this particular scenario.
Hopefully, by the time ambient is full-blown, some annoyed mirrord user will come and save us with a better answer ;)
I've assigned it to the king and maker of our iptables part, @DmitryDodzin, to ship the final fix.
Mirrord works great with Istio's traditional sidecar deployment mode, and even gives a helpful message about --steal when it detects service meshes. However, I'm not sure if it's a bug or simply not implemented, but mirrord doesn't work with Istio's ambient mode. Is this on the roadmap? The agent pod runs fine, but ATM I need to remove the
istio.io/dataplane-mode: ambient
label from the namespace for mirrord to successfully steal traffic.
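(For reference, removing the label looks roughly like this - the namespace name is a placeholder, and the trailing dash is kubectl's label-removal syntax:)
kubectl label namespace default istio.io/dataplane-mode-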