Open vincentmli opened 1 week ago
the first TCP segment contains the handshake and clienthello up to the key_share extension, which is 1263 bytes. the server_name extension is in the second TCP segment; removing the handshake and clienthello check in sni.lua seems not enough to get server_name, and the offset could still be incorrect.
wrapping up our discussion on Matrix.. it might be resolved on kernel 6.9 by this:
https://elixir.bootlin.com/linux/v6.9/source/net/core/dev.c#L4914
see also: https://lore.kernel.org/bpf/cover.1707729884.git.lorenzo@kernel.org/
@lneto I tried upstream stable kernel 6.10.11 release, same problem
[root@bpfire-3 ~]# uname -a
Linux bpfire-3.localdomain 6.10.11-ipfire #1 SMP PREEMPT_DYNAMIC Fri Sep 20 15:26:24 GMT 2024 x86_64 Intel(R) Celeron(R) CPU G1820 @ 2.70GHz GenuineIntel GNU/Linux
I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..
is there some issue with skb_linearize() not triggering, in the kernel or in lunatik?
the issue is probably with my understanding of what should actually happen ;-).. I was expecting the following flow..
do_xdp_generic -> netif_receive_generic_xdp -> netif_skb_check_for_xdp -> skb_linearize
however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)
I am learning and I'd like to understand it better too :). if skb_linearize is triggered as you thought, when XDP passes the packet to Lua it will be one whole clienthello packet so Lua can parse it correctly, not two individual packet segments like before?
perhaps this is the guy we want to use.. https://ebpf-docs.dylanreimerink.nl/linux/helper-function/bpf_xdp_get_buff_len/
also, page 15 from https://lpc.events/event/11/contributions/939/attachments/771/1551/xdp-multi-buff.pdf
I still need to digest.. but I think it should be the path to follow..
precisely.. we have had the same issue with luaxtable.. you might want to take a look..
https://github.com/luainkernel/lunatik/issues/107#issuecomment-2276326453 https://github.com/luainkernel/lunatik/commit/a147522fb664bfbfdc0289c9a12d4e05b6ffae64
these helpers don't seem to be merged in the tree..
UPDATE: sorry, only adjust_data isn't.. we'll keep digging later..
it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175
see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6
can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?
a silly question.. you are using -m skb on xdp-loader, right? e.g., sudo xdp-loader load -m skb <ifname> https.o
yes
I added bpf_xdp_adjust_tail(ctx, 0); same problem: curl blocked, chrome passed. the selftest seems to check the total length first, then adjust the tail according to that length; here you suggest offset 0, meaning? can we artificially add a random offset number that is big enough to hold the chrome clienthello to try?
I added bpf_xdp_adjust_tail(ctx, 2048); no effect
I think we could call bpf_xdp_get_buff_len(ctx) and print out the length in https.o so we know how much data there is, right? I guess the end goal is to get a data length in https.o equal to the total packet length the chrome browser sent. if we don't get that large data length, the problem is on the kernel side; if we do get the large data length in https.o, the problem is on the LuaXDP side?
yup, I would print it as well..
perhaps we need this flag.. can you try this out? https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#xdp-fragments
can you try s/SEC("xdp")/SEC("xdp.frags")/ ?
more info here.. https://lore.kernel.org/netdev/cover.1642758637.git.lorenzo@kernel.org/
perhaps we need to use bpf_xdp_load_bytes as well..
Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2
Can you check your MTU config as well?
I added bpf_printk to print the length, and xdp.frags. I can see the printed lengths, but none of them exceeded the MTU 1460 size; all are small. I need to adjust https.c so I can clearly tell which printed length is from chrome; right now there are some other noisy printouts that can confuse me.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
extern int bpf_luaxdp_run(char *key, size_t key__sz, struct xdp_md *xdp_ctx, void *arg, size_t arg__sz) __ksym;
static char runtime[] = "examples/filter/sni";
struct bpf_luaxdp_arg {
__u16 offset;
} __attribute__((packed));
SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
int data_len = bpf_xdp_get_buff_len(ctx);
struct bpf_luaxdp_arg arg;
//bpf_xdp_adjust_tail(ctx, 2048);
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
struct iphdr *ip = data + sizeof(struct ethhdr);
if (ip + 1 > (struct iphdr *)data_end)
goto pass;
if (ip->protocol != IPPROTO_TCP)
goto pass;
struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
if (tcp + 1 > (struct tcphdr *)data_end)
goto pass;
if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
goto pass;
bpf_printk(" data len: %d from %pI4\n", data_len, &ip->saddr);
void *payload = (void *)tcp + (tcp->doff * 4);
if (payload > data_end)
goto pass;
arg.offset = bpf_htons((__u16)(payload - data));
int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
return action < 0 ? XDP_PASS : action;
pass:
return XDP_PASS;
}
char _license[] SEC("license") = "Dual MIT/GPL";
Mine is broadcom and MTU is set to 1500
3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
Product Name: Broadcom NetXtreme Gigabit Ethernet
I can switch to a mini PC which has Intel I211 Gigabit and it uses igb driver, is that ok? since we are using XDP generic skb mode, does it matter which NIC?
Can you also try to run it on veth? So we can discard the interference of your NIC driver..
since the test involves using the chrome browser, maybe there is some way to use a curl client to simulate the chrome browser; I imagine veth would be in some kind of container setup on one machine
I can switch to a mini PC which has Intel I211 Gigabit and it uses igb driver, is that ok?
Yes, please..
since we are using XDP generic skb mode, does it matter which NIC?
I assumed it doesn't as well.. but I'm not sure regarding the multi-buf support
I think you can just use nc to send the client-hello you captured from chrome..
tested on the mini PC with the intel I211 igb driver, mtu 1500, not working. I am running out of time and have to fix some other issues in bpfire :)
it might be more realistic for me to try virtio https://github.com/torvalds/linux/commit/22174f79a44baf5e46faafff1d7b21363431b93a in my home lab since I could run BPFire as a KVM virtual machine. I am wondering though, if the underlying hardware NIC does not support XDP frags, would the virtio driver in the guest still support multi buffer? it will take me some time to get the lab up and test...
That's a good question.. I'm not sure..
BTW, is bpf_xdp_get_buff_len(ctx) == data_end - data?
now I got BPFire in KVM running and the green0 virtual interface is virtio_net, still no success
3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc cake state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:62:45:3e brd ff:ff:ff:ff:ff:ff
prog/xdp id 9 name xdp_dispatcher tag 90f686eb86991928 jited
[root@bpfire-8 lua]# lspci -vvv | grep -i eth
01:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)
07:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)
[root@bpfire-8 lua]# dmesg | grep green0
[ 7.821719] 8021q: adding VLAN 0 to HW filter on device green0
[ 276.029437] virtio_net virtio6 green0: XDP request 5 queues but max is 1. XDP_TX and XDP_REDIRECT will operate in a slower locked tx mode.
there is noisy data length printing and I could not tell which data length printout refers to chrome :), do you have a better idea to condition the bpf_printk to maybe only print for chrome?
You can pass them to Lua by adding two new __u16 fields in the bpf_luaxdp_arg struct.. then you could use Lua to add more info.. perhaps other metadata from the TLS handshake.. cipher perhaps?
1500 seems too little, right? TCP length in your example is 1781 bytes, right? Can you try to increase it? let's say..
ip link set dev green0 mtu 3500
yes I already tried larger MTU, no success
I think I need to understand and read more about the xdp frags so I know what I am trying :)
@vincentmli my understanding is that you need
1) MTU bigger than or equal to the ETH frame that will carry the TCP frags;
2) to load the eBPF program in SEC("xdp.frags");
3) to make your TCP frag request using chrome;
4) now, bpf_xdp_get_buff_len() should return the total TCP segment length, which should be greater than data_end - data. This means that part of the frame is linearized and part is not.
5) then, if I got it correctly, you can call bpf_xdp_adjust_tail(ctx, 0) to "linearize" it.
6) after that, data_end - data should have the length of the whole TCP segment.
thanks! I will try it. I am following the initial XDP multi buffer support and reading all the discussion involved, haven't finished yet; it takes time but I am getting a better understanding :) https://github.com/vincentmli/BPFire/issues/44
I followed your suggestion to set the MTU to 4096, and adjusted the code below to print true_len, old_data_len, and new_data_len, then sent a chrome request, but they are always the same; still no success. I am expecting true_len > old_data_len, but it never occurs.
MTU
[root@bpfire-5 ~]# ip l list dev green0
5: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
link/ether 00:0e:c4:cf:4c:8e brd ff:ff:ff:ff:ff:ff
prog/xdp id 149 name xdp_dispatcher tag 90f686eb86991928 jited
code
18 SEC("xdp.frags")
19 int filter_https(struct xdp_md *ctx)
20 {
21 struct bpf_luaxdp_arg arg;
22
23 void *old_data_end = (void *)(long)ctx->data_end;
24 void *old_data = (void *)(long)ctx->data;
25
26 int true_len = bpf_xdp_get_buff_len(ctx);
27
28 int old_data_len = old_data_end - old_data;
29
30 bpf_xdp_adjust_tail(ctx, 0);
31
32 void *data_end = (void *)(long)ctx->data_end;
33 void *data = (void *)(long)ctx->data;
34
35 int new_data_len = data_end - data;
36
37 struct iphdr *ip = data + sizeof(struct ethhdr);
38
39 if (ip + 1 > (struct iphdr *)data_end)
40 goto pass;
41
42 if (ip->protocol != IPPROTO_TCP)
43 goto pass;
44
45 struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
46 if (tcp + 1 > (struct tcphdr *)data_end)
47 goto pass;
48
49 if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
50 goto pass;
51
52 bpf_printk(" true_len: %d from %pI4\n", true_len, &ip->saddr);
53 bpf_printk(" old_data_len: %d from %pI4\n", old_data_len, &ip->saddr);
54 bpf_printk(" new_data_len: %d from %pI4\n", new_data_len, &ip->saddr);
55
56 void *payload = (void *)tcp + (tcp->doff * 4);
57 if (payload > data_end)
58 goto pass;
59
60 arg.offset = bpf_htons((__u16)(payload - data));
61
62 int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
63 return action < 0 ? XDP_PASS : action;
64 pass:
65 return XDP_PASS;
66 }
67
trace log
<idle>-0 [001] ..s21 2038.226457: bpf_trace_printk: true_len: 803 from 172.16.1.9
<idle>-0 [001] ..s21 2038.226464: bpf_trace_printk: old_data_len: 803 from 172.16.1.9
<idle>-0 [001] ..s21 2038.226465: bpf_trace_printk: new_data_len: 803 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268004: bpf_trace_printk: true_len: 130 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268012: bpf_trace_printk: old_data_len: 130 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268013: bpf_trace_printk: new_data_len: 130 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268509: bpf_trace_printk: true_len: 158 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268514: bpf_trace_printk: old_data_len: 158 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268515: bpf_trace_printk: new_data_len: 158 from 172.16.1.9
<idle>-0 [001] ..s21 2038.268595: bpf_trace_printk: true_len: 647 from 172.16.1.9
<idle>-0 [001] .Ns21 2038.268605: bpf_trace_printk: old_data_len: 647 from 172.16.1.9
<idle>-0 [001] .Ns21 2038.268605: bpf_trace_printk: new_data_len: 647 from 172.16.1.9
this tcpdump capture is from the chrome client, so the clienthello is already segmented by the chrome client due to the client MTU. even if the BPFire green0 MTU is 4096, the incoming packets are smaller than the standard 1500 MTU, so generic xdp actually never receives > 1500-byte packets; is that the reason? I tried to set the chrome client MTU to 4096 too, but it still sends packets < 1500. the chrome client is an iMac, by the way.
17:58:15.505326 IP 172.16.1.9.55621 > 104.198.14.52.443: Flags [.], seq 1:1229, ack 1, win 2053, options [nop,nop,TS val 1207024357 ecr 2206901463], length 1228
17:58:15.505327 IP 172.16.1.9.55621 > 104.198.14.52.443: Flags [P.], seq 1229:1934, ack 1, win 2053, options [nop,nop,TS val 1207024357 ecr 2206901463], length 705
and even if I could force the chrome client to send packets larger than the 1500 MTU, this would not work in practice; I can't force each user's computer to do that :)
I tried a new Linux client with MTU set to 9000, and used hping3 to send > 1500 bytes
hping3 -c 1 -d 3000 -p 999 172.16.1.5
HPING 172.16.1.5 (enp2s0f1 172.16.1.5): NO FLAGS are set, 40 headers + 3000 data bytes
--- 172.16.1.5 hping statistic ---
1 packets transmitted, 0 packets received, 100% packet loss
bpfire trace log:
<idle>-0 [000] ..s21 9697.400542: bpf_trace_printk: true_len: 3054 from 172.16.1.11
<idle>-0 [000] ..s21 9697.400551: bpf_trace_printk: old_data_len: 3054 from 172.16.1.11
<idle>-0 [000] ..s21 9697.400552: bpf_trace_printk: new_data_len: 3054 from 172.16.1.11
it looks to me like the generic xdp driver already handles a linearized skb, since true_len equals old_data_len at 3054 bytes, right? so my understanding is that if the chrome (iMac) client does not segment the clienthello, https.o should get a linearized skb with the true length of the clienthello and the sni filter should work, with no need to call bpf_xdp_adjust_tail(ctx, 0); just curious, where did you get that calling bpf_xdp_adjust_tail(ctx, 0) with offset 0 will linearize the skb?
so, none exceeded the "default" MTU size.. my understanding is that those wouldn't be non-linear anyway..
my understanding (from the issue description) was that chrome client was already sending TCP frags into a large ETH frame.. if it's not the case, I don't think XDP will be aware of the frags because it runs before the network stack.. in this case, I see two paths:
1) handle frags in our eBPF/Lua program; 2) let the frags go through the network stack and handle this in luanetfilter (which is already linearizing the skb).
yup, indeed I've the same understanding.. it looks like in the case of generic XDP it's resolved..
I think I got it wrong.. I thought offset referred to the beginning of the portion of the buffer you wanted to "linearize".. I guess it's actually the end of it and we should use true_len here to linearize everything.. however, the frags would need to fit in the tail room.. I'm not sure it would be a solution anyway..
the case I was expecting this to take effect was when true_len is actually greater than old_data_len, but if I got it correctly that never happens on XDP generic, right? at least in your tests..
I will try to run some tests on my end as well.. but it sounds to me that we should fallback to luanetfilter in the case we don't have TCP frags in XDP.. we will add the network stack overhead, but I'm not sure it will be so different than handling frags on our parser.. it would need to be measured.. if you can run some benchmark on it, it would be very valuable.. =)
@vincentmli btw, in the case of hping you see the TCP segments in Wireshark?
I used hping to send only one large tcp payload; it shows as one single tcp segment in tcpdump, with no segmentation into multiple tcp segments. wireshark should show the same single tcp segment as tcpdump.
XDP multi buffer / jumbo support would be unrealistic for a consumer network environment because consumer computers do not generate jumbo frames; it would be beneficial in an enterprise-grade data center. I think moving SNI parsing to luanetfilter is probably ok for my bpfire home firewall use case regardless of the performance :)
wondering if you have actually reached the condition of TCP frags, then..
when using the chrome browser, the ssl clienthello is sent in two TCP segments and reassembled; it appears sni.lua can't handle the SNI of a TCP-reassembled clienthello?