luainkernel / lunatik

Lunatik is a framework for scripting the Linux kernel with Lua.
GNU General Public License v2.0

SNI filter sni.lua can't filter chrome browser with reassembled TCP segments? #193

Open vincentmli opened 1 week ago

vincentmli commented 1 week ago

When using the Chrome browser, the TLS ClientHello is sent in two TCP segments and reassembled; it appears sni.lua can't extract the SNI from a ClientHello split across reassembled TCP segments?

[Screenshot attached: 2024-09-19, 7:38 AM]

vincentmli commented 1 week ago

The first TCP segment contains the handshake and ClientHello up to the key_share extension, 1263 bytes; the server_name extension is in the second TCP segment. Removing the handshake and ClientHello checks in sni.lua doesn't seem to be enough to get server_name; the offset could still be incorrect.
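
For context, a minimal sketch (not the actual sni.lua logic; the function and constant names are illustrative) of the offsets a ClientHello parser walks before it reaches the extensions. When the record is split across two TCP segments, the walk runs out of data before it reaches server_name, and no fixed offset can fix that:

#include <stddef.h>
#include <stdint.h>

#define EXT_SERVER_NAME 0x0000

static int find_server_name(const uint8_t *p, size_t len)
{
        size_t off = 5 + 4;                        /* record header (5) + handshake header (4) */
        if (off + 2 + 32 > len) return -1;
        off += 2 + 32;                             /* legacy_version + random */
        if (off + 1 > len) return -1;
        off += 1 + p[off];                         /* session_id */
        if (off + 2 > len) return -1;
        off += 2 + ((p[off] << 8) | p[off + 1]);   /* cipher_suites */
        if (off + 1 > len) return -1;
        off += 1 + p[off];                         /* compression_methods */
        if (off + 2 > len) return -1;
        off += 2;                                  /* extensions total length */
        while (off + 4 <= len) {                   /* walk type/length-prefixed extensions */
                uint16_t type = (p[off] << 8) | p[off + 1];
                uint16_t elen = (p[off + 2] << 8) | p[off + 3];
                if (type == EXT_SERVER_NAME)
                        return (int)off;
                off += 4 + elen;
        }
        return -1;   /* runs out of data here when server_name sits in the second segment */
}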

lneto commented 1 week ago

wrapping up our discussion on Matrix.. it might be resolved in kernel 6.9 by this:

https://elixir.bootlin.com/linux/v6.9/source/net/core/dev.c#L4914

see also: https://lore.kernel.org/bpf/cover.1707729884.git.lorenzo@kernel.org/

vincentmli commented 1 week ago

@lneto I tried upstream stable kernel 6.10.11 release, same problem

[root@bpfire-3 ~]# uname -a
Linux bpfire-3.localdomain 6.10.11-ipfire #1 SMP PREEMPT_DYNAMIC Fri Sep 20 15:26:24 GMT 2024 x86_64 Intel(R) Celeron(R) CPU G1820 @ 2.70GHz GenuineIntel GNU/Linux

lneto commented 1 week ago

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

vincentmli commented 1 week ago

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

lneto commented 1 week ago

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

the issue is probably with my understanding of what should actually happen ;-).. I was expecting the following flow..

do_xdp_generic -> netif_receive_generic_xdp -> netif_skb_check_for_xdp -> skb_linearize

however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)

vincentmli commented 1 week ago

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

the issue is probably with my understanding of what should actual happen ;-).. I was expecting the following flow..

do_xdp_generic -> netif_receive_generic_xdp -> netif_skb_check_for_xdp -> skb_linearize

however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)

I am learning and I'd like to understand it better too :). If skb_linearize is triggered as you expected, then when XDP passes the packet to Lua it will be one whole ClientHello packet that Lua can parse correctly, instead of two separate segments as before?

lneto commented 1 week ago

perhaps this is the guy we want to use.. https://ebpf-docs.dylanreimerink.nl/linux/helper-function/bpf_xdp_get_buff_len/

also, page 15 from https://lpc.events/event/11/contributions/939/attachments/771/1551/xdp-multi-buff.pdf

I still need to digest.. but I think it should be the path to follow..
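
For reference, a minimal sketch (hypothetical program name, not part of this repo) of what bpf_xdp_get_buff_len() reports: the total buffer length including any fragments, compared against the linear area visible through ctx->data/ctx->data_end:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int probe_len(struct xdp_md *ctx)
{
        __u64 total = bpf_xdp_get_buff_len(ctx);               /* linear area + fragments */
        __u64 linear = (__u64)(ctx->data_end - ctx->data);     /* linear area only */

        if (total > linear)
                bpf_printk("multi-buf frame: total=%llu linear=%llu", total, linear);
        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";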

lneto commented 1 week ago

I'm not sure why it's not triggering skb_linearize(), I would need to have a better look at this.. thinking also on a simple hack on Lua side to keep context of the parser.. will try it out later..

is there some issue with skb_linearize() not triggering in kernel or in lunatik?

the issue is probably with my understanding of what should actual happen ;-).. I was expecting the following flow.. do_xdp_generic -> netif_receive_generic_xdp -> netif_skb_check_for_xdp -> skb_linearize .. however, I'm not sure why it's not following this.. I need to spend some brain cycles to understand it better =)

I am learning and I like to understand it better too :). if the skb_linearize is triggered as you thought, when XDP pass the packet to Lua, it will be one whole clienthello packet so Lua can parse it correctly? not like two previously individual packet segments ?

precisely.. we have had the same issue with luaxtable.. you might want to take a look..

https://github.com/luainkernel/lunatik/issues/107#issuecomment-2276326453 https://github.com/luainkernel/lunatik/commit/a147522fb664bfbfdc0289c9a12d4e05b6ffae64

lneto commented 1 week ago

perhaps this is the guy we want to use.. https://ebpf-docs.dylanreimerink.nl/linux/helper-function/bpf_xdp_get_buff_len/

also, page 15 from https://lpc.events/event/11/contributions/939/attachments/771/1551/xdp-multi-buff.pdf

I still need to digest.. but I think it should be the path to follow..

these helpers don't seem to be merged in the tree..

UPDATE: sorry, only adjust_data isn't.. we'll keep digging later..

lneto commented 1 week ago

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

lneto commented 1 week ago

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6

can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?
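
A sketch of what that change might look like, written as a standalone program rather than a patch to the repo's https.c (the function name is made up). One detail worth keeping in mind: the data/data_end pointers have to be loaded after the bpf_xdp_adjust_tail() call, because the verifier invalidates packet pointers taken before any helper that can resize the packet:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp")
int adjust_then_read(struct xdp_md *ctx)
{
        /* the suggested call: a delta of 0 leaves the packet length unchanged */
        bpf_xdp_adjust_tail(ctx, 0);

        /* packet pointers are only valid when loaded after the adjust_tail call */
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;

        bpf_printk("linear length after adjust_tail: %ld", (long)(data_end - data));
        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";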

lneto commented 1 week ago

a silly question.. you are using skb mode (-m skb) with xdp-loader, right? e.g.,

sudo xdp-loader load -m skb <ifname> https.o

vincentmli commented 1 week ago

a silly question.. you are using -skb on xdp-loader, right? e.g.,

sudo xdp-loader load -m skb <ifname> https.o

yes

vincentmli commented 1 week ago

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6

can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?

I added bpf_xdp_adjust_tail(ctx, 0); and it's the same problem: curl is blocked, Chrome passes. The selftest seems to check the total length first and then adjust the tail according to that length. What does an offset of 0 mean here? Can we artificially use a random offset that is big enough to hold the Chrome ClientHello, to try?

vincentmli commented 1 week ago

it might be our guy.. https://elixir.bootlin.com/linux/v6.10/source/net/core/filter.c#L4175

see: https://elixir.bootlin.com/linux/v6.10/source/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c#L6 can you try to add bpf_xdp_adjust_tail(ctx, 0); in https.c just before https://github.com/luainkernel/lunatik/blob/master/examples/filter/https.c#L22?

I added bpf_xdp_adjust_tail(ctx, 0);, same problem. curl blocked, chrome passed. the selftest seems to check the total length first, then adjust the tail according to the length, here you suggest offset 0 meaning ? can we artificially add an random offset number that is big enough to hold the chrome clienthello to try?

I also added bpf_xdp_adjust_tail(ctx, 2048); with no effect.

vincentmli commented 1 week ago

I think we could call bpf_xdp_get_buff_len(ctx) and print the length in https.o so we know how much data there is, right? I guess the end goal is to get a data length in https.o that equals the total packet length the Chrome browser sent. If we don't get that large a length, the problem is on the kernel side; if we do get it in https.o, the problem is on the LuaXDP side?

lneto commented 1 week ago

I think we could call bpf_xdp_get_buff_len(ctx) and print out the length in https.o so we know how many data length there is, right? I guess the end goal is we want to get the data length in https.o that is equal to the total packet length chrome browser sent. if we don't get that large data length, problem is in kernel side, if we get large data length in https.o, problem is in LuaXDP side?

yup, I would print it as well..

lneto commented 1 week ago

perhaps we need this flag.. can you try this out? https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#xdp-fragments

lneto commented 1 week ago

can you try s/SEC("xdp")/SEC("xdp.frags")/?

more info here.. https://lore.kernel.org/netdev/cover.1642758637.git.lorenzo@kernel.org/

perhaps we need to use bpf_xdp_load_bytes as well..
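
For illustration, a minimal sketch combining the two ideas above: the xdp.frags section name marks the program as frag-aware, and bpf_xdp_load_bytes() copies bytes out of the frame even when they live in fragments beyond data_end. The offset and copy size are arbitrary placeholders, not something taken from https.c:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int peek_past_linear(struct xdp_md *ctx)
{
        __u8 buf[64];
        __u64 total = bpf_xdp_get_buff_len(ctx);

        /* copy 64 bytes starting at offset 1400, which may sit in a fragment */
        if (total >= 1400 + sizeof(buf) &&
            bpf_xdp_load_bytes(ctx, 1400, buf, sizeof(buf)) == 0)
                bpf_printk("first byte at offset 1400: 0x%x", buf[0]);

        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";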

lneto commented 1 week ago

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Can you check your MTU config as well?

vincentmli commented 1 week ago

I added bpf_printk to print the length and switched to xdp.frags. I can see the printed lengths, but none of them exceeds the 1460-byte segment size; all are small. I need to adjust https.c so I can clearly tell which printed length comes from Chrome, since right now there is other noisy printout that can confuse me.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

extern int bpf_luaxdp_run(char *key, size_t key__sz, struct xdp_md *xdp_ctx, void *arg, size_t arg__sz) __ksym;

static char runtime[] = "examples/filter/sni";

struct bpf_luaxdp_arg {
        __u16 offset;
} __attribute__((packed));

SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
        int data_len = bpf_xdp_get_buff_len(ctx);
        struct bpf_luaxdp_arg arg;
        //bpf_xdp_adjust_tail(ctx, 2048);
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct iphdr *ip = data + sizeof(struct ethhdr);

        if (ip + 1 > (struct iphdr *)data_end)
                goto pass;

        if (ip->protocol != IPPROTO_TCP)
                goto pass;

        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if (tcp + 1 > (struct tcphdr *)data_end)
                goto pass;

        if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
                goto pass;

        bpf_printk(" data len: %d from %pI4\n", data_len, &ip->saddr);
        void *payload = (void *)tcp + (tcp->doff * 4);
        if (payload > data_end)
                goto pass;

        arg.offset = bpf_htons((__u16)(payload - data));

        int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
        return action < 0 ? XDP_PASS : action;
pass:
        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";

vincentmli commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 

02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
    Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
        Product Name: Broadcom NetXtreme Gigabit Ethernet

I can switch to a mini PC that has an Intel I211 Gigabit NIC using the igb driver; is that OK? Since we are using XDP generic (skb) mode, does it matter which NIC?

lneto commented 1 week ago

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Can you check your MTU config as well?

Can you also try to run it on veth? So we can rule out interference from your NIC driver..

vincentmli commented 1 week ago

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2 Can you check your MTU config as well?

Can you also try to run it on veth? So we can discard the interference of your NIC driver..

Since the test involves the Chrome browser, maybe there is some way to use a curl client to simulate Chrome; I imagine veth would be part of some kind of container setup on one machine.

lneto commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 

02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
  Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
      Product Name: Broadcom NetXtreme Gigabit Ethernet

I can switch to a mini PC which has Intel I211 Gigabit and it uses igb driver, is that ok?

Yes, please..

since we are using XDP generic skb mode, does it matter which NIC?

I assumed it doesn't matter as well.. but I'm not sure regarding the multi-buf support

lneto commented 1 week ago

Btw, what's your NIC? see https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2 Can you check your MTU config as well?

Can you also try to run it on veth? So we can discard the interference of your NIC driver..

since the test involves using chrome browser, maybe there is someway to use curl client to simulate chrome browser, I imagine veth would in some kind of container setup in one machine

I think you can just use nc to send the client-hello you captured from chrome..
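
For reference, a rough C equivalent of the nc idea (the file name and destination address are placeholders; the ClientHello bytes would come from the Chrome capture):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        FILE *f = fopen("clienthello.bin", "rb");   /* placeholder: bytes captured from Chrome */
        if (!f) { perror("fopen"); return 1; }

        static unsigned char buf[4096];
        size_t n = fread(buf, 1, sizeof(buf), f);
        fclose(f);

        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(443) };
        inet_pton(AF_INET, "172.16.1.5", &dst.sin_addr);   /* placeholder target behind the XDP filter */

        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) { perror("connect"); return 1; }
        if (write(s, buf, n) != (ssize_t)n)
                perror("write");
        close(s);
        return 0;
}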

vincentmli commented 1 week ago

Tested on the mini PC with the Intel I211 (igb driver), MTU 1500: not working. I am running out of time and have to fix some other BPFire issue :)

vincentmli commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

It might be realistic for me to try virtio https://github.com/torvalds/linux/commit/22174f79a44baf5e46faafff1d7b21363431b93a in my home lab, since I can run BPFire as a KVM virtual machine. I am wondering, though: if the underlying hardware NIC does not support XDP frags, would the virtio driver in the guest still support multi-buffer? It will take me some time to get the lab up and test...

lneto commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

it might be realistic for me to try virtio torvalds/linux@22174f7 in my home lab since I could run BPFire as KVM virtual machine, I am wondering though if the underlying hardware NIC does not support XDP frag, would virtio driver in guest still support multi buffer? it will take me some time to get the lab up and test...

That's a good question.. I'm not sure..

lneto commented 1 week ago

I added bpf_printk to print length, and xdp.frags, I can see the printed length, but none of them exceeded MTU 1460 size, all are small, I need to adjust the https.c so I can clearly tell the printed length is from chrome, now there are some other noisy printout that can confuse me.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

extern int bpf_luaxdp_run(char *key, size_t key__sz, struct xdp_md *xdp_ctx, void *arg, size_t arg__sz) __ksym;

static char runtime[] = "examples/filter/sni";

struct bpf_luaxdp_arg {
        __u16 offset;
} __attribute__((packed));

SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
        int data_len = bpf_xdp_get_buff_len(ctx);
        struct bpf_luaxdp_arg arg;
        //bpf_xdp_adjust_tail(ctx, 2048);
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct iphdr *ip = data + sizeof(struct ethhdr);

        if (ip + 1 > (struct iphdr *)data_end)
                goto pass;

        if (ip->protocol != IPPROTO_TCP)
                goto pass;

        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if (tcp + 1 > (struct tcphdr *)data_end)
                goto pass;

        if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
                goto pass;

        bpf_printk(" data len: %d from %pI4\n", data_len, &ip->saddr);
        void *payload = (void *)tcp + (tcp->doff * 4);
        if (payload > data_end)
                goto pass;

        arg.offset = bpf_htons((__u16)(payload - data));

        int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
        return action < 0 ? XDP_PASS : action;
pass:
        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";

BTW, is bpf_xdp_get_buff_len(ctx) == data_end - data?

vincentmli commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

it might be realistic for me to try virtio torvalds/linux@22174f7 in my home lab since I could run BPFire as KVM virtual machine, I am wondering though if the underlying hardware NIC does not support XDP frag, would virtio driver in guest still support multi buffer? it will take me some time to get the lab up and test...

That's a good question.. I'm not sure..

Now I have BPFire running in KVM and the green0 virtual interface is virtio_net; still no success.

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:62:45:3e brd ff:ff:ff:ff:ff:ff
    prog/xdp id 9 name xdp_dispatcher tag 90f686eb86991928 jited 

[root@bpfire-8 lua]# lspci -vvv | grep -i eth
01:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)
07:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)

[root@bpfire-8 lua]# dmesg | grep green0
[    7.821719] 8021q: adding VLAN 0 to HW filter on device green0
[  276.029437] virtio_net virtio6 green0: XDP request 5 queues but max is 1. XDP_TX and XDP_REDIRECT will operate in a slower locked tx mode.
vincentmli commented 1 week ago

BTW, is bpf_xdp_get_buf_len(ctx) == data_end - data?

There is noisy data-length output and I could not tell which printout refers to Chrome :). Do you have a better idea to condition the bpf_printk so it maybe only prints for Chrome?

lneto commented 1 week ago

BTW, is bpf_xdp_get_buf_len(ctx) == data_end - data?

there is noisy data length print and I could not tell which data length printout refer to chrome :), do you have better idea to condition the bpf_printk to maybe only print out for chrome ?

You can pass them to Lua by adding two new __u16 fields to the bpf_luaxdp_arg struct.. then you could use Lua to add more info.. perhaps other metadata from the TLS handshake.. the cipher, perhaps?
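
A sketch of that suggestion against the struct in https.c (the new field names are made up, and sni.lua would have to be updated to unpack them as well; the bpf_htons calls just mirror how the existing offset field is encoded):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct bpf_luaxdp_arg {
        __u16 offset;       /* existing field: offset of the TCP payload */
        __u16 true_len;     /* new: bpf_xdp_get_buff_len(ctx), truncated to 16 bits */
        __u16 linear_len;   /* new: data_end - data */
} __attribute__((packed));

/* inside filter_https(), before bpf_luaxdp_run():
 *
 *     arg.true_len = bpf_htons((__u16)bpf_xdp_get_buff_len(ctx));
 *     arg.linear_len = bpf_htons((__u16)(data_end - data));
 */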

lneto commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 

02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
  Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
      Product Name: Broadcom NetXtreme Gigabit Ethernet

1500 seems too little, right? TCP length in your example is 1781 bytes, right? Can you try to increase it? let's say..

ip link set dev green0 mtu 3500

vincentmli commented 1 week ago

https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_XDP/#__tabbed_1_2

Mine is broadcom and MTU is set to 1500

3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 54:9f:35:0d:a7:4a brd ff:ff:ff:ff:ff:ff
    prog/xdp id 81 name xdp_dispatcher tag 90f686eb86991928 jited 

02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
    Subsystem: Dell NetXtreme BCM5720 Gigabit Ethernet PCIe
        Product Name: Broadcom NetXtreme Gigabit Ethernet

1500 seems too little, right? TCP length in your example is 1781 bytes, right? Can you try to increase it? let's say..

ip link set dev green0 mtu 3500

yes I already tried larger MTU, no success

vincentmli commented 1 week ago

BTW, is bpf_xdp_get_buf_len(ctx) == data_end - data?

there is noisy data length print and I could not tell which data length printout refer to chrome :), do you have better idea to condition the bpf_printk to maybe only print out for chrome ?

You can pass them to Lua adding two new __u16 fields in the bpf_luaxdp_arg struct.. them you could use Lua to add more info.. perhaps other metadata from the TLS handshake.. cipher perhaps?

I think I need to understand and read more about the xdp frags so I know what I am trying :)

lneto commented 1 week ago

@vincentmli my understanding is that you need

  1. MTU bigger than or equal to the ETH frame that will carry the TCP frags;
  2. load the eBPF program with SEC("xdp.frags");
  3. make your TCP frag request using Chrome;
  4. now, bpf_xdp_get_buff_len() should return the total TCP segment length, which should be greater than data_end - data. This means that part of the frame is linearized and part is not.
  5. then, if I got it correctly, you can call bpf_xdp_adjust_tail(ctx, 0) to "linearize" it.
  6. after that, data_end - data should have the length of the whole TCP segment.

vincentmli commented 1 week ago

@vincentmli my understanding is that you need

  1. MTU bigger or equal than the ETH frame that will carry the TCP frags;
  2. load eBPF program in SEC("XDP.frags");
  3. make your TCP frag request using chrome;
  4. now, bpf_xdp_get_buf_len() should return the total TCP segment length, which should be greater than data_end - data. This means that part of of the frame is linearized, part is not.
  5. then, if I got correctly, you can call bpf_xdp_adjust_tail(ctx, 0) to "linearize" it.
  6. after that, data_end - data should have the length of the whole TCP segment.

Thanks! I will try it. I am following the initial XDP multi-buffer support work and reading all the discussion involved, and haven't finished yet; it takes time but I am getting a better understanding :) https://github.com/vincentmli/BPFire/issues/44

vincentmli commented 1 week ago

I followed your suggestion to set the MTU to 4096 and adjusted the code below to print true_len, old_data_len, and new_data_len, then sent a Chrome request, but they are always the same; still no success. I am expecting true_len > old_data_len, but it never occurs.

MTU

[root@bpfire-5 ~]# ip l list dev green0
5: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 00:0e:c4:cf:4c:8e brd ff:ff:ff:ff:ff:ff
    prog/xdp id 149 name xdp_dispatcher tag 90f686eb86991928 jited 

code

SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
        struct bpf_luaxdp_arg arg;

        void *old_data_end = (void *)(long)ctx->data_end;
        void *old_data = (void *)(long)ctx->data;

        int true_len = bpf_xdp_get_buff_len(ctx);

        int old_data_len = old_data_end - old_data;

        bpf_xdp_adjust_tail(ctx, 0);

        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;

        int new_data_len = data_end - data;

        struct iphdr *ip = data + sizeof(struct ethhdr);

        if (ip + 1 > (struct iphdr *)data_end)
                goto pass;

        if (ip->protocol != IPPROTO_TCP)
                goto pass;

        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if (tcp + 1 > (struct tcphdr *)data_end)
                goto pass;

        if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
                goto pass;

        bpf_printk(" true_len: %d from %pI4\n", true_len, &ip->saddr);
        bpf_printk(" old_data_len: %d from %pI4\n", old_data_len, &ip->saddr);
        bpf_printk(" new_data_len: %d from %pI4\n", new_data_len, &ip->saddr);

        void *payload = (void *)tcp + (tcp->doff * 4);
        if (payload > data_end)
                goto pass;

        arg.offset = bpf_htons((__u16)(payload - data));

        int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
        return action < 0 ? XDP_PASS : action;
pass:
        return XDP_PASS;
}

trace log

          <idle>-0       [001] ..s21  2038.226457: bpf_trace_printk:  true_len: 803 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.226464: bpf_trace_printk:  old_data_len: 803 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.226465: bpf_trace_printk:  new_data_len: 803 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268004: bpf_trace_printk:  true_len: 130 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268012: bpf_trace_printk:  old_data_len: 130 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268013: bpf_trace_printk:  new_data_len: 130 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268509: bpf_trace_printk:  true_len: 158 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268514: bpf_trace_printk:  old_data_len: 158 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268515: bpf_trace_printk:  new_data_len: 158 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268595: bpf_trace_printk:  true_len: 647 from 172.16.1.9
          <idle>-0       [001] .Ns21  2038.268605: bpf_trace_printk:  old_data_len: 647 from 172.16.1.9
          <idle>-0       [001] .Ns21  2038.268605: bpf_trace_printk:  new_data_len: 647 from 172.16.1.9

vincentmli commented 1 week ago

This is a tcpdump capture from the Chrome client, so the ClientHello is already segmented by the client due to its own MTU. Even if BPFire's green0 MTU is 4096, the incoming packets are smaller than the standard 1500 MTU, so generic XDP never actually receives packets larger than 1500 bytes; is that the reason? I tried setting the Chrome client's MTU to 4096 too, but it still sends packets smaller than 1500. The Chrome client is an iMac, by the way.

17:58:15.505326 IP 172.16.1.9.55621 > 104.198.14.52.443: Flags [.], seq 1:1229, ack 1, win 2053, options [nop,nop,TS val 1207024357 ecr 2206901463], length 1228

17:58:15.505327 IP 172.16.1.9.55621 > 104.198.14.52.443: Flags [P.], seq 1229:1934, ack 1, win 2053, options [nop,nop,TS val 1207024357 ecr 2206901463], length 705
vincentmli commented 1 week ago

And even if I could force the Chrome client to send packets larger than a 1500-byte MTU, this would not work in practice; I can't force every user's computer to do that :)

vincentmli commented 1 week ago

I tried a new Linux client with the MTU set to 9000, and used hping3 to send > 1500 bytes

 hping3 -c 1 -d 3000 -p 999 172.16.1.5
HPING 172.16.1.5 (enp2s0f1 172.16.1.5): NO FLAGS are set, 40 headers + 3000 data bytes

--- 172.16.1.5 hping statistic ---
1 packets transmitted, 0 packets received, 100% packet loss

bpfire trace log:

          <idle>-0       [000] ..s21  9697.400542: bpf_trace_printk:  true_len: 3054 from 172.16.1.11
          <idle>-0       [000] ..s21  9697.400551: bpf_trace_printk:  old_data_len: 3054 from 172.16.1.11
          <idle>-0       [000] ..s21  9697.400552: bpf_trace_printk:  new_data_len: 3054 from 172.16.1.11

It looks to me like generic XDP already hands over a linearized skb, since true_len equals old_data_len at 3054 bytes, right? So my understanding is that if the Chrome (iMac) client did not segment the ClientHello, https.o would get a linearized skb with the true ClientHello length and the SNI filter would work, with no need to call bpf_xdp_adjust_tail(ctx, 0);. Just curious, where did you get that calling bpf_xdp_adjust_tail(ctx, 0); with offset 0 will linearize the skb?

lneto commented 6 days ago

I followed your suggestion to set the MTU to 4096, and adjusted the code below to print true_len, old_data_len, new_data_len, send chrome request, but they are always the same, still no success, I am expecting true_len > old_data_len, but it never occurs

MTU

[root@bpfire-5 ~]# ip l list dev green0
5: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4096 xdpgeneric qdisc cake state UP mode DEFAULT group default qlen 1000
    link/ether 00:0e:c4:cf:4c:8e brd ff:ff:ff:ff:ff:ff
    prog/xdp id 149 name xdp_dispatcher tag 90f686eb86991928 jited 

code

SEC("xdp.frags")
int filter_https(struct xdp_md *ctx)
{
        struct bpf_luaxdp_arg arg;

        void *old_data_end = (void *)(long)ctx->data_end;
        void *old_data = (void *)(long)ctx->data;

        int true_len = bpf_xdp_get_buff_len(ctx);

        int old_data_len = old_data_end - old_data;

        bpf_xdp_adjust_tail(ctx, 0);

        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;

        int new_data_len = data_end - data;

        struct iphdr *ip = data + sizeof(struct ethhdr);

        if (ip + 1 > (struct iphdr *)data_end)
                goto pass;

        if (ip->protocol != IPPROTO_TCP)
                goto pass;

        struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
        if (tcp + 1 > (struct tcphdr *)data_end)
                goto pass;

        if (bpf_ntohs(tcp->dest) != 443 || !tcp->psh)
                goto pass;

        bpf_printk(" true_len: %d from %pI4\n", true_len, &ip->saddr);
        bpf_printk(" old_data_len: %d from %pI4\n", old_data_len, &ip->saddr);
        bpf_printk(" new_data_len: %d from %pI4\n", new_data_len, &ip->saddr);

        void *payload = (void *)tcp + (tcp->doff * 4);
        if (payload > data_end)
                goto pass;

        arg.offset = bpf_htons((__u16)(payload - data));

        int action = bpf_luaxdp_run(runtime, sizeof(runtime), ctx, &arg, sizeof(arg));
        return action < 0 ? XDP_PASS : action;
pass:
        return XDP_PASS;
}

trace log

          <idle>-0       [001] ..s21  2038.226457: bpf_trace_printk:  true_len: 803 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.226464: bpf_trace_printk:  old_data_len: 803 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.226465: bpf_trace_printk:  new_data_len: 803 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268004: bpf_trace_printk:  true_len: 130 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268012: bpf_trace_printk:  old_data_len: 130 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268013: bpf_trace_printk:  new_data_len: 130 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268509: bpf_trace_printk:  true_len: 158 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268514: bpf_trace_printk:  old_data_len: 158 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268515: bpf_trace_printk:  new_data_len: 158 from 172.16.1.9
          <idle>-0       [001] ..s21  2038.268595: bpf_trace_printk:  true_len: 647 from 172.16.1.9
          <idle>-0       [001] .Ns21  2038.268605: bpf_trace_printk:  old_data_len: 647 from 172.16.1.9
          <idle>-0       [001] .Ns21  2038.268605: bpf_trace_printk:  new_data_len: 647 from 172.16.1.9

so, none exceeded the "default" MTU size.. my understanding is that those wouldn't be non-linear anyway..

lneto commented 6 days ago

and even if I can force chrome client send large than > mtu 1500 packet, this would not work in practice, can't force each user computer to do that :)

my understanding (from the issue description) was that the Chrome client was already sending TCP frags in a large ETH frame.. if that's not the case, I don't think XDP will be aware of the frags, because it runs before the network stack.. in this case, I see two paths:

  1. handle the frags in our eBPF/Lua program;
  2. let the frags go through the network stack and handle this in luanetfilter (which is already linearizing the skb).

lneto commented 6 days ago

I tried a new Linux client with MTU set to 9000, and use hping3 to send > 1500 bytes

 hping3 -c 1 -d 3000 -p 999 172.16.1.5
HPING 172.16.1.5 (enp2s0f1 172.16.1.5): NO FLAGS are set, 40 headers + 3000 data bytes

--- 172.16.1.5 hping statistic ---
1 packets transmitted, 0 packets received, 100% packet loss

bpfire trace log:

          <idle>-0       [000] ..s21  9697.400542: bpf_trace_printk:  true_len: 3054 from 172.16.1.11
          <idle>-0       [000] ..s21  9697.400551: bpf_trace_printk:  old_data_len: 3054 from 172.16.1.11
          <idle>-0       [000] ..s21  9697.400552: bpf_trace_printk:  new_data_len: 3054 from 172.16.1.11

it looks to me generic xdp driver already handles linearized skb since the true_len equals old_data_len to 3054 bytes, right? so my understanding is that if chrome (iMac) client does not segment the clienthello, https.o should get linearized skb with true length of clienthello and the sni filter should work. no need to call bpf_xdp_adjust_tail(ctx, 0);,

yup, indeed I've the same understanding.. it looks like in the case of generic XDP it's resolved..

just curious, where you get calling bpf_xdp_adjust_tail(ctx, 0); with offset 0 will linearize skb?

I think I got it wrong.. I thought the offset referred to the beginning of the portion of the buffer you wanted to "linearize".. I guess it's actually the end of it, and we should use true_len here to linearize everything.. however, the frags would need to fit in the tail room.. I'm not sure it would be a solution anyway..

the case where I was expecting this to take effect was when true_len is actually greater than old_data_len, but if I understood correctly that never happens with generic XDP, right? at least in your tests..

I will try to run some tests on my end as well.. but it sounds to me that we should fall back to luanetfilter when we don't have TCP frags in XDP.. we would add the network stack overhead, but I'm not sure it would be so different from handling frags in our parser.. it would need to be measured.. if you can run some benchmarks on it, that would be very valuable.. =)
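
For completeness, a literal reading of the adjust_tail idea above, sketched as its own program (the name is made up). The delta grows the frame by the gap between the total and linear lengths; whether that actually makes fragment data visible through data_end, and whether the tail room can hold it, is exactly what this thread leaves unresolved, so treat it as an experiment rather than a fix:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int try_grow_tail(struct xdp_md *ctx)
{
        __u64 true_len = bpf_xdp_get_buff_len(ctx);
        __u64 linear_len = ctx->data_end - ctx->data;

        if (true_len > linear_len) {
                /* positive delta asks the kernel to grow the frame; returns < 0 on failure */
                if (bpf_xdp_adjust_tail(ctx, (int)(true_len - linear_len)) < 0)
                        bpf_printk("adjust_tail by %llu failed", true_len - linear_len);
        }

        return XDP_PASS;
}

char _license[] SEC("license") = "Dual MIT/GPL";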

lneto commented 6 days ago

@vincentmli btw, in the hping case do you see the TCP segments in Wireshark?

vincentmli commented 6 days ago

@vincentmli btw, in the case of hping you see the TCP segments in Wireshark?

I used hping to send only one large TCP payload; it is a single TCP segment in tcpdump, with no TCP segmentation. Wireshark should show the same as tcpdump: one TCP segment.

vincentmli commented 6 days ago

I will try to run some tests on my end as well.. but it sounds to me that we should fallback to luanetfilter in the case we don't have TCP frags in XDP.. we will add the network stack overhead, but I'm not sure it will be so different than handling frags on our parser.. it would need to be measured.. if you can run some benchmark on it, it would be very valuable.. =)

XDP multi-buffer/jumbo support would be unrealistic for a consumer network environment, because consumer computers don't generate jumbo frames; it would be beneficial in an enterprise-grade data center. I think moving SNI parsing to luanetfilter is probably OK for my BPFire home firewall use case, regardless of the performance :)

lneto commented 6 days ago

@vincentmli btw, in the case of hping you see the TCP segments in Wireshark?

I used hping to only send one large tcp payload, it is one single tcp segment in tcpdump, no multiple tcp segmentation, wireshark should be same as tcpdump to show one tcp segment.

I'm wondering if you have actually reached the TCP-frags condition, then..