google / nftables

This repository contains a Go module to interact with Linux nftables (the iptables successor).

Adding rules in code produces different results and logs than the rules I added directly from the command line #247

Open xlango opened 11 months ago

xlango commented 11 months ago

When I used the following commands

nft add table ip filter
nft add chain ip filter output { type filter hook output priority 0\; }
nft add set filter ipSet { type ipv4_addr \; flags interval\;}
nft add element ip filter ipSet {10.34.11.179}
nft add set filter portSet { type inet_service \; flags interval\;}
nft add element ip filter portSet {1234}
nft add rule ip filter output ip daddr @ipSet tcp dport @portSet counter log drop

to add a rule that references sets, it worked correctly. But when I used nftables-main to add a similar rule, it blocked TCP traffic to "127.0.0.1" too. The code is:

func main() {
c, err := nftables.New()
if err != nil {
    return
}

c.FlushRuleset()

filter := c.AddTable(&nftables.Table{
    Family: nftables.TableFamilyIPv4,
    Name:   "filter",
})

input := c.AddChain(&nftables.Chain{
    Name:     "output",
    Hooknum:  nftables.ChainHookOutput,
    Priority: nftables.ChainPriorityFilter,
    Table:    filter,
    Type:     nftables.ChainTypeFilter,
})
ipSet := &nftables.Set{
    Name:"ipSet",
    Table:     filter,
    Interval:      true,
    Concatenation: true,
    KeyType:   nftables.TypeIPAddr,
}
if err := c.AddSet(ipSet, []nftables.SetElement{
    {
        Key: []byte(net.ParseIP("10.34.11.179").To4()),
        KeyEnd: []byte(net.ParseIP("10.34.11.180").To4()),
    },
}); err != nil {
    return
}

portSet := &nftables.Set{
    Name:"portSet",
    Table:     filter,
    Interval:      true,
    Concatenation: true,
    KeyType:    nftables.TypeInetService,
}

if err := c.AddSet(portSet, []nftables.SetElement{
    {
        Key: binaryutil.BigEndian.PutUint16(1234),
        KeyEnd:binaryutil.BigEndian.PutUint16(1235),
    },
}); err != nil {
    return
}

c.AddRule(&nftables.Rule{
    Table: filter,
    Chain: input,
    Exprs: []expr.Any{
        &expr.Payload{
            DestRegister: 1,
            Base:         expr.PayloadBaseNetworkHeader,
            Offset:       16,
            Len:          4,
        },

        &expr.Lookup{
            SourceRegister: 1,
            SetName:        ipSet.Name,
            SetID:          ipSet.ID,
        },

        &expr.Meta{Key: expr.MetaKeyL4PROTO, Register: 1},
        &expr.Cmp{
            Op:       expr.CmpOpEq,
            Register: 1,
            Data:     []byte{unix.IPPROTO_TCP},
        },

        &expr.Payload{
            DestRegister: 1,
            Base:         expr.PayloadBaseTransportHeader,
            Offset:       2,
            Len:          2,
        },

        &expr.Lookup{
            SourceRegister: 1,
            SetName:        portSet.Name,
            SetID:          portSet.ID,
        },

        &expr.Counter{},
        &expr.Log{},
        &expr.Verdict{
            Kind: expr.VerdictDrop,
        },
    },
})
if err := c.Flush(); err != nil {
    return
}

}


turekt commented 10 months ago

Hi @xlango,

your code does not replicate netlink messages sent by the nft command.

Can you test the following code and let me know if it works?

package main

import (
    "github.com/google/nftables"
    "github.com/google/nftables/binaryutil"
    "github.com/google/nftables/expr"
    "golang.org/x/sys/unix"
    "net"
)

func main() {
    c, err := nftables.New()
    if err != nil {
        panic(err)
    }
    c.FlushRuleset()

    filter := c.AddTable(&nftables.Table{
        Family: nftables.TableFamilyIPv4,
        Name:   "filter",
    })

    input := c.AddChain(&nftables.Chain{
        Name:     "output",
        Hooknum:  nftables.ChainHookOutput,
        Priority: nftables.ChainPriorityFilter,
        Table:    filter,
        Type:     nftables.ChainTypeFilter,
    })
    ipSet := &nftables.Set{
        Name:     "ipSet",
        Table:    filter,
        Interval: true,
        KeyType:  nftables.TypeIPAddr,
    }
    if err := c.AddSet(ipSet, nil); err != nil {
        panic(err)
    }
    if err := c.SetAddElements(ipSet, []nftables.SetElement{
        {
            Key:         []byte{0x00, 0x00, 0x00, 0x00},
            IntervalEnd: true,
        },
        {
            Key: []byte(net.ParseIP("10.34.11.179").To4()),
        },
        {
            Key:         []byte(net.ParseIP("10.34.11.180").To4()),
            IntervalEnd: true,
        },
    }); err != nil {
        panic(err)
    }

    portSet := &nftables.Set{
        Name:     "portSet",
        Table:    filter,
        Interval: true,
        KeyType:  nftables.TypeInetService,
    }

    if err := c.AddSet(portSet, nil); err != nil {
        panic(err)
    }
    if err := c.SetAddElements(portSet, []nftables.SetElement{
        {
            Key:         []byte{0x00, 0x00},
            IntervalEnd: true,
        },
        {
            Key: binaryutil.BigEndian.PutUint16(1234),
        },
        {
            Key:         binaryutil.BigEndian.PutUint16(1235),
            IntervalEnd: true,
        },
    }); err != nil {
        panic(err)
    }

    c.AddRule(&nftables.Rule{
        Table: filter,
        Chain: input,
        Exprs: []expr.Any{
            &expr.Payload{
                DestRegister: 1,
                Base:         expr.PayloadBaseNetworkHeader,
                Offset:       16,
                Len:          4,
            },

            &expr.Lookup{
                SourceRegister: 1,
                SetName:        ipSet.Name,
                SetID:          ipSet.ID,
            },

            &expr.Meta{Key: expr.MetaKeyL4PROTO, Register: 1},
            &expr.Cmp{
                Op:       expr.CmpOpEq,
                Register: 1,
                Data:     []byte{unix.IPPROTO_TCP},
            },

            &expr.Payload{
                DestRegister: 1,
                Base:         expr.PayloadBaseTransportHeader,
                Offset:       2,
                Len:          2,
            },

            &expr.Lookup{
                SourceRegister: 1,
                SetName:        portSet.Name,
                SetID:          portSet.ID,
            },

            &expr.Counter{},
            &expr.Log{},
            &expr.Verdict{
                Kind: expr.VerdictDrop,
            },
        },
    })
    if err := c.Flush(); err != nil {
        panic(err)
    }
}

I hope that this resolves your issue.

realsyy commented 10 months ago

@turekt Thanks, it works now. But I wonder why I should add a set element like "{ Key: []byte{0x00, 0x00, 0x00, 0x00}, IntervalEnd: true, }"?

turekt commented 10 months ago

Hi @realsyy,

when analysing mnl debug information, you will see that this set element is added by nft (reduced output for brevity):

$ nft --debug=mnl add element ip filter ipSet {10.34.11.179}
...
----------------    ------------------
|  0000000120  |    | message length |
| 02572 | R--- |    |  type | flags  |
|  0000000001  |    | sequence number|
|  0000000000  |    |     port ID    |
----------------    ------------------
| 02 00 00 00  |    |  extra header  |
|00011|--|00001|    |len |flags| type|
| 66 69 6c 74  |    |      data      |   f i l t
| 65 72 00 00  |    |      data      |   e r    
|00010|--|00002|    |len |flags| type|
| 69 70 53 65  |    |      data      |   i p S e
| 74 00 00 00  |    |      data      |   t      
|00008|--|00004|    |len |flags| type|
| 00 00 00 01  |    |      data      |          
|00068|N-|00003|    |len |flags| type|
|00024|N-|00001|    |len |flags| type|
|00008|--|00003|    |len |flags| type|
| 00 00 00 01  |    |      data      |          
|00012|N-|00001|    |len |flags| type|
|00008|--|00001|    |len |flags| type|
| 00 00 00 00  |    |      data      |          
|00016|N-|00002|    |len |flags| type|
|00012|N-|00001|    |len |flags| type|
|00008|--|00001|    |len |flags| type|
| 0a 22 0b b3  |    |      data      |     "    
|00024|N-|00003|    |len |flags| type|
|00008|--|00003|    |len |flags| type|
| 00 00 00 01  |    |      data      |          
|00012|N-|00001|    |len |flags| type|
|00008|--|00001|    |len |flags| type|
| 0a 22 0b b4  |    |      data      |     "    
----------------    ------------------
...

The dissection of the netlink packet (nft_set_elem_list_attributes) is as follows:

11 -- 1 (NFTA_SET_ELEM_LIST_TABLE)    | filter\x00\x00    |    --> table name (filter)
10 -- 2 (NFTA_SET_ELEM_LIST_SET)      | ipSet\x00\x00\x00 |    --> set name (ipSet)
 8 -- 4 (NFTA_SET_ELEM_LIST_SET_ID)   | \x00\x00\x00\x01  |    --> set ID (1)
68 N- 3 (NFTA_SET_ELEM_LIST_ELEMENTS) |                   |    --> nested, nft_set_elem_attributes struct
24 N- 1                               |                   |    --> nested, first element
 8 -- 3 (NFTA_SET_ELEM_FLAGS)         | \x00\x00\x00\x01  |    --> flag NFT_SET_ELEM_INTERVAL_END
12 N- 1                               |                   |    --> nested
 8 -- 1 (NFTA_SET_ELEM_KEY)           | \x00\x00\x00\x00  |    --> key, 4 bytes, value 0
16 N- 2                               |                   |    --> nested, second element
12 N- 1                               |                   |    --> nested
 8 -- 1 (NFTA_SET_ELEM_KEY)           | \x0a\x22\x0b\xb3  |    --> key, 4 bytes, value IP 10.34.11.179
24 N- 3                               |                   |    --> nested, third element
 8 -- 3 (NFTA_SET_ELEM_FLAGS)         | \x00\x00\x00\x01  |    --> flag NFT_SET_ELEM_INTERVAL_END
12 N- 1                               |                   |    --> nested
 8 -- 1 (NFTA_SET_ELEM_KEY)           | \x0a\x22\x0b\xb4  |    --> key, 4 bytes, value IP 10.34.11.180

Same structure is observed when setting debug to netlink:

$ nft --debug=netlink add element ip filter ipSet {10.34.11.179}
ip filter @ipSet
    element b40b220a  : 1 [end]
    element b30b220a  : 0 [end]
    element 00000000  : 1 [end]
ipSet filter 0
    element b30b220a  : 0 [end] element b40b220a  : 1 [end]
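
The keys in this output look reversed compared to the mnl dump (b30b220a here, 0a 22 0b b3 there). A small Go sketch of why both are the same four bytes, assuming that this debug mode prints each 4-byte data word as a host-order integer and that the command ran on a little-endian machine. The helper name hostWord is illustrative only.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// hostWord renders a 4-byte key the way a little-endian host prints it
// when the bytes are read as a single 32-bit integer (assumption: this is
// what the netlink debug output does).
func hostWord(key []byte) string {
	return fmt.Sprintf("%08x", binary.LittleEndian.Uint32(key))
}

func main() {
	for _, s := range []string{"10.34.11.179", "10.34.11.180"} {
		key := net.ParseIP(s).To4() // network byte order, as sent on the wire
		fmt.Printf("%s wire=% x printed=%s\n", s, []byte(key), hostWord(key))
	}
	// 10.34.11.179 wire=0a 22 0b b3 printed=b30b220a
	// 10.34.11.180 wire=0a 22 0b b4 printed=b40b220a
}
```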

As to why nftables adds this additional element, my guess is that it is due to the use of segment trees for intervals in nftables.

I don't know for sure, but the code shows a call to set_to_intervals when elements are added via the do_add_elements function here: https://git.netfilter.org/nftables/tree/src/rule.c?id=2f1050a6b30b41f4125ab6f0da7ea5255090ccce#n1431

static int do_add_elements(struct netlink_ctx *ctx, struct cmd *cmd,
               uint32_t flags)
{
    struct expr *init = cmd->expr;
    struct set *set = cmd->elem.set;

    if (set_is_non_concat_range(set) &&
        set_to_intervals(set, init, true) < 0)
        return -1;

    return __do_add_elements(ctx, cmd, set, init, flags);
}

The set_to_intervals checks whether a set requires "first segment": https://git.netfilter.org/nftables/tree/src/intervals.c?id=2f1050a6b30b41f4125ab6f0da7ea5255090ccce#n673:

...
if (!prev && segtree_needs_first_segment(set, init, add) &&
    mpz_cmp_ui(elem->key->left->value, 0)) {
...

The condition for adding the first segment is met because the set exists and has no elements yet: https://git.netfilter.org/nftables/tree/src/intervals.c?id=2f1050a6b30b41f4125ab6f0da7ea5255090ccce#n631

static bool segtree_needs_first_segment(const struct set *set,
                    const struct expr *init, bool add)
{
    if (add && !set->root) {
        /* Add the first segment in four situations:
         *
         * 1) This is an anonymous set.
         * 2) This set exists and it is empty.
         * 3) New empty set and, separately, new elements are added.
         * 4) This set is created with a number of initial elements.
         */
        if ((set_is_anonymous(set->flags)) ||
            (set->init && set->init->size == 0) ||
            (set->init == NULL && init) ||
            (set->init == init)) {
            return true;
        }
...

The content of this first segment is added inside the if block. Its value is set to 0 and flags to INTERVAL_END: https://git.netfilter.org/nftables/tree/src/intervals.c?id=2f1050a6b30b41f4125ab6f0da7ea5255090ccce#n680

...
// sets value to 0
mpz_set(expr->value, p);
// expr set as root
root = set_elem_expr_alloc(&internal_location, expr);
if (i->etype == EXPR_MAPPING) {
    root = mapping_expr_alloc(&internal_location,
      root,
      expr_get(i->right));
}
// adds flag for interval end
root->flags |= EXPR_F_INTERVAL_END;
list_add(&root->list, &intervals);
...

realsyy commented 10 months ago

@turekt How can I dissect a netlink packet as you did in the "Dissection of the netlink packet (nft_set_elem_list_attributes)" listing above?

turekt commented 10 months ago

@realsyy Once you obtain the debug messages, you need to check the source code to dissect packets.

Using your case as an example.

Obtain the debug message:

...
----------------    ------------------
|  0000000120  |    | message length |
| 02572 | R--- |    |  type | flags  |
|  0000000001  |    | sequence number|
|  0000000000  |    |     port ID    |
----------------    ------------------
| 02 00 00 00  |    |  extra header  |
|00011|--|00001|    |len |flags| type|
| 66 69 6c 74  |    |      data      |   f i l t
| 65 72 00 00  |    |      data      |   e r    
|00010|--|00002|    |len |flags| type|
| 69 70 53 65  |    |      data      |   i p S e
| 74 00 00 00  |    |      data      |   t      
|00008|--|00004|    |len |flags| type|
| 00 00 00 01  |    |      data      |          
|00068|N-|00003|    |len |flags| type|
|00024|N-|00001|    |len |flags| type|
|00008|--|00003|    |len |flags| type|
| 00 00 00 01  |    |      data      |          
|00012|N-|00001|    |len |flags| type|
|00008|--|00001|    |len |flags| type|
| 00 00 00 00  |    |      data      |          
|00016|N-|00002|    |len |flags| type|
|00012|N-|00001|    |len |flags| type|
|00008|--|00001|    |len |flags| type|
| 0a 22 0b b3  |    |      data      |     "    
|00024|N-|00003|    |len |flags| type|
|00008|--|00003|    |len |flags| type|
| 00 00 00 01  |    |      data      |          
|00012|N-|00001|    |len |flags| type|
|00008|--|00001|    |len |flags| type|
| 0a 22 0b b4  |    |      data      |     "    
----------------    ------------------
...

Use nf_tables.h from either netfilter or linux source code: https://git.netfilter.org/libnftnl/tree/include/linux/netfilter/nf_tables.h

The message type value 2572 encodes the NFT_MSG type:

>>> bin(2572)
'0b101000001100'

Look up the message types in the source code: https://git.netfilter.org/libnftnl/tree/include/linux/netfilter/nf_tables.h?id=3eaa940bc33a3186dc7ba1e30640ec79b5f261b9#n101

The index in the enum corresponds to its value, so NFT_MSG_NEWSETELEM is 12:

>>> bin(12)
'0b1100'

The low 8 bits of the type are the same, so we can conclude that this is a message for adding a new set element. The meaning of the remaining high bits of the message type value (2560) is for you to try and conclude on your own.
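
For readers who want to check their conclusion, here is a short Go sketch of the split. It assumes the nfnetlink convention that the upper 8 bits of the 16-bit type carry the subsystem ID and the lower 8 bits carry the subsystem-specific message type. The value 10 for NFNL_SUBSYS_NFTABLES comes from linux/netfilter/nfnetlink.h, which is not quoted in this thread, and splitType is an illustrative name.

```go
package main

import "fmt"

const (
	nfnlSubsysNftables = 10 // NFNL_SUBSYS_NFTABLES in linux/netfilter/nfnetlink.h
	nftMsgNewSetElem   = 12 // NFT_MSG_NEWSETELEM in nf_tables.h
)

// splitType separates an nfnetlink message type into the subsystem ID
// (upper 8 bits) and the subsystem-specific message type (lower 8 bits).
func splitType(t uint16) (subsys, msg uint8) {
	return uint8(t >> 8), uint8(t & 0xff)
}

func main() {
	subsys, msg := splitType(2572)
	fmt.Println(subsys, msg)                                           // 10 12
	fmt.Println(subsys == nfnlSubsysNftables, msg == nftMsgNewSetElem) // true true
}
```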

Now that we know that this is a message for adding a new set element, the comment above the nf_tables_msg_types enum reveals:

 * @NFT_MSG_NEWSETELEM: create a new set element (enum nft_set_elem_attributes)

Moving to the appropriate enum: https://git.netfilter.org/libnftnl/tree/include/linux/netfilter/nf_tables.h?id=3eaa940bc33a3186dc7ba1e30640ec79b5f261b9#n422

The enum index corresponds to its integer value, so NFTA_SET_ELEM_KEY is 1, NFTA_SET_ELEM_DATA is 2 and so on.

I'm leaving further dissection of the above packet to you as an exercise.

realsyy commented 10 months ago

@turekt Yes, I can analyze a debug message manually as you do. I thought there might be a tool, like nft or some other binary, that can analyze a debug message by itself, so that all you need to do is type a bash command. Thanks very much for your guidance!