zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com
Other
14.54k stars 1.7k forks source link

Obscure rules evaluation quirks #2200

Closed laduke closed 8 months ago

laduke commented 10 months ago

Hello

This is one issue in two parts. It all started with another attempt at figuring out why tag-based rulesets sorta mostly work but then don't in confusing ways. When we finally figured that out, we noticed a more of the same issue with some non-tag rules/matches.

Related: #1495 There are many other's in the discussion board and internal ticket tracker, etc...

Old context

There have been tickets/complaints/confusions over the years with tag rules. We had been suggesting putting accept ethertype arp at the top of the rule set to work around it. We weren't 100% sure why this helped.

We think we finally understand. accept ethertype arp lets arp packets flow before checking tags, and so tag values are exchanged between peers during the arp request and repsonse. Then, both peers have knowledge of each other's tag values and can evaluate all other traffic without any special casing.

Consider this tag based ruleset

tag role
  id 13
  default 0
  flag 0 foo
;

accept tand role 1;
drop;

tand is bitwise AND

If both members have the "foo" bit set, they can talk to each other. If either or both don't have the bit, they can't talk.

Now consider this code in the rule evaluator

 else {
    // Outbound side is not strict since if we have to match both tags and
    // we are sending a first packet to a recipient, we probably do not know
    // about their tags yet. They will filter on inbound and we will filter
    // once we get their tag. If we are a tee/redirect target we are also
    // not strict since we likely do not have these tags.
    thisRuleMatches = 1;
}

Open up the code to get a little more context. We're in the tag processing section of the rule evaluator.

Forcing thisRuleMatches = 1 causes the rule evalutor to match and stop at accept tand role 1, even though we don't know the recipient's tags yet.

Makes sense.

Otherwise accept tand role 1 wouldn't match, and the evaluator would go to the next rules. The only rule left is drop. With out the thisRuleMatches = 1 special case, we wouldn't be able to talk.

Now consider this slightly different ruleset

It doesn't the same thing, but with inverted logic.

drop tand role 0;
accept;

What happens when we set thisRuleMatches = 1 here?

drop tand role 0; matches, and we never get to accept.

We can never send a packet! So we can never send our tags to each other. We can never talk.

Not too hard to fix

If we can keep a little state we can skip "drops" caused by missing remote tags.

We think this is OK security-wise and matches the original intention. The remote node will still do the right thing, drop the packet if needed.

If the rules were something like this:

drop tand role 0;
drop ipprotocol sctp;
accept;

drop tand role 0; will get skipped by the sender. If it's a sctp packet, it'll get stopped at the sender. Otherwise it'll get sent, then the recipient will evaluate the rules again -with all the needed tag information.

This is what we tried so far anyways. There may be other solutions.

But wait there's more.

Rules can be inverted with the not keyword.

accept not tand role 0;
drop;

When we get to the end of the evaluation of a rule, we check if the not bit is set, and invert the match.

if ((rules[rn].t & 0x40)) {
    thisSetMatches |= (thisRuleMatches ^ ((rules[rn].t >> 7) & 1));
} else {
    thisSetMatches &= (thisRuleMatches ^ ((rules[rn].t >> 7) & 1));
}

This is XOR of thisRuleMatches and the NOT bit of the rule.

So accept not tand role 0; doesn't match, and we fall through to drop.

We can't talk.

So we think we need to use something like:

thisRuleMatches = (rules[rn].t >> 7) ^ 1; 

There many cases, not just in the tags processing, where we hard code thisRuleMatches to either true or false:

case ZT_NETWORK_RULE_MATCH_IPV4_SOURCE:
    if ((etherType == ZT_ETHERTYPE_IPV4)&&(frameLen >= 20)) {
        thisRuleMatches = (uint8_t)(InetAddress((const void *)&(rules[rn].v.ipv4.ip),4,rules[rn].v.ipv4.mask).containsAddress(InetAddress((const void *)(frameData + 12),4,0)));
    } else {
        thisRuleMatches = 0;
    }
    break;

Which we think if we're in a rule with a not modifier, it will be wrong.

We implemented both of the mentioned changes and tag based rulesets work better.

Test cases

Here are some of the rulesets we've tested:

tag role
  id 13
  default 0
  flag 0 foo
;
# dummy rules
# accept ethertype wol;  # 37 01
# drop ipprotocol sctp; # 36 00

# accept tand role 1;
# drop;

# drop tand role 0;
# accept;

# drop not tand role 1;
# accept;

accept not tand role 0;
drop;

#accept not treq role 0 and not ipprotocol icmp4;
#drop; 

#accept treq role 1 ;
#drop; 

#accept not tseq role 0;
#drop;

Bugs in other types of matches

This leads us to bugs in other matches

accept ethertype arp;
drop not ipsrc 10.11.12.1/24;
accept;

ipsrc <ipv4 address> gets evaluated in ZT_NETWORK_RULE_MATCH_IPV4_DEST

            case ZT_NETWORK_RULE_MATCH_IPV4_DEST:
                if ((etherType == ZT_ETHERTYPE_IPV4)&&(frameLen >= 20)) {
                    thisRuleMatches = (uint8_t)(InetAddress((const void *)&(rules[rn].v.ipv4.ip),4,rules[rn].v.ipv4.mask).containsAddress(InetAddress((const void *)(frameData + 16),4,0)));
                } else {
                    thisRuleMatches = 0;
                }
                break;

All IPv6 traffic should work with this rule, we think. It's not IPv4 so it's intended to be skipped by the thisRuleMatches = 0. But because it's a not rule, the 0 gets inverted.

accept ethertype arp; is needed for the desired IPv4 traffic to work as well.

Another example:

Should this accept non-ipv4 traffic? It does... ping6 and arp (conveniently) work. We think the intention is that not ipsrc only match on ipv4 traffic.

accept not ipsrc 10.13.13.0/24;
drop;

One more example

sport/dport consider only ipv4 or 6 ethertypes. So this rule (maybe conveniently) accepts ARP. Pinging works on this network:

accept not sport 1234;
drop;

But ping does not work here:

drop not sport 1234;
accept;

Others

There are a few other hard coded thisRuleMatches = 0; branches in the evaluater that might cause trouble, we just haven't thought through the rest of them.

List of possibly affected matches

Possibly this only applies when used with a not modifier.

In rules language the affected matches are:

In c++ they are:

A branch with a potential fix

Using this issue for discussion for now, but will make a PR to discuss code eventually

https://github.com/zerotier/ZeroTierOne/tree/tl-tags-3

Breaking changes

Fixing this is a "breaking" change. Though for probably a small amount of users. They might have to adjust rule sets on their networks if any clients get updated to a fixed version. We can dig through the Central database for rulesets that use the above rules to get a better idea.

It might not be breaking to fix only the Tags portion of this. It would just make an inconsistent behavior more consistent. Nothing that is consistently blocked would become allowed or vice versa.

laduke commented 10 months ago

Trying write a simpler the explanation of the impact of this:

If your rule uses a not modifier, and one of the affected matches (list above), then the rule evaluator will match on non-IP ethertypes.

The list of ethertypes (that we handle) is: ipv4, arp, ipv6, wol (wake on LAN), rarp, atalk, aarp, ipx_a, ipx_b complete list Most of these are rare, and the default ruleset blocks most ethertypes:

drop
    not ethertype ipv4
    and not ethertype arp
    and not ethertype ipv6
    ;

Some rules may be blocking or allowing ARP inadvertently. Blocking ARP will make IPv4 traffic very unreliable. Accidentally accepting all ARP is probably fine.

The ipsrc/dest rules could be problematic on dual stack networks. But if you have a dual stack network, why would use ipsrc/dest? You'd have to handle every node twice or more.

Summary of each rule: