zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

Flow rule tags broken on 1.14.0 #2338

Open valiac opened 3 months ago

valiac commented 3 months ago

On nodes running zerotier-one v1.14.0, the following flow rule does not work as expected and blocks all ZeroTier traffic on the affected node.

On nodes running v1.12.2 or older, the rule works as expected.

Flow rule:

tag gateway
  enum 1 true
  enum 0 false
  id 666
  default 0
;

break
  not tor gateway 1
  and not teq gateway 1
;

accept;

I've taken a quick glance at the code and other issues, and I suspect the error might have been introduced in https://github.com/zerotier/ZeroTierOne/commit/0bf67bf67ca9594b4539e3faf4f77f7d3afef625, which addresses https://github.com/zerotier/ZeroTierOne/issues/2200 but seems to have introduced other weird behavior.

Side note: Putting accept ethertype arp; at the top of the ruleset fixes this issue. I don't think the ruleset is intended to fail without that rule, but correct me if I'm wrong. In my mind, a rule matching on tags should be applied correctly on all ethertypes.

- What you expect to be happening.

Traffic not blocked by the flow rule is passed (that is, traffic between 2 nodes with the tag set to 1, and traffic between a node with the tag set to 1 and a node with the tag set to 0).

- What is actually happening?

On nodes running version 1.14.0, no traffic is passed at all with the flow rule enabled. Only ARP requests are sent out over the ZT interface.

This happens regardless of whether...:

- Any steps to reproduce the error.

Create a ZeroTier network and add 3 members:

Add the flow rule included at the start to the network.

Try to ping the ZT address of member A from member B and C. The flow rule allows both, but B -> A will not work.

- Any relevant console output or screenshots.

- What operating system and ZeroTier version. Please try the latest ZeroTier release.

Working nodes: OS: macos, arch Version: 1.12.2

Broken nodes: OS: windows, macos, arch, ubuntu version: 1.14.0

valiac commented 3 months ago

@someara This is the flow rule issue we talked about.

laduke commented 3 months ago

Thanks for writing this up.

If you have mixed versions, tag based rules need the arp rule. And before 1.14, if it worked without arp it was basically by luck.

Once 2 nodes decide not to talk to each other, it can take a while for them to start talking with each other after tags change. Leave and rejoin the network to possibly get a fresh start. Which node is initiating the communication can have an effect.

I'm not sure I understand the example rule.

A break not teq gateway 1; would require both nodes to have their tag set to 1, no?

I would use accept ethertype arp; while experimenting or if the network is mixed versions.

valiac commented 3 months ago

Thanks for getting back to me so quickly!

> If you have mixed versions, tag based rules need the arp rule. And before 1.14, if it worked without arp it was basically by luck.

Noted. This is ok for us, since allowing ARP between any nodes is not really problematic from a security perspective, but it should probably be mentioned in the changelog/release for 1.14.0 as a breaking change / caveat.

Side note: I have also tested this with two nodes which are both on 1.14.0, and that does not seem to work without the arp rule either (looking at a pcap, it's definitely due to ARP requests being blocked). Is this also expected, or is that indeed unexpected behavior?

> I'm not sure I'm understanding the example rule a break not teq gateway 1; would require both nodes to have their tag set to 1 no?

On its own it would, yes, but since it's ANDed with the previous condition not tor gateway 1, the rule only breaks the communication when both conditions are true.

So based on the documentation the example rule should result in the following:

break if both values ORed together don't equal 1 AND the two values are not both equal to 1

-> communication between 2 nodes with tag gateway set to 1 is allowed
-> communication between 1 node with tag gateway set to 1 and one with tag gateway set to 0 is allowed
-> communication between 2 nodes with tag gateway set to 0 is not allowed

And it does behave like that when both nodes are on 1.12.2 or the arp rule is added. So I think that part is fine.
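
The three cases above can be checked with a small truth table. The Python sketch below is only a model of the tag-matching logic, not ZeroTier's rules engine. The function names are hypothetical, and it assumes that tor matches when the bitwise OR of the two tags equals the value and that teq matches when both tags equal the value. It does not model ARP or any other ethertype handling discussed in this issue.

```python
from itertools import product


def tor(sender_tag: int, receiver_tag: int, value: int) -> bool:
    """Assumed semantics: bitwise OR of both tags equals value."""
    return (sender_tag | receiver_tag) == value


def teq(sender_tag: int, receiver_tag: int, value: int) -> bool:
    """Assumed semantics: both tags equal value."""
    return sender_tag == value and receiver_tag == value


def is_blocked(sender_tag: int, receiver_tag: int) -> bool:
    """Model of: break not tor gateway 1 and not teq gateway 1;"""
    return (not tor(sender_tag, receiver_tag, 1)
            and not teq(sender_tag, receiver_tag, 1))


# Enumerate every combination of the two enum values (0 = false, 1 = true).
for sender_tag, receiver_tag in product((0, 1), repeat=2):
    verdict = "break" if is_blocked(sender_tag, receiver_tag) else "accept"
    print(f"gateway={sender_tag} <-> gateway={receiver_tag}: {verdict}")
```

Under these assumptions only the 0/0 pair reaches break, which matches the list above. In this model the not teq term never changes the result while the tag is limited to 0 and 1, because whenever not tor gateway 1 is true both tags are 0.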

laduke commented 3 months ago

I see. You want the gateways blocked from each other. Aside: I think you could use the txor match for that.

Will try to look into why teq, or the combination of "not tor and not teq", is acting weird.