stangri / source.openwrt.melmac.net

OpenWrt Packages
GNU General Public License v3.0
144 stars 47 forks source link

[pbr] issue: Regression from 1.1.1-7 to 1.1.3-25 - overlaps in `src_addr` arguments among `policy` entries results in cryptic `nft` failures #194

Closed posita closed 4 months ago

posita commented 8 months ago

With 1.1.3-25, the following will now result in collisions detectable via nft -c -f /var/run/pbr.nft:

Overlap of 192.168.1.0/24 between two rules:

config policy
    option name 'hulu.com'
    option src_addr '192.168.1.0/24'
    option dest_addr 'hulu.com'
    option interface 'wan'

config policy
    option name 'ticketmaster.com'
    option src_addr '192.168.1.0/24'
    option dest_addr 'ticketmaster.com'
    option interface 'wan'

Overlap of 192.168.1.0/24 between two rules, even though second rule includes another entry in src_addr:

config policy
    option name 'hulu.com'
    option src_addr '192.168.1.0/24'
    option dest_addr 'hulu.com'
    option interface 'wan'

config policy
    option name 'ticketmaster.com'
    option src_addr '192.168.1.0/24 192.168.2.0/24'
    option dest_addr 'ticketmaster.com'
    option interface 'wan'

The following results in the error going away, but this means that only one policy entry can contain a particular src_addr:

config policy
    option name 'hulu.com'
    option src_addr '192.168.1.0/24'
    option dest_addr 'hulu.com ticketmaster.com'
    option interface 'wan'

config policy
    option name 'ticketmaster.com'
    option src_addr '192.168.2.0/24'
    option dest_addr 'ticketmaster.com'
    option interface 'wan'

This appears to be a regression from prior versions (e.g., 1.1.1-7), where multiple rules could contain a particular src_addr. The prior allowance was helpful if one wanted to route traffic directed to a lot of different domains from particular network, and wanted to break those up into multiple rules. For example, one could once do this:

config policy
    option name 'domain1.dom'
    option src_addr '192.168.1.0/24'
    option dest_addr 'foo.domain1.dom bar.domain1.dom'
    option interface 'wan'

config policy
    option name 'domain2.dom'
    option src_addr '192.168.1.0/24 192.168.2.0/24'
    option dest_addr 'domain2.dom'
    option interface 'wan'

# ...

config policy
    option name 'domain20.dom'
    option src_addr '192.168.1.0/24 192.168.2.0/24'
    option dest_addr 'foo.domain20.dom bar.domain20.dom baz.domain20.dom'
    option interface 'wan'

Now one can only accommodate these one way, and it has to be in a single entry for each subnet:

config policy
    option name 'subnet-one'
    option src_addr '192.168.1.0/24'
    option dest_addr 'foo.domain1.dom bar.domain1.dom domain2.dom ... foo.domain20.dom bar.domain20.dom baz.domain20.dom'  # this gets really unruly with dozens of entries
    option interface 'wan'

config policy
    option name 'subnet-two'
    option src_addr '192.168.2.0/24'
    option dest_addr 'domain2.dom ... foo.domain20.dom bar.domain20.dom baz.domain20.dom'  # also, despite substantial overlap, these lists now need to be maintained completely independently
    option interface 'wan'

I don't believe this restriction is reflected in the documentation. It certainly doesn't result in an accessible or actionable error in pbr's own logging.

stangri commented 8 months ago

Overlap of 192.168.1.0/24 between two rules:

config policy
  option name 'hulu.com'
  option src_addr '192.168.1.0/24'
  option dest_addr 'hulu.com'
  option interface 'wan'

config policy
  option name 'ticketmaster.com'
  option src_addr '192.168.1.0/24'
  option dest_addr 'ticketmaster.com'
  option interface 'wan'

I can't reproduce, I've added those two policies to my test OpenWrt config and pbr started just fine in the nft_file mode.

posita commented 8 months ago

I can't reproduce, I've added those two policies to my test OpenWrt config and pbr started just fine in the nft_file mode.

Hmmm ... that's odd. I might have misunderstood when I received the error. What about with this?

config policy
    option name 'lan-to-wan'
    option src_addr '192.168.101.0/23'
    option dest_addr 'hulu.com'
    option interface 'wan'

config policy
    option name 'lan-guest-to-wan'
    option src_addr '192.168.101.0/23 192.168.102.0/23'
    option dest_addr 'espn.api.edge.bamgrid.com d2f2ekwwtg17a.cloudfront.net'
    option interface 'wan'

config policy
    option name 'to-vpn'
    option dest_addr 'espn.com'
    option interface 'vpn'  # or any VPN interface
stangri commented 8 months ago

Yeah, that fails. I don't understand why nft sees a collision there. But if you split the middle policy into two, this works:

config policy
    option name 'lan-guest-to-wan'
    option src_addr '192.168.100.0/23'
    option dest_addr 'espn.api.edge.bamgrid.com d2f2ekwwtg17a.cloudfront.net'
    option interface 'wan'

config policy
    option name 'lan-guest-to-wan2'
    option src_addr '192.168.102.0/23'
    option dest_addr 'espn.api.edge.bamgrid.com d2f2ekwwtg17a.cloudfront.net'
    option interface 'wan'
posita commented 8 months ago

I can see several options here:

  1. Do the requisite set manipulation in pbr to accommodate the use case (my favorite, since it preserves the ergonomics of the documented parameters);
  2. Detect the failure mode in pbr, intercept it, and notify the user how to correct it (possibly with a reference to updated docs providing a more thorough explanation);
  3. Enumerate known failure modes with nft in the docs, and when any nft failure occurs, detect it and direct the user there to diagnose on their own (not nearly as ergonomic as either of the above, but probably less time to implement);
  4. Only update the docs (my least favorite, since it requires the user to map the cryptic error to the appropriate place in the docs, and doesn't work well for people like me, who've experienced a degradation post upgrade); or
  5. Fix the problem in nft (probably not viable; that workflow looks awful to me; even submitting/searching for a bug looks unnecessarily cumbersome given that it's 2024).

By "set manipulation", I mean that pbr could read the example from my https://github.com/stangri/source.openwrt.melmac.net/issues/194#issuecomment-1904581610 and (de?)normalize it to entries without any overlap, like that in your https://github.com/stangri/source.openwrt.melmac.net/issues/194#issuecomment-1904966510.

stangri commented 8 months ago

You're welcome to contribute any improvement to both pbr and pbr docs on the matter.

Additionally it may be helpful to post the example of nft ip collision on the OpenWrt forum and if the nft gurus can suggest an easy way to create the nft file from this policy which would not be rejected by nft I'm game to implement it.

stangri commented 4 months ago

Closing due to lack of response.