sapcc / octavia-f5-provider-driver

Apache License 2.0

Source IP persistence mask #220

Open m-kratochvil opened 1 year ago

m-kratochvil commented 1 year ago

Contrary to common expectations, the Big-IP implementation of source IP persistence does not by default apply per source IP, but per source IP range, as defined by the configured mask. The mask can be set to 255.255.255.255 to get the "per-single-IP" behavior, but it can also be set to a wider range and cover, for example, whole subnets.

The problem I recently discovered is that the default TMOS source-address persistence mask setting of none does not imply a 255.255.255.255 mask, but the "catch-all" 0.0.0.0 mask (confirmed by F5). This in turn creates a single persistence record that matches every client, no matter what IP they come from, and persists all connections to the one pool member that was initially selected by the load balancing algorithm. This causes potentially disastrous behavior for pools that see a large number of clients coming in, such as proxy setups, where the selected pool member can quickly be overwhelmed with traffic. It is also likely misleading to users, who rightfully expect source IP persistence to be per single client IP.
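To make the effect of the mask concrete, here is a minimal Python sketch of the mask arithmetic only. It assumes a persistence record is keyed on the client address ANDed with the mask; `persistence_key` and the sample client addresses are hypothetical and for illustration, not Big-IP code.

```python
import ipaddress


def persistence_key(client_ip: str, mask: str) -> str:
    """Return the masked client address (assumed key of a persistence record)."""
    ip = int(ipaddress.IPv4Address(client_ip))
    m = int(ipaddress.IPv4Address(mask))
    # Bitwise AND keeps only the bits selected by the mask.
    return str(ipaddress.IPv4Address(ip & m))


# Hypothetical client addresses, chosen only to show the grouping.
clients = ["10.0.1.5", "10.0.1.77", "10.0.2.9", "192.168.0.1"]

for mask in ("255.255.255.255", "255.255.255.0", "0.0.0.0"):
    keys = sorted({persistence_key(c, mask) for c in clients})
    print(f"{mask}: {len(keys)} record(s) -> {keys}")
```

Under this assumption, the four sample clients produce four records with 255.255.255.255, three with 255.255.255.0, and a single shared record with 0.0.0.0.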

One example where this actually happened is this BTP proxy setup: https://dashboard.ap-au-1.cloud.sap/neo/neo-ap-au-1-factoryap1-iel-a01/lbaas2/?r=/loadbalancers/e819ec77-09a9-4f08-b631-fc79192fd1be/show

When source IP persistence was enabled, the first pool member was utterly destroyed by the number of incoming connections.

Fix

We should explicitly set the initial default mask value to 255.255.255.255 to enable one persistence record per client IP, but at the same time keep this setting configurable for use cases such as the proxy setup mentioned above, because we also don't want tens of thousands of persistence records filling up the Big-IP memory. This of course requires customers who are about to deploy such a setup to either know what they are doing or consult us prior to the deployment (both questionable).

Octavia

The source IP persistence mask setting seems to be supported in Octavia through the persistence_granularity attribute (needs confirmation). See the f5-provider-driver implementation in https://github.com/sapcc/octavia-f5-provider-driver/blob/d3ccf2bebf9910c2a663a2d56681dcff9dd4c733/octavia_f5/restclient/as3objects/persist.py#L27-L29

AS3

We are setting "persistenceMethods": "source-address" in the AS3 declaration, which points to the default persistence profile /Common/source_addr that ships with the Big-IP. This is simple and effortless, but it can't be used if we want the mask setting to be configurable.

Current AS3 declaration snippet:

  "listener_890e96da-1eb0-4544-a436-907283d094bf": {
    "virtualAddresses": [
      "10.180.88.99"
    ],
    "virtualPort": 22,
    "persistenceMethods": [
      "source-address"
    ],
  <omitted>

The target AS3 declaration (relevant part only) with custom source-address persistence:

  "lb_487e7950-d600-483c-9493-89206f2f338b": {
    "template": "generic",
    "label": "487e7950-d600-483c-9493-89206f2f338b",
    "class": "Application",
    "cc_sip_persit": {
      "class": "Persist",
      "addressMask": "255.255.255.255",
      "persistenceMethod": "source-address"
    },
    "listener_890e96da-1eb0-4544-a436-907283d094bf": {
      "virtualAddresses": [
        "10.180.88.99"
      ],
      "virtualPort": 22,
      "persistenceMethods": [
        { "use": "cc_sip_persit" }
      ],
  <omitted>

This way we can accept arbitrary values for the mask. If no mask is specified, 255.255.255.255 should be used as the default value.
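A minimal Python sketch of how these two fragments could be generated with a default mask. The function names `build_source_ip_persist` and `build_listener` are hypothetical and do not exist in the driver; only the AS3 keys and the `cc_sip_persit` name are taken from the declaration above.

```python
import ipaddress
import json

DEFAULT_MASK = "255.255.255.255"  # one persistence record per client IP


def build_source_ip_persist(mask=None):
    """Build a custom source-address Persist object (hypothetical helper)."""
    mask = mask or DEFAULT_MASK
    # Only checks dotted-quad syntax (raises ValueError on malformed input);
    # it does not verify that the mask bits are contiguous.
    ipaddress.IPv4Address(mask)
    return {
        "class": "Persist",
        "persistenceMethod": "source-address",
        "addressMask": mask,
    }


def build_listener(virtual_address, port, persist_name):
    """Build the listener keys that reference the Persist object by name."""
    return {
        "virtualAddresses": [virtual_address],
        "virtualPort": port,
        "persistenceMethods": [{"use": persist_name}],
    }


application = {
    "cc_sip_persit": build_source_ip_persist(),  # falls back to DEFAULT_MASK
    "listener_890e96da-1eb0-4544-a436-907283d094bf": build_listener(
        "10.180.88.99", 22, "cc_sip_persit"
    ),
}
print(json.dumps(application, indent=2))
```

Passing an explicit value, for example `build_source_ip_persist("255.255.255.0")`, would cover the proxy use case, while callers that pass nothing get the per-client default.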

m-kratochvil commented 1 year ago

I am actually failing to reproduce the behavior in qa-de-1, as I don't have a good test setup with multiple pool members. I will have to gather more data from BTP on the actual scenario in which this reportedly caused the proxy failure.

BenjaminLudwigSAP commented 1 year ago

Amazing find, thanks! Would it make sense (in the interest of keeping the number of persistence records in check) to have a default persistence mask of 255.255.255.0? This is probably the most commonly used subnet prefix (I did not confirm this assumption yet though).

m-kratochvil commented 1 year ago

Possibly yes, this was also a suggestion from BTP. I will discuss this topic with our F5 rep in CW20, especially how we can reliably test and reproduce the behavior I mentioned earlier, and to get a solid understanding of the implications the mask value can have on the system. Then we'll have a better basis for deciding on the value, or whether this is actually needed at all.

m-kratochvil commented 1 year ago

Agreed with Kai Streubel from BTP network that he will set up a test environment and try to reproduce the issue as closely as possible to the way it surfaced in the actual BTP production setup. ETA: end of May 2023.

m-kratochvil commented 1 year ago

Kai has finished testing on the BTP side and could not reproduce the behavior described above either. The discussion with F5 is ongoing: test results have been provided, F5 has the opposite view on the behavior, and further discussion is pending.

m-kratochvil commented 1 year ago

After further discussion and testing with F5, it was concluded that the mask setting none does NOT implicitly set the mask value 0.0.0.0, as initially thought and as documented in the F5 bug (https://cdn.f5.com/product/bugtracker/ID752933.html). Our tests show that the resulting persistence record applies to a single client IP instead, as if the mask none resulted in an actual mask of 255.255.255.255.

However, F5's statement on the default none mask setting is:

In general I think ambiguity should be avoided and “none” is probably a bad default value. I’d therefore think it would be good to explicitly configure the desired mask if possible.

And also:

Your config refers the “F5 default profile”, I’d recommend to create a CC default profile and modify the mask in there (not touching the actual default profiles).

It's a valid recommendation; I will see if I can open a PR for these changes.