Open m-kratochvil opened 1 year ago
I am actually failing to reproduce the behavior in qa-de-1; I don't have a good test setup with multiple pool members. I will have to gather some more data from BTP for the actual scenario where this reportedly caused the proxy failure.
Amazing find, thanks! Would it make sense (in the interest of keeping the number of persistence records in check) to have a default persistence mask of `255.255.255.0`? This is probably the most commonly used subnet prefix (I did not confirm this assumption yet though).
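To make the record-count argument concrete, here is a minimal sketch of how a mask groups clients into persistence records. It only models the arithmetic (client IP AND mask); the helper names and the sample client IPs are made up for illustration, and it does not claim to reflect how TMOS stores records internally.

```python
import ipaddress


def persistence_key(client_ip: str, mask: str) -> str:
    """Return the address a client maps to after applying the mask."""
    ip = int(ipaddress.IPv4Address(client_ip))
    m = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip & m))


def record_count(clients, mask):
    """Number of distinct persistence keys (records) for these clients."""
    return len({persistence_key(c, mask) for c in clients})


# Hypothetical clients: two in 10.0.1.0/24, one in 10.0.2.0/24, one elsewhere.
clients = ["10.0.1.5", "10.0.1.77", "10.0.2.9", "192.168.7.20"]

per_ip = record_count(clients, "255.255.255.255")    # one record per client
per_subnet = record_count(clients, "255.255.255.0")  # one record per /24
print(per_ip, per_subnet)
```

With a `/24` mask the two clients in `10.0.1.0/24` share one record, which is where the saving would come from.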
Possibly yes, this was also a suggestion from BTP. I will discuss this topic with our F5 rep in CW20, especially how we can reliably test and reproduce the behavior I mentioned earlier, and to get a solid understanding of the implications the mask value can have on the system. Then we'll have a better basis to make a decision on the value, or on whether this is actually needed at all.
Agreed with Kai Streubel from BTP network: he will set up a test environment and try to reproduce the issue as closely as possible to the way it surfaced in the actual BTP production setup. ETA: end of May 2023.
Kai has finished testing on the BTP side and could also not reproduce the behavior described above. Discussion with F5 is ongoing: we have provided our test results, but F5 has the opposite view on the behavior. Further discussion pending.
After further discussion and testing with F5 it was concluded that the mask setting `none` does NOT implicitly set the mask value `0.0.0.0`, as initially thought and as documented in the F5 bug https://cdn.f5.com/product/bugtracker/ID752933.html
Our tests show that the resulting persistence record works with a single client IP instead, as if the mask `none` resulted in an actual mask of `255.255.255.255`.
However, the statement from F5 on the default `none` mask setting is:
In general I think ambiguity should be avoided and “none” is probably a bad default value. I’d therefore think it would be good to explicitly configure the desired mask if possible.
And also:
Your config refers the “F5 default profile”, I’d recommend to create a CC default profile and modify the mask in there (not touching the actual default profiles).
It's a valid recommendation, I will see if I can open a PR for these.
Contrary to common expectations, the Big-IP implementation of source IP persistence by default doesn't apply per source IP, but per source IP range, defined by the configured mask. This mask can be set to `255.255.255.255` and thus give the "per-single-IP" behavior, but it can also be set to a larger range and cover, for example, whole subnets instead.

The problem I just recently discovered is that the default TMOS source-address persistence mask setting of `none` does not imply a `255.255.255.255` mask, but instead the "catch-all" `0.0.0.0` mask (confirmed by F5). This in turn creates a single persistence record that matches every client, no matter what IP they come from, and persists the connection to the one and only pool member that was initially selected by the load balancing algorithm. This causes potentially disastrous behavior with pools that see a large number of clients coming in, such as proxy setups, where the selected pool member can be quickly overwhelmed with traffic. It is also likely misleading to users, who rightfully expect source IP persistence to be per single client IP.

One example where this actually happened is this BTP proxy setup: https://dashboard.ap-au-1.cloud.sap/neo/neo-ap-au-1-factoryap1-iel-a01/lbaas2/?r=/loadbalancers/e819ec77-09a9-4f08-b631-fc79192fd1be/show
When source IP persistence was enabled, the first pool member was utterly destroyed by the incoming amount of connections.
Fix

We should explicitly set the initial default `mask` value to `255.255.255.255` to enable one persistence record per client IP, but at the same time keep this setting configurable for use cases such as the proxy setup mentioned above, because we also don't want tens of thousands of persistence records filling up the Big-IP memory. This of course requires customers who are about to deploy such a setup to either know what they are doing or consult us prior to such a deployment (both questionable).

Octavia

The source IP persistence `mask` setting seems to be supported in Octavia by the `persistence_granularity` attribute (needs confirmation). I see the `f5-provider-driver` implementation in https://github.com/sapcc/octavia-f5-provider-driver/blob/d3ccf2bebf9910c2a663a2d56681dcff9dd4c733/octavia_f5/restclient/as3objects/persist.py#L27-L29

AS3

We are setting `"persistenceMethods": "source-address"` in an AS3 declaration, which points to the default persistence profile `/Common/source_addr` that comes with the Big-IP. This is simple and effortless, but can't be used if we want to have the `mask` setting configurable.

Current AS3 declaration snippet:
The target AS3 declaration (relevant part only) with custom `source-address` persistence:

This way we can accept arbitrary values for the `mask`. `255.255.255.255` should be used as the default value if not specified.
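A rough sketch of how the driver side could build both variants. This is not the existing `persist.py` code. It assumes that the AS3 `Persist` class accepts an `addressMask` property for `source-address` persistence, and that `persistenceMethods` is a list that can reference a custom object via `{"use": ...}`; both should be checked against the AS3 schema. The function names, the `persist_source` object name and the contiguity check are hypothetical choices made for this sketch.

```python
import ipaddress

DEFAULT_MASK = "255.255.255.255"  # proposed default: one record per client IP


def is_contiguous_netmask(mask: str) -> bool:
    """True if mask is a dotted-quad netmask whose 1-bits are all leading."""
    value = int(ipaddress.IPv4Address(mask))  # raises ValueError if not IPv4
    inverted = value ^ 0xFFFFFFFF
    return (inverted & (inverted + 1)) == 0


def build_persist(persistence_granularity=None):
    """Build a Persist object; the property names are assumed, not verified."""
    mask = persistence_granularity or DEFAULT_MASK
    if not is_contiguous_netmask(mask):
        raise ValueError(f"not a contiguous IPv4 netmask: {mask}")
    return {
        "class": "Persist",
        "persistenceMethod": "source-address",
        "addressMask": mask,  # assumed AS3 property name
    }


def persistence_methods(persist_name=None):
    """Return the persistenceMethods value for a service."""
    if persist_name is None:
        # Current approach: refer to the built-in method by name.
        return ["source-address"]
    # Target approach: point at a custom Persist object in the same application.
    return [{"use": persist_name}]


# Relevant fragment only, not a complete declaration.
app_fragment = {
    "persist_source": build_persist("255.255.255.0"),
    "service": {"persistenceMethods": persistence_methods("persist_source")},
}
print(app_fragment)
```

Passing no granularity falls back to `255.255.255.255`, which matches the default proposed above, while an explicit value such as `255.255.255.0` is passed through after validation.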