opnsense / src

OPNsense operating system on top of FreeBSD
https://opnsense.org/

MFC r360903: pf: Don't allocate per-table entry counters unless required. #107

Closed cbrueffer closed 3 years ago

cbrueffer commented 3 years ago

FreeBSD r345177 moved pf stats to counter variables. This introduced CPU load and system stability issues in SMP environments with large pf tables. These were fixed in a follow-up commit (HEAD r360903, r361451 in 12-STABLE); before that fix, the only workaround was disabling SMP.

OPNsense contains the original pf Counter commit, but not the subsequent fix.

pfSense forum thread where the symptoms around this issue were originally described: https://forum.netgate.com/topic/149595/2-4-5-a-20200110-1421-and-earlier-high-cpu-usage-from-pfctl/71?lang=en-US
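
As a rough illustration of the idea in the commit title (allocate per-entry counters only when a table actually asks for them), here is a minimal Python sketch. It is not the pf kernel code, which is written in C and uses counter(9) variables; the names `Table`, `TableEntry`, `EntryCounters` and `want_counters` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EntryCounters:
    """Hypothetical stand-in for the counter set attached to one table entry."""
    packets: int = 0
    nbytes: int = 0


@dataclass
class TableEntry:
    address: str
    # None means "no counters were allocated for this entry".
    counters: Optional[EntryCounters] = None


@dataclass
class Table:
    name: str
    # Assumed per-table opt-in switch; the name is made up for this sketch.
    want_counters: bool = False
    entries: Dict[str, TableEntry] = field(default_factory=dict)

    def add(self, address: str) -> TableEntry:
        # Allocate counters only if the table asked for accounting.
        counters = EntryCounters() if self.want_counters else None
        entry = TableEntry(address, counters)
        self.entries[address] = entry
        return entry

    def account(self, address: str, nbytes: int) -> None:
        # Accounting is a no-op for entries without counters.
        entry = self.entries[address]
        if entry.counters is None:
            return
        entry.counters.packets += 1
        entry.counters.nbytes += nbytes

    def allocated_counters(self) -> int:
        # Number of entries that carry a counter set.
        return sum(1 for e in self.entries.values() if e.counters is not None)
```

With this shape, a large table that never opted in carries no counter objects at all, so its per-entry cost does not grow with the counter machinery.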

fichtner commented 3 years ago

@cbrueffer thanks a lot! the original commit went to releng/12.1 but not the fix. It's only on releng/12.2. See b9315bd38115

Not sure if this makes 21.1.3 next week, but if you want I can provide a test kernel today.

Cheers, Franco

cbrueffer commented 3 years ago

That would be great @fichtner, thanks!

fichtner commented 3 years ago

Here you go:

# opnsense-update -zbkr 21.7.a_40

If it checks out please close the issue and we will ponder about the backport urgency internally.

Thanks, Franco

cbrueffer commented 3 years ago

Thanks! The issues I'm seeing are a bit unpredictable, so it may take a few days before I can reliably say whether it helped.

fichtner commented 3 years ago

No problem :)

cbrueffer commented 3 years ago

Stupid question, but:

root@gw:~ # opnsense-update -bkr 21.7.a_40
Fetching base-21.7.a_40-amd64.txz: .. failed, no signature found
root@gw:~ # opnsense-update -bikr 21.7.a_40
Fetching base-21.7.a_40-amd64.txz: .. failed, no update found

What's the best way to make this work?

AdSchellevis commented 3 years ago

remove the b :) [-kr]

cbrueffer commented 3 years ago

Same thing:

root@gw:~ # opnsense-update -kr 21.7.a_40
Fetching kernel-21.7.a_40-amd64.txz: .. failed, no signature found
root@gw:~ # opnsense-update -ikr 21.7.a_40
Fetching kernel-21.7.a_40-amd64.txz: .. failed, no update found

AdSchellevis commented 3 years ago

ok, it looks like they're published to snapshots (both base and kernel). can you try:

opnsense-update -bkzr 21.7.a_40

cbrueffer commented 3 years ago

That appears to be working; thanks!

fichtner commented 3 years ago

Sorry, I do not heed my own safeguard additions. Ad is correct, -z is used to select snapshots which this is... :)
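
To summarize the flag combinations tried in this thread, the following Python sketch assembles the `opnsense-update` argument list. The flag meanings are inferred only from this discussion (`-b` and `-k` from the `base-…` and `kernel-…` files being fetched, `-z` for snapshots per the comment above, `-r` taking the version string); the helper name `build_update_command` is hypothetical and not part of any OPNsense tooling.

```python
from typing import List


def build_update_command(version: str, base: bool = True,
                         kernel: bool = True,
                         snapshot: bool = False) -> List[str]:
    """Build an opnsense-update argv list like the ones used in this thread."""
    if not (base or kernel):
        raise ValueError("select at least one of base or kernel")
    flags = ""
    if base:
        flags += "b"      # base set, as inferred from this thread
    if kernel:
        flags += "k"      # kernel set, as inferred from this thread
    if snapshot:
        flags += "z"      # snapshots, per the comment above
    # -r goes last in the cluster because it takes the version as its argument.
    return ["opnsense-update", "-" + flags + "r", version]


if __name__ == "__main__":
    # The invocation that worked for the 21.7.a_40 snapshot build.
    print(" ".join(build_update_command("21.7.a_40", snapshot=True)))
```

Keeping `-r` at the end of the flag cluster mirrors the commands shown earlier, where the version string directly follows it.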

cbrueffer commented 3 years ago

The patch unfortunately hasn't solved my specific problem, but it also hasn't been detrimental.

Considering the severe symptoms some people described in the pfSense forum thread, it may be good to include it in 21.1.3 nonetheless.

fichtner commented 3 years ago

TBH, we haven't seen the issues described there, and 2.4.5 is not even 12.1, so it may have been another issue with the backport to 11. It doesn't look like a smooth sail to 21.1.3 if it adds no value.

What symptoms are you experiencing? Since 20.7 I guess? Or 21.1? It's not clear from the report...

Cheers, Franco

cbrueffer commented 3 years ago

The problem is described in https://forum.opnsense.org/index.php?topic=21145.0; basically I'm seeing recurrent 30-50 second network outages on one APU2D4 igb(4) interface carrying three VLANs. While I'm not 100% sure our OPNsense router is at fault, it does increasingly look like it (no other colo customers experience this problem).

I've had the first reports when I was using 20.7.4, but it may have occurred before that. Some of the symptoms described in the pfSense thread sounded similar to what I'm seeing, which brought me to this patch.

Like I wrote on the forum, I'm a bit suspicious of the iflib'ified igb(4). There have been several bugfixes in that area not currently in OPNsense, so my weekend project is testing those and seeing how it looks. For easier testing, is there anything in OPNsense that likely wouldn't work with a stock FreeBSD 12-STABLE kernel?

Edit: I should note that I used 19.7.X and 20.1.X on the same hardware and settings without problems.

cbrueffer commented 3 years ago

This can be closed, OPNsense was not at fault for the mentioned VLAN issue.

fichtner commented 3 years ago

Thanks for the follow up! 😊