openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0
3.96k stars 3.46k forks source link

squid: FATAL: assertion failed: mem/PageStack.cc:159: "StoredNode().is_lock_free()" when workers enabled #24469

Open codemarauder opened 3 months ago

codemarauder commented 3 months ago

Maintainer: @krant Environment: x86 (APU, SuperMicro, Generic), ramips (unielec u7621-06, Mikrotik RB750Gr3), 23.05.x and snapshot

Squid supports workers to run parallel kids on multiple cores to utilise SMP. It was working fine until squid version 4.x but stopped working with squid 5.x and squid 6.x on Openwrt.

To enable workers following changes are required:

/etc/squid/squid.conf:

workers 2
# optionally bind processes to specific cores:
cpu_affinity_map process_numbers=1,2 cores=1,3 

/etc/init.d/squid:

## replace -N with --foreground
procd_set_param command $PROG -s -f $CONFIGFILE --foreground 

With multiple kids, processes are seen as below:

# ps ax | grep squid
2997 pts/0    S+     0:00 grep squid
 8815 ?        S      0:04 /usr/sbin/squid -s -f /tmp/squid/squid.conf --foreground
 9132 ?        S      1:55 (squid-coord-3) --kid squid-coord-3 -s -f /tmp/squid/squid.conf --foreground
 9133 ?        S    114:19 (squid-2) --kid squid-2 -s -f /tmp/squid/squid.conf --foreground
 9134 ?        S    113:01 (squid-1) --kid squid-1 -s -f /tmp/squid/squid.conf --foreground

But with newer squid (5.x and 6.x) on Openwrt 23.05.0 onwards fail to start when workers are enabled (I haven't tested on 22.03.x as jumped to 23.05.x from 21.02.x):

Thu Jun 27 07:07:11 2024 daemon.notice squid[3357]: Processing Configuration File: /tmp/squid/squid.conf (depth 0)
Thu Jun 27 07:07:11 2024 daemon.notice squid[3357]: ERROR: 'pinger_enable' requires --enable-icmp
Thu Jun 27 07:07:12 2024 local4.notice squid[3357]: Created PID file (/var/run/squid.pid)
Thu Jun 27 07:07:12 2024 local4.warn squid[3357]: FATAL: assertion failed: mem/PageStack.cc:159: "StoredNode().is_lock_free()"

Below is the squid debug log with debug_options 54,9:

2024/06/27 07:10:41| Processing Configuration File: /tmp/squid/squid.conf (depth 0)
2024/06/27 07:10:41| ERROR: 'pinger_enable' requires --enable-icmp
2024/06/27 07:10:42| Created PID file (/var/run/squid.pid)
2024/06/27 07:10:42.385| 54,3| mem/Segment.cc(245) unlink: unlinked /squid-cf__metadata.shm segment
2024/06/27 07:10:42.385| 54,3| mem/Segment.cc(128) create: created /squid-cf__metadata.shm segment: 8
2024/06/27 07:10:42.385| 54,5| mem/Segment.cc(211) lock: mlock(2)-ing disabled
2024/06/27 07:10:42.386| 54,3| mem/Segment.cc(245) unlink: unlinked /squid-cf__queues.shm segment
2024/06/27 07:10:42.386| 54,3| mem/Segment.cc(128) create: created /squid-cf__queues.shm segment: 73912
2024/06/27 07:10:42.386| 54,5| mem/Segment.cc(211) lock: mlock(2)-ing disabled
2024/06/27 07:10:42.387| 54,3| mem/Segment.cc(245) unlink: unlinked /squid-cf__readers.shm segment
2024/06/27 07:10:42.387| 54,3| mem/Segment.cc(128) create: created /squid-cf__readers.shm segment: 56
2024/06/27 07:10:42.387| 54,5| mem/Segment.cc(211) lock: mlock(2)-ing disabled
2024/06/27 07:10:42.387| 54,7| Queue.cc(50) QueueReader: constructed ipcQR1
2024/06/27 07:10:42.388| 54,7| Queue.cc(50) QueueReader: constructed ipcQR2
2024/06/27 07:10:42.388| 54,7| Queue.cc(50) QueueReader: constructed ipcQR3
2024/06/27 07:10:42.388| 54,5| mem/PageStack.cc(129) IdSetMeasurements: rounded capacity up from 8192 to 8192
2024/06/27 07:10:42.388| 54,3| mem/Segment.cc(245) unlink: unlinked /squid-squid-page-pool.shm segment
2024/06/27 07:10:42.389| 54,3| mem/Segment.cc(128) create: created /squid-squid-page-pool.shm segment: 268437592
2024/06/27 07:10:42.389| 54,5| mem/Segment.cc(211) lock: mlock(2)-ing disabled
2024/06/27 07:10:42.389| 54,5| mem/PageStack.cc(129) IdSetMeasurements: rounded capacity up from 8192 to 8192
2024/06/27 07:10:42| FATAL: assertion failed: mem/PageStack.cc:159: "StoredNode().is_lock_free()"
brada4 commented 3 months ago

Your ramips have one CPU and the assert is that system pointers are not 64bit....

codemarauder commented 3 months ago

Your ramips have one CPU and the assert is that system pointers are not 64bit....

I have also posted it on squid mailing list and here is the reply:

https://lists.squid-cache.org/pipermail/squid-users/2024-June/026781.html

The assertion in question may be overreaching -- I suspect the relevant 
code works correctly when std::atomic<uint64_t>::is_lock_free() is 
false. Depending on how those locks are implemented in your environment, 
and how your traffic tickles them, SMP Squid without atomic locks might 
become very slow! We do not (and, IMO, should not) optimize performance 
for environments without lock-free atomics!

I see the following options for going forward:

* Comment out the assertion, void your warranty, and hope for the best.
* Audit relevant code to confirm that the assertion is safe to remove.
* Find a usable OS/environment that has lock-free 64-bit atomics.

> SMP worked fine with squid 4.13 on same architecture.

SMP Squid v4 has an ABA problem that could, at least in theory, result 
in silent cache corruption. If you are interested in low-level details, 
please see commit 7a5af8db message:
https://github.com/squid-cache/squid/commit/7a5af8db

For x86, it worked once I added the following to /etc/init.d/squid:

mkdir /var/run/squid && chown squid:squid /var/run/squid

neheb commented 3 months ago

mipsel doesn't have lock-free atomics?

brada4 commented 3 months ago

Likely \<int> and not as per c++ specs

neheb commented 3 months ago

Something tells me this is a result of mips16. mips32 should support lock free atomics.

codemarauder commented 3 months ago

I commented the assertion in the code in ipc/mem/PageStack.cc (line 159), re-compiled and am able to run squid in SMP mode with multiple workers on ramips.

I have yet to test the performance and see if it causes any other issues with IPC or otherwise.

codemarauder commented 3 months ago

Something tells me this is a result of mips16. mips32 should support lock free atomics.

Does that mean MT7621 is mips16 and not mips32?

neheb commented 3 months ago

no.

mips16 is compiled as default for space reasons. You can disable with

PKG_BUILD_FLAGS:=no-mips16
codemarauder commented 2 months ago

PKG_BUILD_FLAGS:=no-mips16

Adding the above to Makefile and recompiling didn't help. Still getting the following error:

Tue Jul  2 11:30:51 2024 daemon.notice squid[31589]: Processing Configuration File: /tmp/squid/squid.conf (depth 0)
Tue Jul  2 11:30:52 2024 local4.notice squid[31589]: Created PID file (/var/run/squid.pid)
Tue Jul  2 11:30:52 2024 local4.warn squid[31589]: FATAL: assertion failed: mem/PageStack.cc:159: "StoredNode().is_lock_free()"
brada4 commented 2 months ago

The assertion assumes 64bit lock id.

codemarauder commented 2 months ago

The assertion assumes 64bit lock id.

:-(

No other option than to comment the code then as suggested by Alex on squid-users list.

brada4 commented 2 months ago

Just change assertion to check for "int" lock id.

codemarauder commented 2 months ago

Just change assertion to check for "int" lock id.

Changed uint64_t to uint32_t in ipc/mem/PageStack.h at line 79.

It is starting with workers now. In code for version 4.x, there is no mention of uint64_t. It was introduced with 5.x.

No idea about the repercussions or instability. Need to test.

brada4 commented 2 months ago

The posix definition is that lock ID is native integer. It should be some conditional in ./sonfigure style portability of squid to run on (most) 32bit platforms.