linuxppc / issues

Issues repository for linuxppc

ppc32 MPC8248 __sync_bool_compare_and_swap hangs with kernel v5.2.1 #258

Closed dcrawford1 closed 2 years ago

dcrawford1 commented 5 years ago

We are testing the v5.2.1 kernel on an MPC8248 board. Everything works, except nginx hangs when it calls __sync_bool_compare_and_swap. We are using GCC 7.4.0 with musl 1.1.21.

The same nginx binary works fine with a 4.19.26 kernel. I am not sure where to start looking.

mpe commented 5 years ago

Hi @dcrawford1, sorry I don't have any idea what the problem could be there. I'm the only person who really looks at these github issues, so you should probably post a full description of your problem to linuxppc-dev@lists.ozlabs.org and hopefully you can get the attention of someone from NXP.

chleroy commented 5 years ago

Please provide your .config

Does the following app also hang?

#include <stdio.h>

int main(void)
{
    int r;
    int x = 7;

    r = __sync_bool_compare_and_swap(&x, 7, 9);

    printf("%d %d\n", r, x);

    r = __sync_bool_compare_and_swap(&x, 7, 10);

    printf("%d %d\n", r, x);
}
dcrawford1 commented 5 years ago

Your test program runs fine with my 5.2 kernel. After digging deeper, I don't think __sync_bool_compare_and_swap is the problem. It looks like the issue is related to something with mmap using MAP_ANON|MAP_SHARED.

My nginx runs until it hits this line: https://github.com/nginx/nginx/blob/144242b033e476e9644c760b240ff3e561eba612/src/event/ngx_event.c#L564 and then waits indefinitely. The ngx_atomic_cmp_set on that line is defined as __sync_bool_compare_and_swap. But even if I replace that line with *ngx_connection_counter = 1; nginx still waits indefinitely at that point. ngx_connection_counter is a pointer into a shared memory area created by mmap with MAP_ANON|MAP_SHARED. Interestingly, if I change the mmap call in ngx_shmem.c to use MAP_PRIVATE, then nginx runs to completion (although it will not behave correctly, since it needs MAP_SHARED to share data between master and worker processes).

So an unmodified nginx binary blocks at line 564 of ngx_event.c, but the exact same binary runs fine on a 4.19 kernel.

I tried running 5.0 and 5.1 kernels, but the kernel would always freeze right before "starting /sbin/init", which is the same behavior as the 5.2 kernel with CONFIG_PPC_KUAP enabled.

dcrawford1 commented 5 years ago

Here is my defconfig:

CONFIG_SYSVIPC=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
# CONFIG_FHANDLE is not set
# CONFIG_BUG is not set
# CONFIG_BASE_FULL is not set
# CONFIG_ADVISE_SYSCALLS is not set
# CONFIG_MEMBARRIER is not set
# CONFIG_KALLSYMS is not set
# CONFIG_RSEQ is not set
CONFIG_EMBEDDED=y
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_PPC_KUEP is not set
# CONFIG_PPC_KUAP is not set
CONFIG_PANIC_TIMEOUT=1
# CONFIG_PPC_CHRP is not set
# CONFIG_PPC_PMAC is not set
CONFIG_PPC_82xx=y
CONFIG_EB8248=y
CONFIG_HZ_100=y
CONFIG_PM=y
# CONFIG_SECCOMP is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_PARTITION_ADVANCED=y
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
CONFIG_ZSMALLOC=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_SYN_COOKIES=y
# CONFIG_IPV6 is not set
CONFIG_BRIDGE=y
# CONFIG_BRIDGE_IGMP_SNOOPING is not set
CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_VLAN_8021Q=y
# CONFIG_WIRELESS is not set
CONFIG_UEVENT_HELPER=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set
# CONFIG_ALLOW_DEV_COREDUMP is not set
CONFIG_MTD=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_ADV_OPTIONS=y
CONFIG_MTD_CFI_GEOMETRY=y
CONFIG_MTD_CFI_I4=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP_OF=y
CONFIG_MTD_PLATRAM=y
CONFIG_MTD_M25P80=y
CONFIG_MTD_SST25L=y
CONFIG_MTD_SPI_NOR=y
# CONFIG_MTD_SPI_NOR_USE_4K_SECTORS is not set
CONFIG_ZRAM=m
CONFIG_BLK_DEV_LOOP=y
CONFIG_EEPROM_AT25=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
# CONFIG_SCSI_LOWLEVEL is not set
CONFIG_NETDEVICES=y
# CONFIG_NET_VENDOR_ALACRITECH is not set
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_AURORA is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_CADENCE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
CONFIG_FS_ENET=y
# CONFIG_FS_ENET_HAS_SCC is not set
CONFIG_FS_ENET_MDIO_FCC=y
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MELLANOX is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MICROCHIP is not set
# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
CONFIG_NATIONAL_PHY=y
# CONFIG_USB_NET_DRIVERS is not set
# CONFIG_WLAN is not set
# CONFIG_INPUT is not set
# CONFIG_SERIO is not set
# CONFIG_VT is not set
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_DEVMEM is not set
CONFIG_SERIAL_CPM=y
CONFIG_SERIAL_CPM_CONSOLE=y
CONFIG_SERIAL_SC16IS7XX=y
# CONFIG_SERIAL_SC16IS7XX_I2C is not set
CONFIG_SERIAL_SC16IS7XX_SPI=y
CONFIG_SERIAL_Z85C30=y
CONFIG_HW_RANDOM=y
# CONFIG_NVRAM is not set
CONFIG_I2C=y
# CONFIG_I2C_COMPAT is not set
CONFIG_I2C_CHARDEV=y
# CONFIG_I2C_HELPER_AUTO is not set
CONFIG_I2C_CPM=y
CONFIG_SPI=y
CONFIG_SPI_FSL_SPI=y
CONFIG_SPI_SPIDEV=y
CONFIG_PPS=y
CONFIG_PPS_CLIENT_LDISC=y
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
CONFIG_USB=m
CONFIG_USB_ISP116X_HCD=m
CONFIG_USB_STORAGE=m
CONFIG_MMC=m
# CONFIG_PWRSEQ_EMMC is not set
CONFIG_MMC_SPI=m
CONFIG_RTC_CLASS=y
# CONFIG_RTC_HCTOSYS is not set
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_NVMEM is not set
CONFIG_RTC_DRV_PCF8563=y
# CONFIG_VIRTIO_MENU is not set
# CONFIG_IOMMU_SUPPORT is not set
CONFIG_GENERIC_PHY=y
CONFIG_EXT4_FS=y
# CONFIG_MANDATORY_FILE_LOCKING is not set
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
CONFIG_OVERLAY_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_PROC_CHILDREN=y
CONFIG_TMPFS=y
CONFIG_TMPFS_XATTR=y
CONFIG_JFFS2_FS=y
CONFIG_JFFS2_SUMMARY=y
CONFIG_JFFS2_FS_XATTR=y
# CONFIG_JFFS2_FS_POSIX_ACL is not set
# CONFIG_JFFS2_FS_SECURITY is not set
CONFIG_JFFS2_COMPRESSION_OPTIONS=y
# CONFIG_JFFS2_ZLIB is not set
# CONFIG_JFFS2_RTIME is not set
CONFIG_JFFS2_CMODE_NONE=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_DIRECT=y
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
# CONFIG_SQUASHFS_ZLIB is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_ZSTD=y
CONFIG_MINIX_FS=y
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
CONFIG_CRYPTO_AUTHENC=y
CONFIG_CRYPTO_ECDH=y
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_HW is not set
CONFIG_CRC_CCITT=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC7=y
CONFIG_PRINTK_TIME=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
CONFIG_PANIC_ON_OOPS=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_FTRACE is not set
# CONFIG_RUNTIME_TESTING_MENU is not set
dcrawford1 commented 5 years ago

Here is the boot console:

## Loading kernel from FIT Image at fc040000 ...
   Using 'revA@1' configuration
   Trying 'kernel@1' kernel subimage
     Description:  Linux kernel
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0xfc0400e4
     Data Size:    4636544 Bytes = 4.4 MiB
     Architecture: PowerPC
     OS:           Linux
     Load Address: 0x00000000
     Entry Point:  0x00000000
## Loading fdt from FIT Image at fc040000 ...
   Using 'revA@1' configuration
   Trying 'fdt@1' fdt subimage
     Description:  Flattened Device Tree blob
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0xfc4ac114
     Data Size:    8606 Bytes = 8.4 KiB
     Architecture: PowerPC
   Booting using the fdt blob at 0xfc4ac114
   Loading Kernel Image ... OK
   Loading Device Tree to 007fa000, end 007ff19d ... OK
[    0.000000] Activating Kernel Userspace Access Protection
[    0.000000] Linux version 5.2.2 (dev@ubuntu) (gcc version 7.4.0 (crosstool-NG 1.24.0-rc1.3-0ecfe4f)) #1 PREEMPT Fri Aug 16 19:25:34 UTC 2019
[    0.000000] Using Intelight EB8248 machine description
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x4000000
[    0.000000] dcache_bsize      = 0x20
[    0.000000] icache_bsize      = 0x20
[    0.000000] cpu_features      = 0x0000000001010008
[    0.000000]   possible        = 0x000000002f7ff149
[    0.000000]   always          = 0x0000000001000000
[    0.000000] cpu_user_features = 0x8c000000 0x00000000
[    0.000000] mmu_features      = 0x00010000
[    0.000000] Hash_size         = 0x0
[    0.000000] -----------------------------------------------------
[    0.000000] Top of RAM: 0x4000000, Total RAM: 0x4000000
[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] On node 0 totalpages: 16384
[    0.000000]   Normal zone: 144 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 16384 pages, LIFO batch:3
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 16240
[    0.000000] Kernel command line: console=ttyCPM0,115200 init=/sbin/init root=/dev/mtdblock2 loglevel=9
[    0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
[    0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
[    0.000000] Memory: 60192K/65536K available (3752K kernel code, 140K rwdata, 500K rodata, 132K init, 96K bss, 5344K reserved, 0K cma-reserved)
[    0.000000] Kernel virtual memory layout:
[    0.000000]   * 0xfffdf000..0xfffff000  : fixmap
[    0.000000]   * 0xfff95000..0xfffdf000  : early ioremap
[    0.000000]   * 0xc5000000..0xfff95000  : vmalloc & ioremap
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000]  Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[    0.000000] random: get_random_u32 called from 0xc042ac04 with crng_init=0
[    0.000000] time_init: decrementer frequency = 25.000000 MHz
[    0.000000] time_init: processor frequency   = 400.000000 MHz
[    0.000028] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns
[    0.000051] clocksource: timebase mult[28000000] shift[24] registered
[    0.000093] clockevent: decrementer mult[6666666] shift[32] cpu[0]
[    0.269739] printk: console [ttyCPM0] enabled
[    0.274082] pid_max: default: 4096 minimum: 301
[    0.278799] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.285509] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.296663] rcu: Hierarchical SRCU implementation.
[    0.302793] devtmpfs: initialized
[    0.314097] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.323999] futex hash table entries: 16 (order: -5, 192 bytes)
[    0.330753] NET: Registered protocol family 16
[    0.338274] timer@10d80: gtm block at (ptrval), 100000000Hz
[    0.379022] SCSI subsystem initialized
[    0.383165] pps_core: LinuxPPS API ver. 1 registered
[    0.388563] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.399408] clocksource: Switched to clocksource timebase
[    0.408652] NET: Registered protocol family 2
[    0.414611] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes)
[    0.422485] TCP established hash table entries: 1024 (order: 0, 4096 bytes)
[    0.429634] TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
[    0.435975] TCP: Hash tables configured (established 1024 bind 1024)
[    0.442728] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    0.448570] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    0.455237] NET: Registered protocol family 1
[    0.484180] workingset: timestamp_bits=14 max_order=14 bucket_order=0
[    0.510998] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.517528] jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
[    0.534031] f0011a80.serial: ttyCPM0 at MMIO 0xc500ea80 (irq = 16, base_baud = 25000000) is a CPM UART
[    0.544465] f0011a90.serial: ttyCPM1 at MMIO 0xc5010a90 (irq = 21, base_baud = 25000000) is a CPM UART
[    0.554844] f0011a00.serial: ttyCPM2 at MMIO 0xc5014a00 (irq = 40, base_baud = 25000000) is a CPM UART
[    0.565196] f0011a40.serial: ttyCPM3 at MMIO 0xc5018a40 (irq = 42, base_baud = 25000000) is a CPM UART
[    0.575545] f0011a60.serial: ttyCPM4 at MMIO 0xc501ca60 (irq = 43, base_baud = 25000000) is a CPM UART
[    0.586227] fa400000.scc: ttyS0 at MMIO 0xfa400000 (irq = 25, base_baud = 460800) is a zs
[    0.595056] fa400000.scc: ttyS1 at MMIO 0xfa400002 (irq = 25, base_baud = 460800) is a zs
[    0.618816] loop: module loaded
[    0.628051] physmap-flash fc000000.flash: physmap platform flash device: [mem 0xfc000000-0xffffffff]
[    0.637336] fc000000.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000001 Chip ID 0x002201
[    0.647578] Amd/Fujitsu Extended Query Table at 0x0040
[    0.652769]   Amd/Fujitsu Extended Query version 1.3.
[    0.657810]   Advanced Sector Protection (PPB Locking) supported
[    0.663874] number of CFI chips: 1
[    0.668496] 9 fixed-partitions partitions found on MTD device fc000000.flash
[    0.675689] Creating 9 MTD partitions on "fc000000.flash":
[    0.681362] 0x000000000000-0x000000040000 : "hrcw"
[    0.688667] 0x000000040000-0x000000f80000 : "firmware_a"
[    0.696715] 0x0000004c0000-0x000000f80000 : "rootfs_a"
[    0.704573] 0x000000f80000-0x000001ec0000 : "firmware_b"
[    0.712541] 0x000001400000-0x000001ec0000 : "rootfs_b"
[    0.720276] 0x000001ec0000-0x000003f00000 : "opt"
[    0.727568] 0x000003f00000-0x000003f40000 : "u-boot"
[    0.735135] 0x000003f40000-0x000003f80000 : "reserved"
[    0.742957] 0x000003f80000-0x000004000000 : "u-boot_env"
[    0.751196] physmap-flash fa000000.sram: physmap platform flash device: [mem 0xfa000000-0xfa0fffff]
[    0.767119] m25p80 spi0.1: m25p80-nonjedec (1024 Kbytes)
[    0.776851] at25 spi0.2: 256 Byte at25 eeprom, pagesize 8
[    0.787925] fsl_spi f0011aa0.spi: at 0x(ptrval) (irq = 23), CPM2 mode
[    0.795375] libphy: Fixed MDIO Bus: probed
[    0.801324] eth0: fs_enet: 64:55:63:00:05:e4
[    0.807067] eth1: fs_enet: 64:55:63:00:85:e4
[    0.812364] libphy: CPM2 Bitbanged MII: probed
[    0.820154] i2c /dev entries driver
[    0.827072] rtc rtc0: invalid alarm value: 1970-01-13T38:20:00
[    0.833566] rtc-pcf8563 0-0051: registered as rtc0
[    0.839617] pps_ldisc: PPS line discipline registered
[    0.845085] NET: Registered protocol family 17
[    0.849969] 8021q: 802.1Q VLAN Support v1.8
[    0.854314] drmem: No dynamic reconfiguration memory found
[    0.866542] VFS: Mounted root (squashfs filesystem) readonly on device 31:2.
[    0.874988] devtmpfs: mounted

If CONFIG_PPC_KUAP is enabled the machine freezes at this point. If CONFIG_PPC_KUAP is disabled this is followed by:

[    0.218840] Freeing unused kernel memory: 132K
[    0.218856] This architecture does not have kernel memory protection.
[    0.218866] Run /sbin/init as init process
chleroy commented 5 years ago

> Your test program runs fine with my 5.2 kernel. After digging deeper, I don't think __sync_bool_compare_and_swap is the problem. It looks like the issue is related to something with mmap using MAP_ANON|MAP_SHARED.
>
> my nginx runs until it hits this line: https://github.com/nginx/nginx/blob/144242b033e476e9644c760b240ff3e561eba612/src/event/ngx_event.c#L564 and then it waits indefinitely. That ngx_atomic_cmp_set is defined as __sync_bool_compare_and_swap. But, even if I replace that line with *ngx_connection_counter = 1; nginx will still wait indefinitely at that point. ngx_connection_counter is a pointer into a shared memory area created by mmap with MAP_ANON|MAP_SHARED. Interestingly, if I change the mmap call in ngx_shmem.c to use MAP_PRIVATE then nginx runs to completion (nginx will not behave correctly since it needs MAP_SHARED to share data between master and worker processes).

I think MAP_SHARED mmap is one of the rare cases where _PAGE_RW is set without _PAGE_DIRTY, and it looks like this case has not been properly handled in DataStoreTLBMiss() since series https://patchwork.ozlabs.org/cover/1046032/

Could you please try the change below?

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index f255e22184b4..534dd2718795 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -557,9 +557,9 @@ DataStoreTLBMiss:
    cmplw   0,r1,r3
    mfspr   r2, SPRN_SPRG_PGDIR
 #ifdef CONFIG_SWAP
-   li  r1, _PAGE_RW | _PAGE_PRESENT | _PAGE_ACCESSED
+   li  r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
 #else
-   li  r1, _PAGE_RW | _PAGE_PRESENT
+   li  r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT
 #endif
    bge-    112f
    lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, use */
chleroy commented 5 years ago

Problem reproduced on an MPC8321 with the following app:

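#include <sys/mman.h>

int main(void)
{
    /* One page of shared anonymous memory, readable and writable. */
    volatile char *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (ptr == MAP_FAILED)
        return 1;

    /* Read the page first, then store to it. */
    *ptr = *ptr;

    return 0;
}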

Proposed patch at https://patchwork.ozlabs.org/patch/1149054/

Please test both with and without KUAP to see whether it also fixes the KUAP lockup or whether that is something else.

dcrawford1 commented 5 years ago

That patch fixes the mmap bug. Now nginx runs fine.

I still have the problem with CONFIG_PPC_KUAP enabled, but that is a different issue.

chleroy commented 5 years ago

For the CONFIG_PPC_KUAP problem, we need to understand what the kernel is doing.

Could you try to retrieve the status of all threads/processes by using Sysrq command 't' from the console ? (You'll need CONFIG_MAGIC_SYSRQ for that).

It is likely that we are in an endless loop in handle_page_fault(), as we were with MAP_SHARED, but for another reason, on something that happens at the end of init.

chleroy commented 5 years ago

Hi Doug @dcrawford1, were you able to get any further on this? Any input?

chleroy commented 4 years ago

Hi @dcrawford1 , any news ?

chleroy commented 4 years ago

Hi @dcrawford1 , could you see if the following patch fixes the issue ?

https://patchwork.ozlabs.org/patch/1176512/

dcrawford1 commented 4 years ago

I tried this patch, but I get the same result. The kernel hangs right before calling /sbin/init. If I disable CONFIG_PPC_KUAP it starts fine. Tomorrow I can try to get more status with CONFIG_MAGIC_SYSRQ.

dcrawford1 commented 4 years ago

I tried enabling CONFIG_MAGIC_SYSRQ, but the kernel is still unresponsive. I tried "alt" "prtscr" "t" but the kernel just appears frozen.

chleroy commented 4 years ago

Euh ... have you already successfully used sysrq on a working kernel ?

As far as I can see in the dmesg you dumped here, your console is console=ttyCPM0,115200

So I don't know what 'alt prtscr' does, but unless it sends a 'break' over the serial line, I'm not sure it will work. I'm also using a serial console, with the PuTTY client on Windows, which has a special action to send a 'break' on the line, and the sequence to perform is 'break' followed by 't'.

dcrawford1 commented 4 years ago

I have CONFIG_MAGIC_SYSRQ enabled and it is working on the kernel (if CONFIG_PPC_KUAP is disabled). I can use the putty break command and "t" to dump current tasks.

Now, when I have CONFIG_PPC_KUAP enabled, the kernel reboots CONFIG_PANIC_TIMEOUT seconds after printing "[ 0.874988] devtmpfs: mounted" from the log above.

I am not sure why it previously hung at that point, but now it just reboots.

I can spend more time debugging next week.

chleroy commented 4 years ago

I have identified a hang situation: if userspace access has been allowed for some segments and the kernel tries to access userspace not covered by those segments, it will loop forever in do_page_fault()

Will send a patch soon for this.

chleroy commented 4 years ago

Can you try with the patch https://patchwork.ozlabs.org/patch/1227375/

chleroy commented 4 years ago

Any news about this issue ? Does latest kernel still have it ?

chleroy commented 3 years ago

Any news on this ?

dcrawford1 commented 3 years ago

I have not really had time to look at this. I just tried the 5.9.1 kernel (with both KUEP and KUAP disabled), but it crashes before executing init. I am not sure if the 5.9.1 kernel has the patch you posted. Here is the kernel log:

Environment size: 737/262139 bytes
=> setenv loglevel 7
=> setenv init /bin/sh
=> boot
## Loading kernel from FIT Image at fcf80000 ...
   Using 'host0@0' configuration
   Trying 'kernel@1' kernel subimage
     Description:  Linux kernel
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0xfcf800e4
     Data Size:    5471788 Bytes = 5.2 MiB
     Architecture: PowerPC
     OS:           Linux
     Load Address: 0x00000000
     Entry Point:  0x00000000
## Loading fdt from FIT Image at fcf80000 ...
   Using 'host0@0' configuration
   Trying 'fdt@1' fdt subimage
     Description:  Flattened Device Tree blob
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0xfd4b7fc0
     Data Size:    10161 Bytes = 9.9 KiB
     Architecture: PowerPC
   Booting using the fdt blob at 0xfd4b7fc0
   Loading Kernel Image ... OK
   Loading Device Tree to 007fa000, end 007ff7b0 ... OK
[    0.000000] Linux version 5.9.1 (dcrawford@NUMENOR) (powerpc-buildroot-linux-musl-gcc.br_real (Buildroot 2020.08-165-gf351cdf7cd) 7.5.0, GNU ld (GNU Binutils) 2.33.1) #1 PREEMPT Fri Oct 23 08:45:20 MST 2020
[    0.000000] Using Intelight EB8248 machine description
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x4000000
[    0.000000] dcache_bsize      = 0x20
[    0.000000] icache_bsize      = 0x20
[    0.000000] cpu_features      = 0x0000000001010008
[    0.000000]   possible        = 0x00000000277de149
[    0.000000]   always          = 0x0000000001000000
[    0.000000] cpu_user_features = 0x8c000000 0x00000000
[    0.000000] mmu_features      = 0x00010000
[    0.000000] Hash_size         = 0x0
[    0.000000] -----------------------------------------------------
[    0.000000] ioremap() called early from cpm2_reset+0x18/0x48. Use early_ioremap() instead
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 16240
[    0.000000] Kernel command line: console=ttyCPM0,115200 init=/bin/sh root=/dev/mtdblock4 loglevel=7
[    0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[    0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 59376K/65536K available (3844K kernel code, 404K rwdata, 940K rodata, 156K init, 104K bss, 6160K reserved, 0K cma-reserved)
[    0.000000] Kernel virtual memory layout:
[    0.000000]   * 0xffbdf000..0xfffff000  : fixmap
[    0.000000]   * 0xffb9f000..0xffbdf000  : early ioremap
[    0.000000]   * 0xc5000000..0xffb9f000  : vmalloc & ioremap
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000]  Trampoline variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[    0.000000] random: get_random_u32 called from start_kernel+0x308/0x418 with crng_init=0
[    0.000028] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns
[    0.000050] clocksource: timebase mult[28000000] shift[24] registered
[    0.248756] printk: console [ttyCPM0] enabled
[    0.253315] pid_max: default: 4096 minimum: 301
[    0.258328] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.265688] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.279042] rcu: Hierarchical SRCU implementation.
[    0.285069] devtmpfs: initialized
[    0.298907] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.308922] futex hash table entries: 16 (order: -5, 192 bytes, linear)
[    0.317073] NET: Registered protocol family 16
[    0.325632] timer@10d80: gtm block at (ptrval), 100000000Hz
[    0.353587] SCSI subsystem initialized
[    0.357808] pps_core: LinuxPPS API ver. 1 registered
[    0.363563] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.374749] clocksource: Switched to clocksource timebase
[    0.495419] NET: Registered protocol family 2
[    0.501171] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.509657] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.517514] TCP bind hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.524665] TCP: Hash tables configured (established 1024 bind 1024)
[    0.531689] UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.538407] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.546032] NET: Registered protocol family 1
[    0.646884] workingset: timestamp_bits=14 max_order=14 bucket_order=0
[    0.670050] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.676351] jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
[    0.950404] f0011a80.serial: ttyCPM0 at MMIO 0xc5077a80 (irq = 16, base_baud = 25000000) is a CPM UART
[    0.961464] f0011a90.serial: ttyCPM1 at MMIO 0xc5079a90 (irq = 21, base_baud = 25000000) is a CPM UART
[    0.972584] f0011a00.serial: ttyCPM2 at MMIO 0xc507da00 (irq = 40, base_baud = 25000000) is a CPM UART
[    0.983678] f0011a40.serial: ttyCPM3 at MMIO 0xc5081a40 (irq = 42, base_baud = 25000000) is a CPM UART
[    0.994726] f0011a60.serial: ttyCPM4 at MMIO 0xc5085a60 (irq = 43, base_baud = 25000000) is a CPM UART
[    1.006113] fa400000.scc: ttyS0 at MMIO 0xfa400000 (irq = 25, base_baud = 460800) is a zs
[    1.015138] fa400000.scc: ttyS1 at MMIO 0xfa400002 (irq = 25, base_baud = 460800) is a zs
[    1.030670] random: fast init done
[    1.037127] random: crng init done
[    1.056726] loop: module loaded
[    1.063224] physmap-flash fc000000.flash: physmap platform flash device: [mem 0xfc000000-0xffffffff]
[    1.072501] fc000000.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000001 Chip ID 0x002201
[    1.082972] Amd/Fujitsu Extended Query Table at 0x0040
[    1.088216]   Amd/Fujitsu Extended Query version 1.3.
[    1.093321]   Advanced Sector Protection (PPB Locking) supported
[    1.099580] number of CFI chips: 1
[    1.105218] 9 fixed-partitions partitions found on MTD device fc000000.flash
[    1.112222] Creating 9 MTD partitions on "fc000000.flash":
[    1.118154] 0x000000000000-0x000000040000 : "hrcw"
[    1.127100] 0x000000040000-0x000000f80000 : "firmware_a"
[    1.136424] 0x0000005c0000-0x000000f80000 : "rootfs_a"
[    1.145731] 0x000000f80000-0x000001ec0000 : "firmware_b"
[    1.155163] 0x000001500000-0x000001ec0000 : "rootfs_b"
[    1.164151] 0x000001ec0000-0x000003f00000 : "opt"
[    1.172825] 0x000003f00000-0x000003f40000 : "u-boot"
[    1.181792] 0x000003f40000-0x000003f80000 : "reserved"
[    1.190882] 0x000003f80000-0x000004000000 : "u-boot_env"
[    1.200429] physmap-flash fa000000.sram: physmap platform flash device: [mem 0xfa000000-0xfa0fffff]
[    1.371686] libphy: Fixed MDIO Bus: probed
[    1.377743] eth0: fs_enet: 64:55:63:00:05:e4
[    1.383911] eth1: fs_enet: 64:55:63:00:85:e4
[    1.390461] libphy: CPM2 Bitbanged MII: probed
[    1.400211] i2c /dev entries driver
[    1.408396] rtc rtc0: invalid alarm value: 1970-01-01T38:00:00
[    1.414600] rtc-pcf8563 0-0051: registered as rtc0
[    1.426046] pps_ldisc: PPS line discipline registered
[    1.431441] NET: Registered protocol family 17
[    1.436182] drmem: No dynamic reconfiguration memory found
[    1.446623] VFS: Mounted root (squashfs filesystem) readonly on device 31:4.
[    1.455577] devtmpfs: mounted
[    1.459700] Freeing unused kernel memory: 156K
[    1.463989] Kernel memory protection not selected by kernel config.
[    1.470541] BUG: Unable to handle kernel instruction fetch
[    1.470557] Faulting instruction address: 0xc03c0000
[    1.470575] Oops: Kernel access of bad area, sig: 11 [#1]
[    1.470588] BE PAGE_SIZE=4K PREEMPT Intelight EB8248
[    1.470596] Modules linked in:
[    1.470618] CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.1 #1
[    1.470631] NIP:  c03c0000 LR: c0026ad4 CTR: c00097ac
[    1.470644] REGS: c501dc18 TRAP: 0400   Not tainted  (5.9.1)
[    1.470649] MSR:  20001032 <ME,IR,DR,RI>  CR: 2201d422  XER: 00000000
[    1.470677]
[    1.470677] GPR00: c0026ad4 c501dcd0 c3840000 00000000 3b5dc976 00000020 00019a21 00019a21
[    1.470677] GPR08: 00000000 00000000 00200140 c501dd40 8201d424 00000000 00002990 00000088
[    1.470677] GPR16: 00000000 c0540f00 c0540000 c0542f08 00000000 ffff8b5c c0540000 c0542f18
[    1.470677] GPR24: c0520000 00200140 c04d0000 c501dd80 00000000 00000202 c04d62c0 c04d6b60
[    1.470829] NIP [c03c0000] __do_softirq+0x30/0x298
[    1.470848] LR [c0026ad4] irq_exit+0x10/0x20
[    1.470856] Call Trace:
[    1.470874] [c501dd30] [c0026ad4] irq_exit+0x10/0x20
[    1.470911] [c501dd40] [c0009b44] timer_interrupt+0x1ec/0x1f8
[    1.470946] [c501dd70] [c0010440] ret_from_except+0x0/0x14
[    1.470986] --- interrupt: 901 at console_unlock+0x60/0x574
[    1.470986]     LR = console_unlock+0x4d0/0x574
[    1.471009] [c501de98] [c00544f8] vprintk_emit+0x1e8/0x208
[    1.471031] [c501ded8] [c0051200] printk+0x5c/0x84
[    1.471054] [c501df18] [c0004e54] kernel_init+0x30/0x100
[    1.471079] [c501df38] [c001016c] ret_from_kernel_thread+0x14/0x1c
[    1.471090] Instruction dump:
[    1.471102] 9421ffa0 7c0802a6 bde1001c 3f00c052 90010064 3f40c04d 82b8353c 8322005c
[    1.471138] 83ba62c0 3ab50001 572a0566 9142005c <81420000> 394a0100 91420000 3ee0c054
[    1.471181] ---[ end trace 1d3539007b6d0b96 ]---
[    1.471186]
[    2.471236] Kernel panic - not syncing: Fatal exception
[    2.646977] Rebooting in 1 seconds..
chleroy commented 3 years ago

Hi @dcrawford1

I have not been able to investigate this further until now. Time has passed since your last try, and several changes have been made that may affect your issue.

Could you make a new try with latest kernel, ie 5.12-rc8 ?

Thanks Christophe

chleroy commented 3 years ago

Hi @dcrawford1

I have not been able to investigate this further until now. Time has passed since your last try, and several changes have been made that may affect your issue.

Could you make a new try with latest kernel, ie 5.14-rc4 ? Or at least 5.13 ?

Thanks Christophe

chleroy commented 2 years ago

Hi @dcrawford1

What is the status on this problem ? Can it be closed ?

Thanks Christophe

dcrawford1 commented 2 years ago

This can be closed. I just tested kernel 5.4.189 on my MPC8248 with both KUEP and KUAP enabled and it works fine.

@chleroy, I should probably open a separate issue for this, but maybe you have some insight into this ethernet bandwidth regression I am seeing: on the 4.19 kernel when I run iperf3 I get about 92 Mb/s with iperf3 using about 95% cpu and ksoftirqd using around 1% cpu. When I run the same test on the 5.4 kernel I only get about 80 Mb/s and iperf3 uses about 80% cpu and ksoftirqd uses about 15% cpu.

dcrawford1 commented 2 years ago

I bisected the poor network performance to this commit (between Linux 5.3 and 5.4):

ac7c3e4ff401b30 compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

If I revert it, the kernel is a bit bigger, but the iperf3 bandwidth is back up to 93-95 Mb/s and ksoftirqd CPU usage is low.

The problem is that now, with CONFIG_OPTIMIZE_INLINING disabled, the 5.4.189 kernel hangs right before /sbin/init. Interestingly, this only occurs if CONFIG_MEMCG (memory cgroups) is enabled. I have traced this back to something related to this commit in 5.4.19:

844d2025b68d9 eventfd: track eventfd_signal() recursion depth

But I can't figure out how the eventfd change could be affected by CONFIG_OPTIMIZE_INLINING. Maybe a recursion problem?

mpe commented 2 years ago

It might be more fruitful to try and identify what about the inlining change causes the performance regression. In upstream we now always build with that inlining behavior. Are you able to get a perf trace of the iperf3 run? Or can you test iperf3 using a closer-to-mainline kernel?

dcrawford1 commented 2 years ago

I moved this to a new issue #406

chleroy commented 2 years ago

Thanks @dcrawford1