Open arturpzol opened 3 months ago
Hello Artur,
Artur Piechocki:
I have encountered an issue with the AUFS filesystem that occurs sporadically during system startup. The error message I receive is:
error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error
:::mount |grep aufs aufs on / type aufs (rw,relatime,sync,si=57b4c9ce733e61da,udba=none,nowarn_perm) while true; do mount -o remount / ; done
::: @sfjro , would you be able to run a remount loop on your setup with AUFS using
udba=none
and check if the issue occurs for you as well?
Thanx for the report. I have tried hundreds times on aufs6.x-rcN (for linux-v6.11-rc4) but could not reproduce. It may be related to my regular test environment whose root dir is not aufs.
I have some questions for you.
I need to know which process got that error. Busybox(mount), /bin/mount, or /sbin/mount.aufs? Could you try strace or something?
Also
J. R. Okajima
@sfjro thank you for attempting to reproduce the problem in your environment.
The issue I occasionally experience during system startup is very rare, and I have no way to retrieve any logs from that startup. However, the loop that remounts the main file system tends to trigger the problem relatively quickly. It's possible that these are two separate issues...
Answers to your questions:
- How did you mount your root aufs at the beginning? initramfs + busybox + swtich_root?
yes exacly, it is initramfs + busybox + swtich_root
- How may branches does your root aufs have?
The number of branches seems to be irrelevant here, as the problem occurs with both 3 and 10 branches, for example. However, in my case, I have this native configuration::
find /sys/fs/aufs/si_876424372235a759/*
/sys/fs/aufs/si_876424372235a759/br0
/sys/fs/aufs/si_876424372235a759/br1
/sys/fs/aufs/si_876424372235a759/br2
/sys/fs/aufs/si_876424372235a759/br3
/sys/fs/aufs/si_876424372235a759/br4
/sys/fs/aufs/si_876424372235a759/brid0
/sys/fs/aufs/si_876424372235a759/brid1
/sys/fs/aufs/si_876424372235a759/brid2
/sys/fs/aufs/si_876424372235a759/brid3
/sys/fs/aufs/si_876424372235a759/brid4
/sys/fs/aufs/si_876424372235a759/xi_path
cat /sys/fs/aufs/si_876424372235a759/*
/mnt/live/changes=rw
/mnt/live/images/ssh.lzm=rr+wh
/mnt/live/images/linux-6.10-oe64.lzm=rr+wh
/mnt/live/images/baseosx-apache.lzm=rr+wh
/mnt/live/images/baseos.lzm=rr+wh
64
68
67
66
65
/mnt/live/xino/.aufs.xino
- Which branch contains libc.so.6?
baseos.lzm
I need to know which process got that error. Busybox(mount), /bin/mount, or /sbin/mount.aufs? Could you try strace or something?
In the case of a bash loop that is executed and remounts /
, it uses /bin/mount
.
The initial problem that occurs during system startup is triggered by various commands. For example, I get:
INIT: Id "2" respawning too fast: disabled for 5 minutes
INIT: Id "3" respawning too fast: disabled for 5 minutes/sbin/getty: error while loading shared libraries:
libc.so.6: cannot open shared object file: Input/output error
/sbin/getty: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/
/sbin/getty: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/
output error
INIT: Id "4" respawning too fast: disabled for 5 minutes
INIT: Id "5" respawning too fast: disabled for 5 minutes
INIT: Id "6" respawning too fast: disabled for 5 minutes
- Are there any kernel log left?
No, there are no call traces or other errors in the kernel logs that would indicate the cause of the problem.
- Have you installed aufs-util.git on your system?
aufs-util.git is installed.
- Is your kernel patched by a linux distribution?
it is vanila 6.10.6 kernel with aufs 6.10-20240722 compiled by me.
- How is your aufs configuration (define/undefine CONFIG_AUFS_xxx)?
AUFS configuration:
cat /sys/fs/aufs/config
CONFIG_AUFS_FS=m
CONFIG_AUFS_BRANCH_MAX_127=y
CONFIG_AUFS_SBILIST=y
CONFIG_AUFS_HNOTIFY=y
CONFIG_AUFS_HFSNOTIFY=y
CONFIG_AUFS_EXPORT=y
CONFIG_AUFS_INO_T_64=y
CONFIG_AUFS_BDEV_LOOP=y
CONFIG_AUFS_DEBUG=y
CONFIG_AUFS_MAGIC_SYSRQ=y
- How is your LSM setting?
no any LSM settings
Strace from while true; do strace mount -o remount / ; done
...
...
execve("/bin/mount", ["mount", "-o", "remount", "/"], [/* 18 vars */]) = 0
brk(0) = 0x6daa000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f227000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=59821, ...}) = 0
mmap(NULL, 59821, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f058f211000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libmount.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\254\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=284096, ...}) = 0
mmap(NULL, 2383648, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058ec00000
mprotect(0x7f058ec43000, 2097152, PROT_NONE) = 0
mmap(0x7f058ee43000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x43000) = 0x7f058ee43000
mmap(0x7f058ee45000, 3872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058ee45000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20c\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=142728, ...}) = 0
mmap(NULL, 2246896, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058e800000
mprotect(0x7f058e821000, 2097152, PROT_NONE) = 0
mmap(0x7f058ea21000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7f058ea21000
mmap(0x7f058ea23000, 6384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058ea23000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1738176, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f226000
mmap(NULL, 3844640, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058e400000
mprotect(0x7f058e5a1000, 2097152, PROT_NONE) = 0
mmap(0x7f058e7a1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a1000) = 0x7f058e7a1000
mmap(0x7f058e7a7000, 14880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058e7a7000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libblkid.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\210\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=258688, ...}) = 0
mmap(NULL, 2358248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058e000000
mprotect(0x7f058e03b000, 2097152, PROT_NONE) = 0
mmap(0x7f058e23b000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x7f058e23b000
mmap(0x7f058e23f000, 3048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058e23f000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\27\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=448440, ...}) = 0
mmap(NULL, 2543976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058dc00000
mprotect(0x7f058dc6c000, 2097152, PROT_NONE) = 0
mmap(0x7f058de6c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6c000) = 0x7f058de6c000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14664, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f225000
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058d800000
mprotect(0x7f058d803000, 2093056, PROT_NONE) = 0
mmap(0x7f058da02000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f058da02000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libuuid.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \26\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=18904, ...}) = 0
mmap(NULL, 2113952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058d400000
mprotect(0x7f058d404000, 2093056, PROT_NONE) = 0
mmap(0x7f058d603000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f058d603000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320n\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=137384, ...}) = 0
mmap(NULL, 2213008, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058d000000
mprotect(0x7f058d018000, 2093056, PROT_NONE) = 0
mmap(0x7f058d217000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f058d217000
mmap(0x7f058d219000, 13456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058d219000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f224000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f223000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f20f000
arch_prctl(ARCH_SET_FS, 0x7f058f20f840) = 0
mprotect(0x7f058e7a1000, 16384, PROT_READ) = 0
mprotect(0x7f058d217000, 4096, PROT_READ) = 0
mprotect(0x7f058d603000, 4096, PROT_READ) = 0
mprotect(0x7f058da02000, 4096, PROT_READ) = 0
mprotect(0x7f058de6c000, 4096, PROT_READ) = 0
mprotect(0x7f058e23b000, 12288, PROT_READ) = 0
mprotect(0x7f058ea21000, 4096, PROT_READ) = 0
mprotect(0x7f058ee43000, 4096, PROT_READ) = 0
mprotect(0x608000, 4096, PROT_READ) = 0
mprotect(0x7f058f220000, 4096, PROT_READ) = 0
munmap(0x7f058f211000, 59821) = 0
set_tid_address(0x7f058f20fb10) = 31590
set_robust_list(0x7f058f20fb20, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f058d0069b0, [], SA_RESTORER|SA_SIGINFO, 0x7f058d00f890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f058d006a40, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f058d00f890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
statfs("/sys/fs/selinux", 0x7ffcb7d1f500) = -1 ENOENT (No such file or directory)
statfs("/selinux", {f_type=0x61756673, f_bsize=4096, f_blocks=1013432, f_bfree=996590, f_bavail=940078, f_files=262144, f_ffree=255896, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
brk(0) = 0x6daa000
brk(0x6dcb000) = 0x6dcb000
open("/proc/filesystems", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f21f000
read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 413
read(3, "", 1024) = 0
close(3) = 0
munmap(0x7f058f21f000, 4096) = 0
getuid() = 0
geteuid() = 0
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
prctl(PR_GET_DUMPABLE) = 1
stat("/etc/fstab", {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
open("/etc/fstab", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f21f000
read(3, "aufs / rootfs sync\t\t0 "..., 1024) = 79
read(3, "", 1024) = 0
close(3) = 0
munmap(0x7f058f21f000, 4096) = 0
stat("/sbin/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
prctl(PR_GET_DUMPABLE) = 1
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
prctl(PR_GET_DUMPABLE) = 1
stat("/run", {st_mode=S_IFDIR|0755, st_size=1200, ...}) = 0
lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/run/mount/utab", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 3
close(3) = 0
mount("aufs", "/", 0x6daa2d0, MS_MGC_VAL|MS_SYNCHRONOUS|MS_REMOUNT, NULL) = 0
access("/", W_OK) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
-bash: /usr/bin/strace: Input/output error
-bash: /usr/bin/strace: Input/output error
-bash: /usr/bin/strace: Input/output error
It seems that the remount is successful because the exit code is 0, but everything afterward results in an Input/output error.
I enabled debug mode for AUFS, but there are a lot of logs, so I'm pasting a snippet where a -5 error occurs:
<7>[ 283.876619][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: ld.so.cache, f_flags 0x8000, f_mode 0x801d
<7>[ 283.876641][T10391] aufs au_reval_and_lock_fdi:755:mount[10391]: DEBUG: sigen 23, figen 22
<7>[ 283.876651][T10391] aufs au_refresh_dentry:792:mount[10391]: DEBUG: dentry
<7>[ 283.876654][T10391] d-1: 000000006c6ca199, etc/ld.so.cache?, aufs, cnt 1, flags 0x480040, hashed
<7>[ 283.876661][T10391] i-1: 00000000fda5122e, i33, aufs, cnt 1, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 0, st 0x0, f 0x0, v 2, g d525d45
<7>[ 283.876666][T10391] d-1: btop 0, bbot 0, bwh -1, bdiropq -1, gen 22, tmp 0
<7>[ 283.876668][T10391] d0: 00000000ed6e068d, etc/ld.so.cache?, ext4, cnt 2, flags 0x480040, hashed
<7>[ 283.876670][T10391] i0: 00000000c27efab9, i136782, ext4, cnt 2, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 15, st 0x0, f 0x0, v 1, g 28a3fb87
<7>[ 283.876677][T10391] aufs au_lkup_dentry:231:mount[10391]: DEBUG: positive
<7>[ 283.876679][T10391] aufs au_refresh_dentry:804:mount[10391]: DEBUG: dentry
<7>[ 283.876681][T10391] d-1: 000000006c6ca199, etc/ld.so.cache?, aufs, cnt 1, flags 0x480040, hashed
<7>[ 283.876683][T10391] i-1: 00000000fda5122e, i33, aufs, cnt 1, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 0, st 0x0, f 0x0, v 2, g d525d45
<7>[ 283.876686][T10391] d-1: btop 0, bbot 0, bwh -1, bdiropq -1, gen 22, tmp 0
<7>[ 283.876688][T10391] d0: 00000000ed6e068d, etc/ld.so.cache?, ext4, cnt 3, flags 0x480040, hashed
<7>[ 283.876690][T10391] i0: 00000000c27efab9, i136782, ext4, cnt 2, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 15, st 0x0, f 0x0, v 1, g 28a3fb87
<7>[ 283.876693][T10391] aufs au_refresh_dentry:810:mount[10391]: DEBUG: dentry
<7>[ 283.876695][T10391] d-1: 000000006c6ca199, etc/ld.so.cache?, aufs, cnt 1, flags 0x480040, hashed
<7>[ 283.876696][T10391] i-1: 00000000fda5122e, i33, aufs, cnt 1, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 0, st 0x0, f 0x0, v 2, g d525d45
<7>[ 283.876699][T10391] d-1: btop 0, bbot 0, bwh -1, bdiropq -1, gen 22, tmp 0
<7>[ 283.876701][T10391] d0: 00000000ed6e068d, etc/ld.so.cache?, ext4, cnt 3, flags 0x480040, hashed
<7>[ 283.876702][T10391] i0: 00000000c27efab9, i136782, ext4, cnt 2, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 15, st 0x0, f 0x0, v 1, g 28a3fb87
<7>[ 283.876706][T10391] aufs au_refresh_by_dinfo:663:mount[10391]: DEBUG: positive originally
<7>[ 283.876707][T10391] aufs au_refresh_by_dinfo:676:mount[10391]: DEBUG: positive --> positive, same type
<7>[ 283.876752][T10391] aufs aufs_atomic_open:352:mount[10391]: DEBUG: open_flag 0100000
<7>[ 283.876754][T10391] aufs aufs_atomic_open:353:mount[10391]: DEBUG: dentry
<7>[ 283.876756][T10391] d-1: 0000000075755c48, x86_64-linux-gnu/libmount.so.1?, aufs, cnt 1, flags 0x10000000, unhashed
<7>[ 283.876759][T10391] i-1: err 0
<7>[ 283.876766][T10391] aufs au_lkup_dentry:231:mount[10391]: DEBUG: positive
<7>[ 283.876777][T10391] aufs au_new_inode:414:mount[10391]: DEBUG: i2381
<7>[ 283.876785][T10391] aufs au_new_inode:420:mount[10391]: DEBUG: 8, new 1
<7>[ 283.876798][T10391] aufs aufs_atomic_open:374:mount[10391]: DEBUG: dentry
<7>[ 283.876799][T10391] d-1: 0000000075755c48, x86_64-linux-gnu/libmount.so.1?, aufs, cnt 1, flags 0x600000, hashed
<7>[ 283.876801][T10391] i-1: 00000000bcc9aa1d, i2381, aufs, cnt 1, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 0, st 0x0, f 0x0, v 2, g d5261ab
<7>[ 283.876805][T10391] d-1: btop 14, bbot 14, bwh -1, bdiropq -1, gen 23, tmp 0
<7>[ 283.876807][T10391] d14: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 1, flags 0x680040, hashed
<7>[ 283.876809][T10391] i14: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[ 283.876813][T10391] aufs au_aopen_no_open:326:mount[10391]: DEBUG: here
<7>[ 283.876815][T10391] aufs aufs_atomic_open:446:mount[10391]: DEBUG: file
<7>[ 283.876816][T10391] f-1: mode 0x1, flags 0100000, cnt 1, v 0, pos 0
<7>[ 283.876819][T10391] d-1: 0000000075755c48, x86_64-linux-gnu/libmount.so.1?, aufs, cnt 2, flags 0x600000, hashed
<7>[ 283.876820][T10391] i-1: 00000000bcc9aa1d, i2381, aufs, cnt 1, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 0, st 0x0, f 0x0, v 2, g d5261ab
<7>[ 283.876825][T10391] aufs aufs_get_link:1366:mount[10391]: DEBUG: page_get_link
<7>[ 283.876834][T10391] aufs aufs_get_link:1367:mount[10391]: DEBUG: h_dentry
<7>[ 283.876835][T10391] d-1: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 2, flags 0x680040, hashed
<7>[ 283.876836][T10391] i-1: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[ 283.876844][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[ 283.876943][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[ 283.877015][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5
<7>[ 283.877088][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5
<7>[ 283.878446][T10392] aufs aufs_open_nondir:90:bash[10392]: DEBUG: mount, f_flags 0x8020, f_mode 0x801d
<7>[ 283.878467][T10392] aufs aufs_get_link:1366:bash[10392]: DEBUG: page_get_link
<7>[ 283.878471][T10392] aufs aufs_get_link:1367:bash[10392]: DEBUG: h_dentry
<7>[ 283.878473][T10392] d-1: 000000007e2efab5, lib64/ld-linux-x86-64.so.2?, squashfs, cnt 2, flags 0x680040, hashed
<7>[ 283.878477][T10392] i-1: 00000000ff8c3b4a, i3147, squashfs, cnt 2, nl 1, 0120777, sz 32, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[ 283.878484][T10392] aufs aufs_open_nondir:90:bash[10392]: DEBUG: ld-2.19.so, f_flags 0x8020, f_mode 0x801d
<7>[ 283.878709][T10392] aufs aufs_open_nondir:90:mount[10392]: DEBUG: ld.so.cache, f_flags 0x8000, f_mode 0x801d
<7>[ 283.878734][T10392] aufs aufs_get_link:1366:mount[10392]: DEBUG: page_get_link
<7>[ 283.878738][T10392] aufs aufs_get_link:1367:mount[10392]: DEBUG: h_dentry
<7>[ 283.878740][T10392] d-1: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 2, flags 0x680040, hashed
<7>[ 283.878743][T10392] i-1: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[ 283.878748][T10392] aufs aufs_open_nondir:90:mount[10392]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[ 283.878806][T10392] aufs aufs_open_nondir:90:mount[10392]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[ 283.878862][T10392] aufs aufs_get_link:1378:mount[10392]: DEBUG: err -5
<7>[ 283.878890][T10392] aufs aufs_get_link:1378:mount[10392]: DEBUG: err -5
<7>[ 283.886528][T10393] aufs aufs_open_nondir:90:bash[10393]: DEBUG: mount, f_flags 0x8020, f_mode 0x801d
<7>[ 283.886568][T10393] aufs aufs_get_link:1366:bash[10393]: DEBUG: page_get_link
<7>[ 283.886577][T10393] aufs aufs_get_link:1367:bash[10393]: DEBUG: h_dentry
<7>[ 283.886579][T10393] d-1: 000000007e2efab5, lib64/ld-linux-x86-64.so.2?, squashfs, cnt 2, flags 0x680040, hashed
<7>[ 283.886585][T10393] i-1: 00000000ff8c3b4a, i3147, squashfs, cnt 2, nl 1, 0120777, sz 32, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[ 283.886632][T10393] aufs aufs_open_nondir:90:bash[10393]: DEBUG: ld-2.19.so, f_flags 0x8020, f_mode 0x801d
<7>[ 283.887012][T10393] aufs aufs_open_nondir:90:mount[10393]: DEBUG: ld.so.cache, f_flags 0x8000, f_mode 0x801d
<7>[ 283.887044][T10393] aufs aufs_get_link:1366:mount[10393]: DEBUG: page_get_link
<7>[ 283.887048][T10393] aufs aufs_get_link:1367:mount[10393]: DEBUG: h_dentry
<7>[ 283.887050][T10393] d-1: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 2, flags 0x680040, hashed
<7>[ 283.887053][T10393] i-1: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[ 283.887061][T10393] aufs aufs_open_nondir:90:mount[10393]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[ 283.887141][T10393] aufs aufs_open_nondir:90:mount[10393]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[ 283.887209][T10393] aufs aufs_get_link:1378:mount[10393]: DEBUG: err -5
<7>[ 283.887249][T10393] aufs aufs_get_link:1378:mount[10393]: DEBUG: err -5
I noticed that in the kernel between 5.9 and 5.10 (from which the problem started) there were several changes in aufs_remount_fs
, au_remount_refresh
and the like. Maybe the problem started somewhere here?
Artur Piechocki:
Strace from
while true; do strace mount -o remount / ; done
... ... execve("/bin/mount", ["mount", "-o", "remount", "/"], [/* 18 vars */]) = 0 ::: stat("/etc/fstab", {st_mode=S_IFREG|0644, st_size=79, ...}) = 0 open("/etc/fstab", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=79, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f21f000 read(3, "aufs / rootfs sync\t\t0 "..., 1024) = 79
It is wrong. The line in /etc/fstab should be aufs instead of rootfs.
stat("/sbin/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory) stat("/sbin/fs.d/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory) stat("/sbin/fs/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
If you change /etc/fstab, these stat(".../mount.rootfs") will change to ".../mount.aufs" (and should be found and executed). But it may NOT be related to the problem.
I enabled debug mode for AUFS, but there are a lot of logs, so I'm pasting a snippet where a -5 error occurs:
::: <7>[ 283.876844][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d <7>[ 283.876943][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d <7>[ 283.877015][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5 <7>[ 283.877088][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5 :::
Why 2? Threaded? Logging order? Anyway, aufs_get_link() got the error EIO, which comes from aufs_read_lock(). aufs_read_lock() gets EIO when the generation of either the root dentry and/or inode are unmatched with the generation of superblock. This may be an unknown bug. I will investigate more.
J. R. Okajima
@sfjro
If you change /etc/fstab, these stat(".../mount.rootfs") will change to ".../mount.aufs" (and should be found and executed). But it may NOT be related to the problem.
Even after switching from rootfs to aufs in fstab and using the mount.aufs
command, the input/output error still persists.
Why 2? Threaded? Logging order?
This issue might be related to the order in which logs are being written via syslog.
This may be an unknown bug. I will investigate more.
Thank you.
If you need any additional logs or debugging information, please let me know. I will also try to analyze the issue on my side.
Artur Piechocki:
- How did you mount your root aufs at the beginning? initramfs + busybox + swtich_root?
yes exacly, it is initramfs + busybox + swtich_root
OK then, did you run "mount --move" for your all branches before you mount aufs? I want to make sure that your branhces (/mnt/live/changes and /mnt/live/iamges/*.lzm) are living there after swtich_root.
J. R. Okajima
Artur Piechocki:
If you need any additional logs or debugging information, please let me know. I will also try to analyze the issue on my side.
Thank you. I'd ask you try this debug patch. It doesn't fix the problem, but will help us to identify the cause. The debug log is LOG_INFO, so you may need to customize your /proc/sys/kernel/printk.
And the base version of this patch is aufs-6.x-rcN for linux-v6.11-rc5. I hope it doesn't matter for you.
J. R. Okajima
diff --git a/fs/aufs/dentry.c b/fs/aufs/dentry.c index 7eae27d1e9b1..bafdd54776aa 100644 --- a/fs/aufs/dentry.c +++ b/fs/aufs/dentry.c @@ -485,7 +485,8 @@ static int au_do_refresh_hdentry(struct dentry dentry, struct dentry parent) err = 0; break; }
pr_info("%pd2??, btop %d\n", dentry, dinfo->di_btop);
AuTraceErr(err); return err; diff --git a/fs/aufs/dinfo.c b/fs/aufs/dinfo.c index 318aedd357e4..b130d8a1e67a 100644 --- a/fs/aufs/dinfo.c +++ b/fs/aufs/dinfo.c @@ -486,6 +486,7 @@ void au_update_dbrange(struct dentry *dentry, int do_put_zero) if (dinfo->di_btop > dinfo->di_bbot) { dinfo->di_btop = -1; dinfo->di_bbot = -1;
diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c index 008a5aaf11e7..b966e45141ee 100644 --- a/fs/aufs/fsctx.c +++ b/fs/aufs/fsctx.c @@ -63,9 +63,19 @@ static int au_fsctx_reconfigure(struct fs_context *fc) goto out; di_write_lock_child(root);
+pr_info("sigen %u, digen %u, iigen %u\n",
/ au_opts_remount() may return an error / err = au_opts_remount(sb, &a->opts);
+pr_info("sigen %u, digen %u, iigen %u\n",
if (au_ftest_opts(a->opts.flags, REFRESH)) au_remount_refresh(sb, au_ftest_opts(a->opts.flags, REFRESH_IDOP)); diff --git a/fs/aufs/i_op.c b/fs/aufs/i_op.c index 662704a62d96..ee63f9d83f7d 100644 --- a/fs/aufs/i_op.c +++ b/fs/aufs/i_op.c @@ -1332,19 +1332,28 @@ static const char aufs_get_link(struct dentry dentry, struct inode *inode, goto out;
err = aufs_read_lock(dentry, AuLock_IR | AuLock_GEN);
}
err = au_d_hashed_positive(dentry);
}
err = -EINVAL; inode = d_inode(dentry); bindex = au_ibtop(inode); h_inode = au_h_iptr(inode, bindex);
}
err = -EBUSY; h_dentry = NULL; @@ -1357,11 +1366,16 @@ static const char aufs_get_link(struct dentry dentry, struct inode *inode, h_dentry = d_find_any_alias(h_inode); if (IS_ERR(h_dentry)) { err = PTR_ERR(h_dentry);
}
err = 0; AuDbg("%ps\n", h_inode->i_op->get_link);
@sfjro in the aufs_debug_io_error.log file, I have included logs from the enabled debug mode in AUFS (kernel v6.10.6) along with the additional proposed logging for digen and btop. The first occurrence of the input/output
error happened around the timestamp value 952.
OK then, did you run "mount --move" for your all branches before you mount aufs? I want to make sure that your branhces (/mnt/live/changes and /mnt/live/iamges/*.lzm) are living there after swtich_root.
Yes, I perform a mount --move
on my branches before mounting AUFS.
Artur Piechocki:
@sfjro in the aufs_debug_io_error.log file, I have included logs from the enabled debug mode in AUFS (kernel v6.10.6) along with the additional proposed logging for digen and btop. The first occurrence of the
input/output
error happened around the timestamp value 952.
Thanks for the log. But it looks a little weird. "mount -o remount /" should enter aufs/fsctx.c:au_fsctx_reconfigure() which contains pr_info("sigen %u, digen %u, iigen %u\n", ...) two places as previous debug patch.
I am expecting this debug log but cannot find in your aufs_debug_io_error.log. Is the log full version?
J. R. Okajima
@sfjro
The previous log had to be truncated by logrotate. Sorry for that.
I am resending the logs aufs_io_error.log , but without debugging enabled in aufs (echo 1 > /sys/module/aufs/parameters/debug
) because with debugging enabled, it is very difficult to reproduce the problem due to the large amount of generated logs. My system cannot handle processing such a large number of logs.
Anyway let me know if you need the full logs with debugging enabled in aufs, and I will keep trying until I succeed.
Artur Piechocki:
I am resending the logs aufs_io_error.log , but without debugging enabled in aufs (
echo 1 > /sys/module/aufs/parameters/debug
) because with debugging enabled, it is very difficult to reproduce the problem due to the large amount of generated logs. My system cannot handle processing such a large number of logs.
Thanx for the log. But it was not enough. Would you try this additional patch, which enables DEBUG during remount only?
J. R. Okajima
diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c index 008a5aaf11e7..de4d68b19b36 100644 --- a/fs/aufs/fsctx.c +++ b/fs/aufs/fsctx.c @@ -43,6 +43,7 @@ static int au_fsctx_reconfigure(struct fs_context fc) struct inode inode; struct au_fsctx_opts *a = fc->fs_private;
+au_debug_on(); AuDbg("fc %p\n", fc);
root = fc->root;
@@ -84,6 +95,7 @@ static int au_fsctx_reconfigure(struct fs_context *fc) err = cvt_err(err); AuTraceErr(err);
+au_debug_off(); return err; }
@sfjro additional logs aufs_debug_io_error1.log with debug for remount.
Thank You
Artur Piechocki:
@sfjro additional logs aufs_debug_io_error1.log with debug for remount.
Unfortunately the log is still not enough. I'm afraid several parts of the log are lost. I'd suggest you to try increasing 'log_buf_len' in your kernel command line.
Also 'cron' daemon looks invoked very frequently in a short time. Is it expected behaviour?
J. R. Okajima
@sfjro
What kind of logs do you think are missing (debug from fsctx.c file ?) The last logs were generated while skipping syslog. I'm using the dmesg -n8 && cat /proc/kmsg
command, so it seems strange that log_buf_len
would matter here. All kernel logs are currently shown in the terminal, then I copy the contents of the terminal and put them in a log file (additionally, I set the terminal history to infinite). I do this on purpose to bypass system services such as syslog which would not be able to write anything in the event of an input/output
error.
The cron job is intentionally set to run every minute because the more commands are executed on the system during the remount, the faster I can reproduce the problem.
Artur Piechocki:
What kind of logs do you think are missing (debug from fsctx.c file ?)
Mainly yes. Applying the debug patch, au_fsctx_reconfigure() prints two "sigen %d..." logs. But there is only one log for "sigen 17" in your aufs_debug_io_error1.log.
1162: aufs au_fsctx_reconfigure:80:mount[19848]: sigen 16, digen 16, iigen 16 1164: aufs au_fsctx_reconfigure:88:mount[19848]: sigen 16, digen 16, iigen 16 3069: aufs au_fsctx_reconfigure:88:mount[14731]: sigen 17, digen 17, iigen 17 16422: aufs au_fsctx_reconfigure:80:mount[14732]: sigen 18, digen 18, iigen 18 16425: aufs au_fsctx_reconfigure:88:mount[14732]: sigen 18, digen 18, iigen 18
That is the main reason why I'm afraid the log is not meaningful.
J. R. Okajima
@sfjro
I'm sending the logs again aufs_debug_io_error2.log. I set log_buf_len
to 10M and made sure that each log with sigen information for a given process is duplicated.
Thank You
Artur Piechocki:
I'm sending the logs again aufs_debug_io_error2.log. I set
log_buf_len
to 10M and made sure that each log with sigen information for a given process is duplicated.
OK thanks. But this log still looks weird. It has several "/bin/mount: Input/output error" logs, but no "aufs_get_link: err -5" log at all. I am not sure where EIO comes from. Assuming all logs were collected, EIO has to come from other than aufs. (well, it's doubtful).
Would you apply this another additional debug patch?
diff --git a/fs/fsopen.c b/fs/fsopen.c index 6593ae518115..83355c676ecf 100644 --- a/fs/fsopen.c +++ b/fs/fsopen.c @@ -266,6 +266,10 @@ static int vfs_cmd_reconfigure(struct fs_context fc) down_write(&sb->s_umount); ret = reconfigure_super(fc); up_write(&sb->s_umount); +if(ret)pr_info("%s:%d:%.s[%d]: ret %d\n",
@@ -2912,6 +2916,10 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags, mnt_warn_timestamp_expiry(path, &mnt->mnt);
put_fs_context(fc);
+if(err)pr_info("%s:%d:%.*s[%d]: err %d\n",
diff --git a/fs/super.c b/fs/super.c index 095ba793e10c..d2d261679d5a 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1097,6 +1097,10 @@ int reconfigure_super(struct fs_context *fc)
cancel_readonly: sb_end_ro_state_change(sb); +if(retval)pr_info("%s:%d:%.*s[%d]: retval %d\n",
@@ -1734,6 +1738,10 @@ int reconfigure_single(struct super_block s, ret = reconfigure_super(fc); out: put_fs_context(fc); +if(ret)pr_info("%s:%d:%.s[%d]: ret %d\n",
error" logs, but no "aufs_get_link: err -5" log at all. I am not sure where EIO comes from.
Could it be that debug wasn't enabled in AUFS? As I mentioned, it's difficult to reproduce the issue when debug is enabled, but I kept trying until I succeeded, and the file aufs_debug_io_error2.log now contains another log, this time with debug enabled.
Would you apply this another additional debug patch?
I'm having a bit of a problem here: despite the patch being applied (I double-checked), there is no debug (from the last patch) information in the log. It's as if the condition is always false and never enters the pr_info
function.
However, I hope that the log with debugging enabled (echo 1 > /sys/module/aufs/parameters/debug
) will be now more useful.
Artur Piechocki:
Could it be that debug wasn't enabled in AUFS? As I mentioned, it's difficult to reproduce the issue when debug is enabled, but I kept trying until I succeeded, and the file aufs_debug_io_error2.log now contains another log, this time with debug enabled.
Enabling debug is helpful here. Please be patient. By new aufs_debug_io_error2.log, I am narrowing down the (possible) bug. Here is a first trial to fix. Please test this patch.
I'm having a bit of a problem here: despite the patch being applied (I double-checked), there is no debug (from the last patch) information in the log. It's as if the condition is always false and never enters the
pr_info
function.
Ok, the must be in aufs.
J. R. Okajima
diff --git a/fs/aufs/sbinfo.c b/fs/aufs/sbinfo.c index fc1d958bcaa5..d83e13462061 100644 --- a/fs/aufs/sbinfo.c +++ b/fs/aufs/sbinfo.c @@ -239,8 +239,10 @@ int aufs_read_lock(struct dentry dentry, int flags) err = au_digen_test(dentry, au_sigen(sb)); if (!au_opt_test(au_mntflags(sb), UDBA_NONE)) AuDebugOn(!err && au_dbrange_test(dentry)); +/ else if (!err) err = au_dbrange_test(dentry); +*/ if (unlikely(err)) aufs_read_unlock(dentry, flags); }
@sfjro
Despite applying the potential fix, the problem is still reproducible. I am sending debug logs aufs_debug_io_error_with_potential_fix.log with the latest patch applied.
Please note that the issue only occurs when udba=none
. When udba=notify
is set, the problem does not occur.
Artur Piechocki:
Despite applying the potential fix, the problem is still reproducible. I am sending debug logs aufs_debug_io_error_with_potential_fix.log with the latest patch applied.
OK, thanks for your repeated tests. Now I want you to find all the directories named 'x86_64-linux-gnu', and all the entries (regular file, symlink, etc) named 'libc.so.6' on your all branches. It is to know whether multiple entires exist or not.
And here is another possible/potential fix patch. Please try.
Please note that the issue only occurs when
udba=none
. Whenudba=notify
is set, the problem does not occur.
Yes, my previous patch was meaningless here. Probably I was crazy.
J. R. Okajima
diff --git a/fs/aufs/dcsub.c b/fs/aufs/dcsub.c index 842063a3c314..1e27ba9f42ba 100644 --- a/fs/aufs/dcsub.c +++ b/fs/aufs/dcsub.c @@ -122,9 +122,9 @@ static enum d_walk_ret au_call_dpages_append(void _arg, struct dentry dentry) ret = D_WALK_CONTINUE; if (dentry->d_sb == arg->sb && !IS_ROOT(dentry)
&& (!arg->test || arg->test(dentry, arg->arg))) { +/ && (!arg->test || arg->test(dentry, arg->arg))) { arg->err = au_dpages_append(arg->dpages, dentry, GFP_ATOMIC); if (unlikely(arg->err)) ret = D_WALK_QUIT; diff --git a/fs/aufs/super.c b/fs/aufs/super.c index 07d3412e950f..10a80f4f8c22 100644 --- a/fs/aufs/super.c +++ b/fs/aufs/super.c @@ -577,7 +577,7 @@ static int au_do_refresh_d(struct dentry dentry, unsigned int sigen,
err = 0; parent = dget_parent(dentry);
@sfjro
Unfortunately, the proposed second fix causes a kernel panic. Here is the log aufs_panic_fix2.log
Now I want you to find all the directories named 'x86_64-linux-gnu',
I have two branches that have a directory /lib/x86_64-linux-gnu
and /usr/lib/x86_64-linux-gnu
with various library files.
and all the entries (regular file, symlink, etc) named 'libc.so.6' on
Only one branch has libc.so.6
.
Artur Piechocki:
Unfortunately, the proposed second fix causes a kernel panic. Here is the log aufs_panic_fix2.log
Ok, that was my bad. Please modify fs/aufs/dcsub.c (which is patched by my last patch) like this. (see attached)
and all the entries (regular file, symlink, etc) named 'libc.so.6' on
Only one branch has
libc.so.6
.
And it is a symlink on your 14th branch? What is the target, and where is it?
J. R. Okajima
diff --git a/fs/aufs/dcsub.c b/fs/aufs/dcsub.c index 842063a3c314..ee2a177b4188 100644 --- a/fs/aufs/dcsub.c +++ b/fs/aufs/dcsub.c @@ -86,7 +86,7 @@ static int au_dpages_append(struct au_dcsub_pages *dpages, dpages->ndpage++; }
@@ -122,8 +122,8 @@ static enum d_walk_ret au_call_dpages_append(void _arg, struct dentry dentry) ret = D_WALK_CONTINUE; if (dentry->d_sb == arg->sb && !IS_ROOT(dentry)
@sfjro
And it is a symlink on your 14th branch? What is the target, and where is it?
Yes, it is symlink on 14th branch which target is libc-2.19.so
. Target is also on the same 14th branch.
ls -al /lib/x86_64-linux-gnu/libc.so.6
lrwxrwxrwx 1 root root 12 May 17 2016 /lib/x86_64-linux-gnu/libc.so.6 -> libc-2.19.so
Please modify fs/aufs/dcsub.c (which is patched by my last patch) like this. (see attached)
Now it looks like the fix is working and the issue is gone. Is this the target fix or just for analysis purposes?
Artur Piechocki:
Now it looks like the fix is =E2=80=8B=E2=80=8Bworking and the issue is g= one. Is this the target fix or just for analysis purposes?
That is a good news. The patch is not final one, and I'm still considering the bug scenario. May I ask you more tests?
J. R. Okajima
@sfjro
May I ask you more tests?
Yes, of course. Please write what additional tests I should do.
Artur Piechocki:
Yes, of course. Please write what additional tests I should do.
Thank you very much. The patches I sent were to fix two (or more) possible bugs, and I still don't identify which bug is real. Please REMOVE this patch and test again. If it succeeds, the bug will be narrow down to around the dentry ref-count.
J. R. Okajima
diff --git a/fs/aufs/super.c b/fs/aufs/super.c index 07d3412e950f..10a80f4f8c22 100644 --- a/fs/aufs/super.c +++ b/fs/aufs/super.c @@ -577,7 +577,7 @@ static int au_do_refresh_d(struct dentry *dentry, unsigned int sigen,
err = 0;
parent = dget_parent(dentry);
@sfjro
I removed all previous patches and applied only the last one (changes in au_do_refresh_d
function) you sent and the issue is reproducible.
I hope this helps.
Artur Piechocki:
I removed all previous patches and applied only the last one (changes in
au_do_refresh_d
function) you sent and the issue is reproducible.
No, that is not what I meant. Do not apply the patch for fs/aufs/super.c:au_do_refresh_d() which commented out "!au_digen_test(parent, sigen) &&". Keep all other patches applied and test please.
J. R. Okajima
@sfjro
I am a bit confused about what I should test because there were several proposed patches in this thread (debugging, a potential fix that didn’t work, a fix that caused a kernel panic, and a change that potentially resolved the issue), and now I’m not sure what to revert and what to keep.
In this regard, could you please prepare a patch on the clean AUFS code that I should test?
Artur Piechocki:
In this regard, could you please prepare a patch on the clean AUFS code t= hat I should test?
I'm sorry to make you confused. Here is the patch for clean aufs6.10. Plz test.
J. R. Okajima
diff --git a/fs/aufs/dcsub.c b/fs/aufs/dcsub.c index 842063a3c314..ee2a177b4188 100644 --- a/fs/aufs/dcsub.c +++ b/fs/aufs/dcsub.c @@ -86,7 +86,7 @@ static int au_dpages_append(struct au_dcsub_pages *dpages, dpages->ndpage++; }
@@ -122,8 +122,8 @@ static enum d_walk_ret au_call_dpages_append(void _arg, struct dentry dentry) ret = D_WALK_CONTINUE; if (dentry->d_sb == arg->sb && !IS_ROOT(dentry)
@sfjro
The last proposed patch seems to work, as the issue hasn't recurred.
It appears that the problem lies in au_dcount(dentry)
?
Artur Piechocki:
The last proposed patch seems to work, as the issue hasn't recurred.
It appears that the problem lies in
au_dcount(dentry)
?
Thanks for your tests many times. I agree that dentry ref-count is suspicious, but I don't see the whole story (the root cause) and I'm still checking "git diff v5.9 v5.10".
The fix by removing au_dcount(dentry) will be released after I get the big picture of the bug.
J. R. Okajima
I agree that dentry ref-count is suspicious, but I don't see the whole story (the root cause) and I'm still checking "git diff v5.9 v5.10".
The fix by removing au_dcount(dentry) will be released after I get the big picture of the bug.
Here is a progress brief from me. I'm afraid there is something more about this issue. Applying the previous patch is not enough, it caused another symptoms. They are all related to the ref-count at the remount time. I'm digging deeper but it is not easy.
J. R. Okajima
@sfjro
Thank you for the information. Please let me know if there are any additional specific tests or debugging steps needed.
I also have an additional question, if I may ask it here.
Occasionally, I encounter a kernel warning related to inodes (udba=notify
):
2024 Oct 17 12:03:02 [kern.warning] [ 510.069971][T113674] ------------[ cut here ]------------
2024 Oct 17 12:03:02 [kern.warning] [ 510.069976][T113674] WARNING: CPU: 5 PID: 113674 at fs/inode.c:265 __destroy_inode+0x11b/0x180
2024 Oct 17 12:03:02 [kern.warning] [ 510.069987][T113674] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) zfs(PO) qat_api(O) spl(O) intel_qat(O) uio iptable_filter bpfilter mst_pciconf(O) target_core_iblock
target_core_pscsi iscsi_target_mod target_core_mod bonding ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core
x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate mlx5_core igb(O) mlxfw ptp pps_core pci_hyperv_intf be2net button nls_iso8859_1 nls_cp437 ipmi_watchdog ses
sg ipmi_si ipmi_devintf ipmi_msghandler nvme nvme_core mpt3sas(O) raid_class megaraid_sas(O) scsi_transport_sas vfat fat aufs scsi_transport_fc [last unloaded: mst_pci]
2024 Oct 17 12:03:02 [kern.warning] [ 510.070035][T113674] CPU: 5 PID: 113674 Comm: mv Kdump: loaded Tainted: P W O 5.15.167 #24 ffafa14d5ffccdd81f5ea87ebf1c7c0a58d3af83
2024 Oct 17 12:03:02 [kern.warning] [ 510.070038][T113674] Hardware name: Intel Corporation S2600IP/S2600IP, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
2024 Oct 17 12:03:02 [kern.warning] [ 510.070040][T113674] RIP: 0010:__destroy_inode+0x11b/0x180
2024 Oct 17 12:03:02 [kern.warning] [ 510.070044][T113674] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b3 be 03 00 00 00 e8 67 b7 34 00 eb a7 0f 0b 0f 0b e9 1b ff ff ff 0f 0b e9 42 ff ff ff <0f> 0b e9 79 ff ff ff be 08 00 00 00 48 83 c7 08 e8 f0 d3 e2 ff eb
2024 Oct 17 12:03:02 [kern.warning] [ 510.070045][T113674] RSP: 0018:ffff889052db7db8 EFLAGS: 00010246
2024 Oct 17 12:03:02 [kern.warning] [ 510.070047][T113674] RAX: ffff88904de90800 RBX: ffff8882f2a1a0d8 RCX: 0000000000000000
2024 Oct 17 12:03:02 [kern.warning] [ 510.070049][T113674] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8882f2a1a0d8
2024 Oct 17 12:03:02 [kern.warning] [ 510.070050][T113674] RBP: ffffffffa00410e0 R08: 0000000000000000 R09: ffff8882f2a1a0d8
2024 Oct 17 12:03:02 [kern.warning] [ 510.070052][T113674] R10: ffff889052db7e28 R11: 0000000000000000 R12: ffff8882e8829258
2024 Oct 17 12:03:02 [kern.warning] [ 510.070053][T113674] R13: ffff8890da3df800 R14: 00000000ffffff9c R15: 0000000000000000
2024 Oct 17 12:03:02 [kern.warning] [ 510.070054][T113674] FS: 00007f4e69796800(0000) GS:ffff889c0ee40000(0000) knlGS:0000000000000000
2024 Oct 17 12:03:02 [kern.warning] [ 510.070056][T113674] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024 Oct 17 12:03:02 [kern.warning] [ 510.070057][T113674] CR2: 00007f4e68c7fa70 CR3: 00000010db93a006 CR4: 00000000000606e0
2024 Oct 17 12:03:02 [kern.warning] [ 510.070059][T113674] Call Trace:
2024 Oct 17 12:03:02 [kern.warning] [ 510.070062][T113674] <TASK>
2024 Oct 17 12:03:02 [kern.warning] [ 510.070065][T113674] ? __warn+0x81/0x100
2024 Oct 17 12:03:02 [kern.warning] [ 510.070070][T113674] ? __destroy_inode+0x11b/0x180
2024 Oct 17 12:03:02 [kern.warning] [ 510.070072][T113674] ? report_bug+0x99/0xc0
2024 Oct 17 12:03:02 [kern.warning] [ 510.070076][T113674] ? handle_bug+0x34/0x80
2024 Oct 17 12:03:02 [kern.warning] [ 510.070079][T113674] ? exc_invalid_op+0x13/0x60
2024 Oct 17 12:03:02 [kern.warning] [ 510.070081][T113674] ? asm_exc_invalid_op+0x16/0x20
2024 Oct 17 12:03:02 [kern.warning] [ 510.070087][T113674] ? __destroy_inode+0x11b/0x180
2024 Oct 17 12:03:02 [kern.warning] [ 510.070089][T113674] ? __destroy_inode+0x80/0x180
2024 Oct 17 12:03:02 [kern.warning] [ 510.070092][T113674] destroy_inode+0x2a/0x70
2024 Oct 17 12:03:02 [kern.warning] [ 510.070094][T113674] __dentry_kill+0xce/0x160
2024 Oct 17 12:03:02 [kern.warning] [ 510.070097][T113674] dput+0x132/0x300
2024 Oct 17 12:03:02 [kern.warning] [ 510.070100][T113674] do_renameat2+0x30d/0x590
2024 Oct 17 12:03:02 [kern.warning] [ 510.070103][T113674] __x64_sys_rename+0x3f/0x50
2024 Oct 17 12:03:02 [kern.warning] [ 510.070105][T113674] do_syscall_64+0x3c/0x90
2024 Oct 17 12:03:02 [kern.warning] [ 510.070107][T113674] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
2024 Oct 17 12:03:02 [kern.warning] [ 510.070110][T113674] RIP: 0033:0x7f4e68c15727
2024 Oct 17 12:03:02 [kern.warning] [ 510.070112][T113674] Code: 75 12 48 89 df e8 49 5c 07 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 52 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 31 d7 33 00 f7 d8 64 89 01 48
2024 Oct 17 12:03:02 [kern.warning] [ 510.070114][T113674] RSP: 002b:00007ffe3f251e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
2024 Oct 17 12:03:02 [kern.warning] [ 510.070116][T113674] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 00007f4e68c15727
2024 Oct 17 12:03:02 [kern.warning] [ 510.070117][T113674] RDX: 0000000000000000 RSI: 00007ffe3f2536a0 RDI: 00007ffe3f253656
2024 Oct 17 12:03:02 [kern.warning] [ 510.070119][T113674] RBP: 00007ffe3f252230 R08: 0000000000000001 R09: 0000000000000001
2024 Oct 17 12:03:02 [kern.warning] [ 510.070120][T113674] R10: 00007ffe3f251c20 R11: 0000000000000202 R12: 0000000000000000
2024 Oct 17 12:03:02 [kern.warning] [ 510.070121][T113674] R13: 00007ffe3f252320 R14: 00007ffe3f253656 R15: 0000000000000000
2024 Oct 17 12:03:02 [kern.warning] [ 510.070123][T113674] </TASK>
2024 Oct 17 12:03:02 [kern.warning] [ 510.070123][T113674] ---[ end trace bf738f39f0ebdc45 ]---
2024 Oct 17 12:03:02 [kern.warning] [ 510.130104][T113725] ------------[ cut here ]------------
This warning is triggered by any operation on /tmp (in this case, mv), but the specific command does not seem to matter. Restarting the system clears the issue temporarily, though it does recur after some reboots.
I haven’t found a guaranteed way to reproduce this, but it appears intermittently.
Could this be related to AUFS? Should I report a new issue?
A similar thread can be found here:
https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/2048526
Artur Piechocki:
Thank you for the information. Please let me know if there are any additi= onal specific tests or debugging steps needed.
That's kind of you. Currently there is nothing to ask, but when it is necessary I will, since the problem cannot be reproduced on my side.
Occasionally, I encounter a kernel warning related to inodes (
udba=3Dnot= ify
):2024 Oct 17 12:03:02 [kern.warning] [ 510.069971][T113674] ------------[= cut here ]------------ 2024 Oct 17 12:03:02 [kern.warning] [ 510.069976][T113674] WARNING: CPU:= 5 PID: 113674 at fs/inode.c:265 __destroy_inode+0x11b/0x180 ::: This warning is triggered by any operation on /tmp (in this case, mv), bu= t the specific command does not seem to matter. Restarting the system cle= ::: Could this be related to AUFS? Should I report a new issue?
Hmm, is your /tmp either aufs or a writable branch of aufs? Otherwise I don't think aufs is related to this warning. But I've found a possible cause of this warning in aufs and I'm going to address it. The fix will be released in a few weeks. But I am not sure whether that is the real cause and will make the warning disappear.
J. R. Okajima
@sfjro
Hmm, is your /tmp either aufs or a writable branch of aufs?
No, my/tmp
is tmpfs
Otherwise I don't think aufs is related to this warning.
I spent some time testing and after dozens of tests with system restarts, I am sure that the warning (fs/inode.c:265 __destroy_inode
) problem does not occur when udba=none
, so it looks like something related to aufs.
When udba=notify
the warning problem occurs several times out of ten restarts so I have a fairly certain test scenario.
Here is a progress brief from me. I'm afraid there is something more about this issue. Applying the previous patch is not enough, it caused another symptoms. They are all related to the ref-count at the remount time. I'm digging deeper but it is not easy.
Could you please describe what other symptoms you are experiencing?
Artur Piechocki:
I spent some time testing and after dozens of tests with system restarts, I am sure that the warning (
fs/inode.c:265 __destroy_inode
) problem does not occur whenudba=none
, so it looks like something related to aufs.
That is interesting.
> Here is a progress brief from me. > I'm afraid there is something more about this issue. > Applying the previous patch is not enough, it caused another > symptoms. They are all related to the ref-count at the remount time. > I'm digging deeper but it is not easy.
Could you please describe what other symptoms you are experiencing?
In the last patch, I removed "AuDebugOn(au_dcount(dentry) <= 0)" from fs/aufs/dcsub.c. There is one more AuDebugOn() to be removed in fs/aufs/branch.c. Moreover, another crash (null ptr access) happened in fs/aufs/super.c. In au_do_refresh_d(), the passed dentry was already killed unexpectedly. That is really strange and should not happen. I'm still thinking about the root cause.
J. R. Okajima
Artur Piechocki:
I spent some time testing and after dozens of tests with system restarts, I am sure that the warning (
fs/inode.c:265 __destroy_inode
) problem does not occur whenudba=none
, so it looks like something related to aufs.
The warning is related to a link count of inode. As you might know, inodes are the objects per filesystem superblock, which means the inodes of /tmp (tmpfs) are totally different thing that aufs inode. So I am unsure how much aufs is related to the warning.
Reviewing aufs inode link count, I found some suspicious code and I'm trying fix. But still I am not sure whether aufs is related to your warning. Anyway, if you can, please try this patch.
J. R. Okajima
diff --git a/fs/aufs/cpup.c b/fs/aufs/cpup.c index 8fb1f49bbe8e..19a7cfa5213f 100644 --- a/fs/aufs/cpup.c +++ b/fs/aufs/cpup.c @@ -56,7 +56,7 @@ void au_cpup_attr_nlink(struct inode *inode, int force)
the incorrect link count. */
/*
fewer nlink makes find(1) noisy, but larger nlink doesn't. diff --git a/fs/aufs/dir.c b/fs/aufs/dir.c index 11fc9925e66b..2faeabd47db5 100644 --- a/fs/aufs/dir.c +++ b/fs/aufs/dir.c @@ -23,7 +23,7 @@ void au_add_nlink(struct inode dir, struct inode h_dir) nlink += 2; smp_mb(); / for i_nlink / / 0 can happen in revaliding /
void au_sub_nlink(struct inode dir, struct inode h_dir) @@ -38,7 +38,7 @@ void au_sub_nlink(struct inode dir, struct inode h_dir) nlink -= 2; smp_mb(); / for i_nlink / / nlink == 0 means the branch-fs is broken /
loff_t au_dir_size(struct file file, struct dentry dentry) diff --git a/fs/aufs/i_op.c b/fs/aufs/i_op.c index c3ff23f485d7..a4423a23e022 100644 --- a/fs/aufs/i_op.c +++ b/fs/aufs/i_op.c @@ -1173,7 +1173,7 @@ static void au_refresh_iattr(struct inode inode, struct kstat st, n += st->nlink; smp_mb(); / for i_nlink / / 0 can happen /
spin_lock(&inode->i_lock); diff --git a/fs/aufs/vfsub.h b/fs/aufs/vfsub.h index 2258ccc5b595..d5c850eaf387 100644 --- a/fs/aufs/vfsub.h +++ b/fs/aufs/vfsub.h @@ -44,6 +44,16 @@ enum {
/ ---------------------------------------------------------------------- /
+static inline void au_set_nlink(struct inode *inode, unsigned int nlink) +{
static inline void vfsub_drop_nlink(struct inode *inode) { AuDebugOn(!inode->i_nlink);
@sfjro
Unfortunately, with the proposed fix, the warning related to the link count of inode still persists (most operations on /tmp trigger this warning).
However, I noticed that the warning stops appearing every time after the following entry occurs:
[ 220.715176] aufs au_fsctx_fill_super:149:mount[31444]: aufs_dop_noreval [aufs]
[ 216.657439] ------------[ cut here ]------------
[ 216.657447] WARNING: CPU: 29 PID: 29757 at fs/inode.c:265 __destroy_inode+0x11b/0x180
[ 216.657472] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) ib_isert zfs(PO) target_core_file qat_api(O) spl(O) intel_qat(O) uio iptable_filter mst_pciconf(O) target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate tg3 mlx5_core libphy ixgbe(O) mlxfw ptp pci_hyperv_intf pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) raid_class scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci]
[ 216.657570] CPU: 29 PID: 29757 Comm: a2dissite Kdump: loaded Tainted: P W O 5.15.167 #6 05d8f8af315c999c1d68a362530c749cbb22bee6
[ 216.657577] Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.9.0 12/06/2019
[ 216.657581] RIP: 0010:__destroy_inode+0x11b/0x180
[ 216.657590] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b3 be 03 00 00 00 e8 67 b7 34 00 eb a7 0f 0b 0f 0b e9 1b ff ff ff 0f 0b e9 42 ff ff ff <0f> 0b e9 79 ff ff ff be 08 00 00 00 48 83 c7 08 e8 f0 d3 e2 ff eb
[ 216.657594] RSP: 0018:ffff8882d2627e90 EFLAGS: 00010246
[ 216.657599] RAX: ffff888106315000 RBX: ffff8882c16ade98 RCX: 0000000000000001
[ 216.657603] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8882c16ade98
[ 216.657606] RBP: ffffffffa00410e0 R08: 0000000000000006 R09: 0000000000000064
[ 216.657609] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 216.657611] R13: ffff8882c16ade98 R14: 00000000ffffff9c R15: ffff88825405a840
[ 216.657615] FS: 00007f2454d05700(0000) GS:ffff889fffb80000(0000) knlGS:0000000000000000
[ 216.657619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 216.657623] CR2: 000000002e7b4f78 CR3: 00000002bada0002 CR4: 00000000001706e0
[ 216.657627] Call Trace:
[ 216.657633] <TASK>
[ 216.657637] ? __warn+0x81/0x100
[ 216.657648] ? __destroy_inode+0x11b/0x180
[ 216.657655] ? report_bug+0x99/0xc0
[ 216.657663] ? handle_bug+0x34/0x80
[ 216.657671] ? exc_invalid_op+0x13/0x60
[ 216.657675] ? asm_exc_invalid_op+0x16/0x20
[ 216.657689] ? __destroy_inode+0x11b/0x180
[ 216.657696] destroy_inode+0x2a/0x70
[ 216.657703] do_unlinkat+0x147/0x2f0
[ 216.657710] __x64_sys_unlink+0x3e/0x70
[ 216.657714] do_syscall_64+0x3c/0x90
[ 216.657719] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 216.657727] RIP: 0033:0x7f2453d492f7
[ 216.657731] Code: 48 3d 00 f0 ff ff 77 03 48 98 c3 48 8b 15 89 7b 2c 00 f7 d8 64 89 02 48 83 c8 ff eb eb 66 0f 1f 44 00 00 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 61 7b 2c 00 f7 d8 64 89 01 48
[ 216.657736] RSP: 002b:00007ffe1e940cb8 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[ 216.657741] RAX: ffffffffffffffda RBX: 000000002e4d3010 RCX: 00007f2453d492f7
[ 216.657744] RDX: 000000002e4d31a8 RSI: 000000002e4d31a8 RDI: 000000002e500610
[ 216.657746] RBP: 000000002e4d8860 R08: 0000000000000010 R09: 000000002e4d8850
[ 216.657749] R10: 00007ffe1e940a80 R11: 0000000000000206 R12: 0000000000000001
[ 216.657752] R13: 000000002e4d8860 R14: 00007ffe1e940cf8 R15: 000000002e500610
[ 216.657756] </TASK>
[ 216.657758] ---[ end trace daea061fcac360aa ]---
[ 216.661917] ------------[ cut here ]------------
[ 216.661923] WARNING: CPU: 33 PID: 29757 at fs/inode.c:265 __destroy_inode+0x11b/0x180
[ 216.661939] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) ib_isert zfs(PO) target_core_file qat_api(O) spl(O) intel_qat(O) uio iptable_filter mst_pciconf(O) target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate tg3 mlx5_core libphy ixgbe(O) mlxfw ptp pci_hyperv_intf pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) raid_class scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci]
[ 216.662020] CPU: 33 PID: 29757 Comm: a2dissite Kdump: loaded Tainted: P W O 5.15.167 #6 05d8f8af315c999c1d68a362530c749cbb22bee6
[ 216.662027] Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.9.0 12/06/2019
[ 216.662030] RIP: 0010:__destroy_inode+0x11b/0x180
[ 216.662039] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b3 be 03 00 00 00 e8 67 b7 34 00 eb a7 0f 0b 0f 0b e9 1b ff ff ff 0f 0b e9 42 ff ff ff <0f> 0b e9 79 ff ff ff be 08 00 00 00 48 83 c7 08 e8 f0 d3 e2 ff eb
[ 216.662044] RSP: 0018:ffff8882d2627e90 EFLAGS: 00010246
[ 216.662048] RAX: ffff888106315000 RBX: ffff88905669f218 RCX: 0000000000000001
[ 216.662052] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88905669f218
[ 216.662055] RBP: ffffffffa00410e0 R08: 0000000000000003 R09: 0000000000000064
[ 216.662059] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 216.662062] R13: ffff88905669f218 R14: 00000000ffffff9c R15: ffff8882d38d4000
[ 216.662066] FS: 00007f2454d05700(0000) GS:ffff889fffc00000(0000) knlGS:0000000000000000
[ 216.662070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 216.662074] CR2: 000000002e4d7830 CR3: 00000002bada0006 CR4: 00000000001706e0
[ 216.662079] Call Trace:
[ 216.662083] <TASK>
[ 216.662086] ? __warn+0x81/0x100
[ 216.662093] ? __destroy_inode+0x11b/0x180
[ 216.662100] ? report_bug+0x99/0xc0
[ 216.662106] ? handle_bug+0x34/0x80
[ 216.662110] ? exc_invalid_op+0x13/0x60
[ 216.662115] ? asm_exc_invalid_op+0x16/0x20
[ 216.662124] ? __destroy_inode+0x11b/0x180
[ 216.662130] destroy_inode+0x2a/0x70
[ 216.662138] do_unlinkat+0x147/0x2f0
[ 216.662143] __x64_sys_unlink+0x3e/0x70
[ 216.662148] do_syscall_64+0x3c/0x90
[ 216.662153] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 216.662160] RIP: 0033:0x7f2453d492f7
[ 216.662164] Code: 48 3d 00 f0 ff ff 77 03 48 98 c3 48 8b 15 89 7b 2c 00 f7 d8 64 89 02 48 83 c8 ff eb eb 66 0f 1f 44 00 00 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 61 7b 2c 00 f7 d8 64 89 01 48
[ 216.662169] RSP: 002b:00007ffe1e940cb8 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[ 216.662174] RAX: ffffffffffffffda RBX: 000000002e4d3010 RCX: 00007f2453d492f7
[ 216.662178] RDX: 000000002e4d31a8 RSI: 000000002e4d31a8 RDI: 000000002e500970
[ 216.662181] RBP: 000000002e4d8868 R08: 000000002e535df8 R09: 0000000000000041
[ 216.662184] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
[ 216.662187] R13: 000000002e4d8868 R14: 00007ffe1e940cf8 R15: 000000002e500970
[ 216.662192] </TASK>
[ 216.662194] ---[ end trace daea061fcac360ab ]---
[ 216.890856] scsi host20: iSCSI Initiator over iSER
[ 218.163400] aufs au_fsctx_fill_super:149:mount[29959]: aufs_dop_noreval [aufs]
And after this entry, operations on /tmp no longer trigger the warning until the next restart.
Artur Piechocki:
Unfortunately, with the proposed fix, the warning related to the link count of inode still persists (most operations on /tmp trigger this warning).
Hmm, on my side I cannot reproduce the warning. Currently I'm guessing O_TMPFILE is related. Off course, I may be totally wrong. Anyway would you try this test program?
J. R. Okajima
f() { local dir for dir in . ../rw /tmp ./e_dst do /tmp/artur-nlink "$dir" done }
int main(int argc, char *argv[]) { int err, fd, n; char a[PATH_MAX], fd_path[32];
/* argv[1] must be a dir */
fd = open(argv[1], O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
if (fd < 0) {
perror(argv[1]);
assert(0);
return -1;
}
/* leave this fd alone */
fd = open(argv[1], O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
if (fd < 0) {
perror(argv[1]);
assert(0);
return -1;
}
n = snprintf(a, sizeof(a), "%s/test-%s",
argv[1], program_invocation_short_name);
assert((unsigned long)n < sizeof(a));
n = snprintf(fd_path, sizeof(fd_path), "/proc/self/fd/%d", fd);
assert((unsigned long)n < sizeof(fd_path));
err = linkat(fd, fd_path, AT_FDCWD, a, AT_SYMLINK_FOLLOW);
if (err) {
perror(a);
assert(0);
return -1;
}
err = unlink(a);
if (err) {
perror(a);
assert(0);
return -1;
}
return 0;
@sfjro
The attached program does not lead to the issue. In the kernel logs, I can observe logs like the following:
[ 2751.736381] aufs au_remount_refresh:747:mount[47959]: reset to aufs_dop_noreval [aufs] and aufs_iop_nogetattr [aufs]
[ 2751.751551] aufs au_remount_refresh:747:mount[47966]: reset to aufs_dop [aufs] and aufs_iop [aufs]
[ 2757.112640] aufs au_remount_refresh:747:mount[48112]: reset to aufs_dop_noreval [aufs] and aufs_iop_nogetattr [aufs]
[ 2757.127864] aufs au_remount_refresh:747:mount[48119]: reset to aufs_dop [aufs] and aufs_iop [aufs]
and nothing more is logged.
I’m still wondering what at system startup is causing the warning... I’ll try to reproduce it with a newer kernel version.
Repeated also on kernel 6.6.59:
cat /sys/module/aufs/version
6.6.54-20241104
uname -r
6.6.59
[ 257.653972] ------------[ cut here ]------------
[ 257.653982] WARNING: CPU: 32 PID: 36757 at fs/inode.c:290 __destroy_inode+0x126/0x190
[ 257.653998] Modules linked in: ib_isert zfs(PO) spl(O) qat_api(O) intel_qat(O) uio target_core_file iptable_filter mst_pciconf(O) target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma
_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel cr
ypto_simd cryptd rapl intel_cstate mlx5_core tg3 mlxfw ixgbe(O) pci_hyperv_intf libphy ptp pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) raid_class scsi_transport_sas ipmi_si ipmi_devintf ipmi_
msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci(O)]
[ 257.654098] CPU: 32 PID: 36757 Comm: a2dissite Kdump: loaded Tainted: P W O 6.6.59 #1 1400000003000000474e5500b73c2e95d83aaad1
[ 257.654105] Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.9.0 12/06/2019
[ 257.654109] RIP: 0010:__destroy_inode+0x126/0x190
[ 257.654115] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b2 be 03 00 00 00 e8 8c c3 3c 00 eb a6 0f 0b 0f 0b e9 15 ff ff ff 0f 0b e9 41 ff ff ff <0f> 0b e9 78 ff ff ff 48 8d 47 08 48 89 fe 48 89 c7 e8 b4 86 dc ff
[ 257.654120] RSP: 0018:ffff8890da79fe88 EFLAGS: 00010246
[ 257.654125] RAX: ffff888110e47000 RBX: ffff888289bde518 RCX: 0000000000000000
[ 257.654129] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888289bde518
[ 257.654132] RBP: 0000000000000000 R08: 0000000000000064 R09: ffff888fffc1db98
[ 257.654136] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 257.654138] R13: ffff888289bde518 R14: 00000000ffffff9c R15: ffff88829a4f3140
[ 257.654142] FS: 00007f13ebb59700(0000) GS:ffff888fffc00000(0000) knlGS:0000000000000000
[ 257.654146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 257.654150] CR2: 0000000020b0b830 CR3: 00000002d40ae004 CR4: 00000000001706e0
[ 257.654154] Call Trace:
[ 257.654159] <TASK>
[ 257.654162] ? __destroy_inode+0x126/0x190
[ 257.654167] ? __warn+0x81/0x130
[ 257.654174] ? __destroy_inode+0x126/0x190
[ 257.654179] ? report_bug+0x171/0x1a0
[ 257.654187] ? handle_bug+0x41/0x70
[ 257.654195] ? exc_invalid_op+0x17/0x70
[ 257.654202] ? asm_exc_invalid_op+0x1a/0x20
[ 257.654211] ? __destroy_inode+0x126/0x190
[ 257.654216] ? __destroy_inode+0x8a/0x190
[ 257.654222] destroy_inode+0x24/0x70
[ 257.654228] do_unlinkat+0x14b/0x300
[ 257.654236] __x64_sys_unlink+0x42/0x70
[ 257.654242] do_syscall_64+0x38/0x80
[ 257.654248] entry_SYSCALL_64_after_hwframe+0x78/0xe2
[ 257.654255] RIP: 0033:0x7f13eab9d2f7
[ 257.654260] Code: 48 3d 00 f0 ff ff 77 03 48 98 c3 48 8b 15 89 7b 2c 00 f7 d8 64 89 02 48 83 c8 ff eb eb 66 0f 1f 44 00 00 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 61 7b 2c 00 f7 d8 64 89 01 48
[ 257.654265] RSP: 002b:00007fffe7150f88 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[ 257.654270] RAX: ffffffffffffffda RBX: 0000000020b07010 RCX: 00007f13eab9d2f7
[ 257.654274] RDX: 0000000020b071a8 RSI: 0000000020b071a8 RDI: 0000000020b34910
[ 257.654277] RBP: 0000000020b0c868 R08: 0000000020b69df8 R09: 0000000000000041
[ 257.654280] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
[ 257.654283] R13: 0000000020b0c868 R14: 00007fffe7150fc8 R15: 0000000020b34910
[ 257.654288] </TASK>
[ 257.654290] ---[ end trace 0000000000000000 ]---
I'll delve into identifying the exact point during system startup when the issue begins to occur.
Artur Piechocki:
The attached program does not lead to the issue. In the kernel logs, I ca= n observe logs like the following:
Thanks for the test. It means O_TMPFILE is not related to the issue. I have to think more about the cause.
Your log tells us that the command a2dissite triggered this problem. Can you try strace(1) it and identify which file is it trying to unlink? And how is the file created and handled?
J. R. Okajima
@sfjro
The executed command is irrelevant. Most commands that perform operations on /tmp/ trigger a warning. Below are the strace and call trace from the mv
command:
> /root/test_file
strace mv /root/test_file /tmp/
execve("/bin/mv", ["mv", "/root/test_file", "/tmp/"], [/* 16 vars */]) = 0
brk(0) = 0xb56d000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f965000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=59083, ...}) = 0
mmap(NULL, 59083, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6f5f956000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20c\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=142728, ...}) = 0
mmap(NULL, 2246896, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5f522000
mprotect(0x7f6f5f543000, 2097152, PROT_NONE) = 0
mmap(0x7f6f5f743000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7f6f5f743000
mmap(0x7f6f5f745000, 6384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f745000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\37\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=35288, ...}) = 0
mmap(NULL, 2130592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5f319000
mprotect(0x7f6f5f321000, 2093056, PROT_NONE) = 0
mmap(0x7f6f5f520000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x7f6f5f520000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\23\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=18640, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f955000
mmap(NULL, 2113912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5f114000
mprotect(0x7f6f5f118000, 2093056, PROT_NONE) = 0
mmap(0x7f6f5f317000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f6f5f317000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1738176, ...}) = 0
mmap(NULL, 3844640, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5ed69000
mprotect(0x7f6f5ef0a000, 2097152, PROT_NONE) = 0
mmap(0x7f6f5f10a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a1000) = 0x7f6f5f10a000
mmap(0x7f6f5f110000, 14880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f110000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\27\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=448440, ...}) = 0
mmap(NULL, 2543976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5eafb000
mprotect(0x7f6f5eb67000, 2097152, PROT_NONE) = 0
mmap(0x7f6f5ed67000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6c000) = 0x7f6f5ed67000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14664, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f954000
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5e8f7000
mprotect(0x7f6f5e8fa000, 2093056, PROT_NONE) = 0
mmap(0x7f6f5eaf9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f6f5eaf9000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320n\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=137384, ...}) = 0
mmap(NULL, 2213008, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6f5e6da000
mprotect(0x7f6f5e6f2000, 2093056, PROT_NONE) = 0
mmap(0x7f6f5e8f1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f6f5e8f1000
mmap(0x7f6f5e8f3000, 13456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6f5e8f3000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f953000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f951000
arch_prctl(ARCH_SET_FS, 0x7f6f5f951800) = 0
mprotect(0x7f6f5f10a000, 16384, PROT_READ) = 0
mprotect(0x7f6f5e8f1000, 4096, PROT_READ) = 0
mprotect(0x7f6f5eaf9000, 4096, PROT_READ) = 0
mprotect(0x7f6f5ed67000, 4096, PROT_READ) = 0
mprotect(0x7f6f5f317000, 4096, PROT_READ) = 0
mprotect(0x7f6f5f520000, 4096, PROT_READ) = 0
mprotect(0x7f6f5f743000, 4096, PROT_READ) = 0
mprotect(0x61e000, 4096, PROT_READ) = 0
mprotect(0x7f6f5f967000, 4096, PROT_READ) = 0
munmap(0x7f6f5f956000, 59083) = 0
set_tid_address(0x7f6f5f951ad0) = 22028
set_robust_list(0x7f6f5f951ae0, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f6f5e6e09b0, [], SA_RESTORER|SA_SIGINFO, 0x7f6f5e6e9890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f6f5e6e0a40, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f6f5e6e9890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
statfs("/sys/fs/selinux", 0x7fff03341b80) = -1 ENOENT (No such file or directory)
statfs("/selinux", {f_type=0x61756673, f_bsize=4096, f_blocks=1013432, f_bfree=985503, f_bavail=928991, f_files=262144, f_ffree=255190, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
brk(0) = 0xb56d000
brk(0xb58e000) = 0xb58e000
open("/proc/filesystems", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f964000
read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 423
read(3, "", 1024) = 0
close(3) = 0
munmap(0x7f6f5f964000, 4096) = 0
geteuid() = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
stat("/tmp/", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=940, ...}) = 0
lstat("/root/test_file", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lstat("/tmp/test_file", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
geteuid() = 0
rename("/root/test_file", "/tmp/test_file") = -1 EXDEV (Invalid cross-device link)
unlink("/tmp/test_file") = 0
open("/root/test_file", O_RDONLY|O_NOFOLLOW) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/tmp/test_file", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6f5f92f000
read(3, "", 131072) = 0
utimensat(4, NULL, {{1730879555, 937200161}, {1730879555, 937200161}}, 0) = 0
flistxattr(3, NULL, 0) = 0
flistxattr(3, "", 0) = 0
fgetxattr(3, "system.posix_acl_access", 0x7fff03341440, 132) = -1 EOPNOTSUPP (Operation not supported)
fsetxattr(4, "system.posix_acl_access", "\x02\x00\x00\x00\x01\x00\x06\x00\xff\xff\xff\xff\x04\x00\x04\x00\xff\xff\xff\xff \x00\x04\x00\xff\xff\xff\xff", 28, 0) = -1 EOPNOTSUPP (Operation not supported)
fchmod(4, 0100644) = 0
close(4) = 0
close(3) = 0
munmap(0x7f6f5f92f000, 139264) = 0
lstat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
newfstatat(AT_FDCWD, "/root/test_file", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "/root/test_file", 0) = 0
lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
close(0) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
[59843.481575] ------------[ cut here ]------------
[59843.481585] WARNING: CPU: 9 PID: 22028 at fs/inode.c:290 __destroy_inode+0x126/0x190
[59843.481605] Modules linked in: ib_isert zfs(PO) spl(O) qat_api(O) intel_qat(O) uio target_core_file iptable_filter mst_pciconf(O) target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate mlx5_core tg3 mlxfw ixgbe(O) pci_hyperv_intf libphy ptp pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) raid_class scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci(O)]
[59843.481709] CPU: 9 PID: 22028 Comm: mv Kdump: loaded Tainted: P W O 6.6.59 #1 1400000003000000474e5500b73c2e95d83aaad1
[59843.481718] Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.9.0 12/06/2019
[59843.481721] RIP: 0010:__destroy_inode+0x126/0x190
[59843.481729] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b2 be 03 00 00 00 e8 8c c3 3c 00 eb a6 0f 0b 0f 0b e9 15 ff ff ff 0f 0b e9 41 ff ff ff <0f> 0b e9 78 ff ff ff 48 8d 47 08 48 89 fe 48 89 c7 e8 b4 86 dc ff
[59843.481735] RSP: 0018:ffff8890ee98be88 EFLAGS: 00010246
[59843.481740] RAX: ffff888110e47000 RBX: ffff8890dbd37558 RCX: 0000000000000001
[59843.481744] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8890dbd37558
[59843.481748] RBP: 0000000000000000 R08: 0000000000000064 R09: ffff889fff91db98
[59843.481751] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[59843.481754] R13: ffff8890dbd37558 R14: 00000000ffffff9c R15: ffff88905024da40
[59843.481758] FS: 00007f6f5f951800(0000) GS:ffff889fff900000(0000) knlGS:0000000000000000
[59843.481763] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[59843.481767] CR2: 00007f6f5eea0450 CR3: 00000010e652e003 CR4: 00000000001706e0
[59843.481771] Call Trace:
[59843.481778] <TASK>
[59843.481781] ? __destroy_inode+0x126/0x190
[59843.481787] ? __warn+0x81/0x130
[59843.481798] ? __destroy_inode+0x126/0x190
[59843.481803] ? report_bug+0x171/0x1a0
[59843.481813] ? handle_bug+0x41/0x70
[59843.481824] ? exc_invalid_op+0x17/0x70
[59843.481831] ? asm_exc_invalid_op+0x1a/0x20
[59843.481842] ? __destroy_inode+0x126/0x190
[59843.481847] ? __destroy_inode+0x8a/0x190
[59843.481854] destroy_inode+0x24/0x70
[59843.481860] do_unlinkat+0x14b/0x300
[59843.481870] __x64_sys_unlinkat+0x37/0x70
[59843.481877] do_syscall_64+0x38/0x80
[59843.481884] entry_SYSCALL_64_after_hwframe+0x78/0xe2
[59843.481891] RIP: 0033:0x7f6f5ee4632d
[59843.481897] Code: 61 7b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 63 d2 48 63 ff b8 07 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 2a 7b 2c 00 f7 d8 64 89 02
[59843.481901] RSP: 002b:00007fff03341908 EFLAGS: 00000202 ORIG_RAX: 0000000000000107
[59843.481907] RAX: ffffffffffffffda RBX: 000000000b56ec20 RCX: 00007f6f5ee4632d
[59843.481911] RDX: 0000000000000000 RSI: 000000000b56d9f0 RDI: ffffffffffffff9c
[59843.481914] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[59843.481917] R10: 00007fff033416d0 R11: 0000000000000202 R12: 000000000b56d960
[59843.481920] R13: 00007fff03341a50 R14: 0000000000000000 R15: 0000000000000000
[59843.481926] </TASK>
[59843.481928] ---[ end trace 0000000000000000 ]---
@sfjro
FYI:
I am now 100% certain that changing udba=notify
introduced this inode warning issue. I checked on my old system (with kernel 4.19) where I had udba=none
, and there were hundreds of restarts without the issue occurring. As soon as I changed to udba=notify,
the problem started appearing.
The warning starts appearing after the switch_root from initrd to the system.
Artur Piechocki:
The executed command is irrelevant. Most commands that perform operations on /tmp/ trigger a warning. Below are the strace and call trace from the
mv
command:
Thanks for the test. I tried reproducing it using aufs6.6, but failed. I'm confused and am wondering some other environmental thing may be related. Could you try 'mv' on newly mounted aufs? The purpose is ignore the 'switch_root'. For example,
$ cd /tmp $ mkdir rw ro u outside $ sudo mount -t aufs -o rw=rw:ro=ro none u $ cd u $ > test_file $ mv test_file /tmp/outside
J. R. Okajima
The warning also appears with the scenario you proposed, but unfortunately not always.
[26128.622723] ------------[ cut here ]------------
[26128.622729] WARNING: CPU: 3 PID: 9529 at fs/inode.c:265 __destroy_inode+0x11b/0x180
[26128.622744] Modules linked in: ib_isert target_core_file iscsi_scst(O) scst_vdisk(O) scst(O) zfs(PO) qat_api(O) spl(O) intel_qat(O) uio mst_pciconf(O) iptable_filter bpfilter target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate ixgbe(O) tg3 mlx5_core libphy mlxfw ptp pci_hyperv_intf pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) nvme raid_class scsi_transport_sas nvme_core megaraid_sas(O) ipmi_si ipmi_devintf ipmi_msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci]
[26128.622825] CPU: 3 PID: 9529 Comm: mv Kdump: loaded Tainted: P W O 5.15.167 #24 ffafa14d5ffccdd81f5ea87ebf1c7c0a58d3af83
[26128.622831] Hardware name: Dell Inc. PowerEdge R720xd/0020HJ, BIOS 2.9.0 12/06/2019
[26128.622833] RIP: 0010:__destroy_inode+0x11b/0x180
[26128.622842] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b3 be 03 00 00 00 e8 67 b7 34 00 eb a7 0f 0b 0f 0b e9 1b ff ff ff 0f 0b e9 42 ff ff ff <0f> 0b e9 79 ff ff ff be 08 00 00 00 48 83 c7 08 e8 f0 d3 e2 ff eb
[26128.622846] RSP: 0018:ffff8890ba913db8 EFLAGS: 00010246
[26128.622851] RAX: ffff889044977000 RBX: ffff8890c25ee518 RCX: 0000000000000001
[26128.622854] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8890c25ee518
[26128.622857] RBP: ffffffffa00410e0 R08: 0000000000000000 R09: ffff8890c25ee518
[26128.622861] R10: ffff8890ba913e28 R11: 0000000000000000 R12: ffff889065e55018
[26128.622863] R13: ffff8890c261efc0 R14: 00000000ffffff9c R15: 0000000000000000
[26128.622867] FS: 00007fd15c595800(0000) GS:ffff889fff840000(0000) knlGS:0000000000000000
[26128.622871] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26128.622874] CR2: 00007fd15ba7ea70 CR3: 00000010516fc002 CR4: 00000000001706e0
[26128.622878] Call Trace:
[26128.622882] <TASK>
[26128.622884] ? __warn+0x81/0x100
[26128.622891] ? __destroy_inode+0x11b/0x180
[26128.622897] ? report_bug+0x99/0xc0
[26128.622902] ? handle_bug+0x34/0x80
[26128.622907] ? exc_invalid_op+0x13/0x60
[26128.622911] ? asm_exc_invalid_op+0x16/0x20
[26128.622919] ? __destroy_inode+0x11b/0x180
[26128.622926] ? __destroy_inode+0x80/0x180
[26128.622932] destroy_inode+0x2a/0x70
[26128.622940] __dentry_kill+0xce/0x160
[26128.622946] dput+0x132/0x300
[26128.622952] do_renameat2+0x30d/0x590
[26128.622958] __x64_sys_rename+0x3f/0x50
[26128.622963] do_syscall_64+0x3c/0x90
[26128.622967] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[26128.622974] RIP: 0033:0x7fd15ba14727
[26128.622978] Code: 75 12 48 89 df e8 49 5c 07 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 52 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 31 d7 33 00 f7 d8 64 89 01 48
[26128.622983] RSP: 002b:00007ffcd10bb028 EFLAGS: 00000206 ORIG_RAX: 0000000000000052
[26128.622987] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 00007fd15ba14727
[26128.622991] RDX: 0000000000000000 RSI: 00007ffcd10bd6ad RDI: 00007ffcd10bd670
[26128.622993] RBP: 00007ffcd10bb400 R08: 0000000000000001 R09: 0000000000000001
[26128.622996] R10: 00007ffcd10badf0 R11: 0000000000000206 R12: 0000000000000000
[26128.622999] R13: 00007ffcd10bb4f0 R14: 00007ffcd10bd670 R15: 0000000000000000
[26128.623004] </TASK>
[26128.623006] ---[ end trace 0c36c7c65f97049b ]---
Sometimes this error appears at system startup, and sometimes it doesn't. In such cases, even your scenario doesn't lead to the issue. Unfortunately, I still haven't figured out what exactly is causing this, similar to what seems to be happening in:
https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/2048526
Maybe we could enable debugging in AUFS or the kernel itself to help us get closer to the source of the problem?
Artur Piechocki:
The warning also appears with the scenario you proposed, but unfortunately not always.
Thanks for the test. Please try this patch.
J. R. Okajima
diff --git a/fs/aufs/i_op_add.c b/fs/aufs/i_op_add.c index 3430838e0bc4..70357fda0cc3 100644 --- a/fs/aufs/i_op_add.c +++ b/fs/aufs/i_op_add.c @@ -493,8 +493,8 @@ int aufs_tmpfile(struct mnt_idmap idmap, struct inode dir, goto out_h_file; }
inode->__i_nlink = 2; unlock_new_inode(inode);
root = d_make_root(inode); diff --git a/fs/inode.c b/fs/inode.c index 59fba40a5b85..5fea8207f932 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -285,6 +285,8 @@ void __destroy_inode(struct inode *inode) fsnotify_inode_delete(inode); locks_free_lock_context(inode); if (!inode->i_nlink) { +if (atomic_long_read(&inode->i_sb->s_remove_count) == 0) +pr_info("%s\n", inode->i_sb->s_type->name); WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0); atomic_long_dec(&inode->i_sb->s_remove_count); }
@sfjro
Thank you Logs with prepared debug patch:
[ 509.212268] aufs
[ 509.212286] ------------[ cut here ]------------
[ 509.212289] WARNING: CPU: 14 PID: 36839 at fs/inode.c:267 __destroy_inode+0x12b/0x190
[ 509.212306] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) ib_isert zfs(PO) qat_api(O) spl(O) intel_qat(O) uio target_core_file iptable_filter mst_pciconf(O) target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate tg3 mlx5_core ixgbe(O) mlxfw pci_hyperv_intf libphy ptp pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) raid_class scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci]
[ 509.212401] CPU: 14 PID: 36839 Comm: mv Kdump: loaded Tainted: P W O 5.15.167 #8 aa83f353699c284dd86b64ef76e5a4c2bcff91bf
[ 509.212409] Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.9.0 12/06/2019
[ 509.212412] RIP: 0010:__destroy_inode+0x12b/0x190
[ 509.212421] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b3 be 03 00 00 00 e8 57 b7 34 00 eb a7 0f 0b 0f 0b e9 0b ff ff ff 0f 0b e9 32 ff ff ff <0f> 0b e9 79 ff ff ff be 08 00 00 00 48 83 c7 08 e8 e0 d3 e2 ff eb
[ 509.212426] RSP: 0018:ffff8881b7f33e90 EFLAGS: 00010246
[ 509.212431] RAX: ffff889045344000 RBX: ffff88815a223798 RCX: 0000000000000000
[ 509.212434] RDX: 0000000000000000 RSI: ffff888fff9dc430 RDI: ffff888fff9dc430
[ 509.212437] RBP: ffffffffa00410e0 R08: 000000000001758e R09: 0000000100004ab6
[ 509.212440] R10: 0000000100004ab6 R11: ffffffff82c57f00 R12: 0000000000000000
[ 509.212443] R13: ffff88815a223798 R14: 00000000ffffff9c R15: ffff8881a68c23c0
[ 509.212447] FS: 00007f0b47d0d800(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[ 509.212451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 509.212453] CR2: 00007f0b471b7020 CR3: 00000001bb7d8001 CR4: 00000000001706e0
[ 509.212458] Call Trace:
[ 509.212462] <TASK>
[ 509.212465] ? __warn+0x81/0x100
[ 509.212473] ? __destroy_inode+0x12b/0x190
[ 509.212480] ? report_bug+0x99/0xc0
[ 509.212487] ? handle_bug+0x34/0x80
[ 509.212492] ? exc_invalid_op+0x13/0x60
[ 509.212496] ? asm_exc_invalid_op+0x16/0x20
[ 509.212507] ? __destroy_inode+0x12b/0x190
[ 509.212514] destroy_inode+0x2a/0x70
[ 509.212521] do_unlinkat+0x147/0x2f0
[ 509.212527] __x64_sys_unlinkat+0x33/0x60
[ 509.212531] do_syscall_64+0x3c/0x90
[ 509.212535] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 509.212542] RIP: 0033:0x7f0b4720232d
[ 509.212547] Code: 61 7b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 63 d2 48 63 ff b8 07 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 2a 7b 2c 00 f7 d8 64 89 02
[ 509.212551] RSP: 002b:00007ffeb458f828 EFLAGS: 00000206 ORIG_RAX: 0000000000000107
[ 509.212556] RAX: ffffffffffffffda RBX: 0000000002071c20 RCX: 00007f0b4720232d
[ 509.212559] RDX: 0000000000000000 RSI: 00000000020709f0 RDI: ffffffffffffff9c
[ 509.212562] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[ 509.212564] R10: 00007ffeb458f5f0 R11: 0000000000000206 R12: 0000000002070960
[ 509.212567] R13: 00007ffeb458f970 R14: 0000000000000000 R15: 0000000000000000
[ 509.212571] </TASK>
[ 509.212573] ---[ end trace b3848925034dea49 ]---
[ 511.801001] aufs
[ 511.801015] ------------[ cut here ]------------
[ 511.801017] WARNING: CPU: 34 PID: 36840 at fs/inode.c:267 __destroy_inode+0x12b/0x190
[ 511.801031] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) ib_isert zfs(PO) qat_api(O) spl(O) intel_qat(O) uio target_core_file iptable_filter mst_pciconf(O) target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse mlx5_ib ib_umad ib_ipoib ib_cm mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel aesni_intel crypto_simd cryptd rapl intel_cstate tg3 mlx5_core ixgbe(O) mlxfw pci_hyperv_intf libphy ptp pps_core button nls_iso8859_1 nls_cp437 ipmi_watchdog sg mpt3sas(O) raid_class scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler vfat fat aufs scsi_transport_fc [last unloaded: mst_pci]
[ 511.801109] CPU: 34 PID: 36840 Comm: mv Kdump: loaded Tainted: P W O 5.15.167 #8 aa83f353699c284dd86b64ef76e5a4c2bcff91bf
[ 511.801116] Hardware name: Dell Inc. PowerEdge R720xd/0HJK12, BIOS 2.9.0 12/06/2019
[ 511.801118] RIP: 0010:__destroy_inode+0x12b/0x190
[ 511.801127] Code: ff f0 0f c1 07 83 f8 01 74 27 85 c0 7f b3 be 03 00 00 00 e8 57 b7 34 00 eb a7 0f 0b 0f 0b e9 0b ff ff ff 0f 0b e9 32 ff ff ff <0f> 0b e9 79 ff ff ff be 08 00 00 00 48 83 c7 08 e8 e0 d3 e2 ff eb
[ 511.801131] RSP: 0018:ffff88817eec7e90 EFLAGS: 00010246
[ 511.801135] RAX: ffff889045344000 RBX: ffff88815a223ad8 RCX: 0000000000000000
[ 511.801138] RDX: 0000000000000000 RSI: ffff888fffc5c430 RDI: ffff888fffc5c430
[ 511.801141] RBP: ffffffffa00410e0 R08: 000000000001765b R09: 0000000100004adf
[ 511.801144] R10: 0000000100004adf R11: ffffffff82c57f00 R12: 0000000000000000
[ 511.801146] R13: ffff88815a223ad8 R14: 00000000ffffff9c R15: ffff8881a68c3d40
[ 511.801150] FS: 00007fd82d27b800(0000) GS:ffff888fffc40000(0000) knlGS:0000000000000000
[ 511.801154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 511.801157] CR2: 000000001646caf8 CR3: 00000001bbc08006 CR4: 00000000001706e0
[ 511.801161] Call Trace:
[ 511.801164] <TASK>
[ 511.801167] ? __warn+0x81/0x100
[ 511.801173] ? __destroy_inode+0x12b/0x190
[ 511.801181] ? report_bug+0x99/0xc0
[ 511.801185] ? handle_bug+0x34/0x80
[ 511.801190] ? exc_invalid_op+0x13/0x60
[ 511.801194] ? asm_exc_invalid_op+0x16/0x20
[ 511.801202] ? __destroy_inode+0x12b/0x190
[ 511.801208] destroy_inode+0x2a/0x70
[ 511.801216] do_unlinkat+0x147/0x2f0
[ 511.801221] __x64_sys_unlinkat+0x33/0x60
[ 511.801225] do_syscall_64+0x3c/0x90
[ 511.801229] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 511.801235] RIP: 0033:0x7fd82c77032d
[ 511.801239] Code: 61 7b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 63 d2 48 63 ff b8 07 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 2a 7b 2c 00 f7 d8 64 89 02
[ 511.801243] RSP: 002b:00007ffd20e9e4e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000107
[ 511.801248] RAX: ffffffffffffffda RBX: 000000001646cc20 RCX: 00007fd82c77032d
[ 511.801251] RDX: 0000000000000000 RSI: 000000001646b9f0 RDI: ffffffffffffff9c
[ 511.801253] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[ 511.801256] R10: 00007ffd20e9e2b0 R11: 0000000000000206 R12: 000000001646b960
[ 511.801259] R13: 00007ffd20e9e630 R14: 0000000000000000 R15: 0000000000000000
[ 511.801263] </TASK>
[ 511.801265] ---[ end trace b3848925034dea4a ]---
Artur Piechocki:
Logs with prepared debug patch:
Thanks for the test. Still I'm trying. Would you test this additional patch? Keep the previous patch applied please.
J. R. Okajima
diff --git a/fs/aufs/cpup.c b/fs/aufs/cpup.c index 19a7cfa5213f..1019950ce8ce 100644 --- a/fs/aufs/cpup.c +++ b/fs/aufs/cpup.c @@ -100,7 +100,7 @@ void au_cpup_attr_all(struct inode *inode, int force)
h_inode = au_h_iptr(inode, au_ibtop(inode));
au_cpup_attr_changeable(inode);
I have encountered an issue with the AUFS filesystem that occurs sporadically during system startup. The error message I receive is:
error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error
Upon further investigation, I found that this issue also occurs when I run a remount loop:
After some time, the following errors appear:
From that point on, it becomes impossible to run any command in the system as all attempts result in an "Input/output error."
The issue occurs on the latest kernel (6.10.6) and AUFS version (6.10-20240722). After conducting some tests, I found that this problem started appearing with kernel 5.10 and has not occurred on kernel 5.9 (aufs 5.9-20210906). Something must have changed starting with kernel 5.10 that causes this issue to appear.
The issue only arises when
udba=none
is set while mounting AUFS. Whenudba=notify
is used, the problem does not occur.I understand that running a remount loop is not typical in normal environments, but the fact that it leads to such critical issues suggests a potential bug in AUFS.
@sfjro , would you be able to run a remount loop on your setup with AUFS using
udba=none
and check if the issue occurs for you as well?