Closed vvidic closed 4 years ago
Hi, could you please try the following patch: https://oss.oracle.com/pipermail/ocfs2-devel/2019-November/014373.html
Alright, so I have just tested with:
commit 5d70d1a82dcb (HEAD -> master, tag: Ubuntu-5.4.0-9.12, origin/master, origin/HEAD) Author: Seth Forshee seth.forshee@canonical.com Date: Mon Dec 16 20:54:20 2019
UBUNTU: Ubuntu-5.4.0-9.12
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Which contains the commits you mentioned:
commit 94b07b6f9e2e Author: Joseph Qi joseph.qi@linux.alibaba.com Date: Fri Nov 22 01:53:52 2019
Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()"
commit 56e94ea132bb Author: Jia-Ju Bai baijiaju1990@gmail.com Date: Mon Oct 7 00:57:50 2019
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
After executing the reproducer... you just need to wait and kernel bug will happen:
[ 7.320804] ocfs2: Registered cluster interface o2cb
[ 7.345918] OCFS2 User DLM kernel interface loaded
[ 7.363120] o2hb: Heartbeat mode set to local
[ 91.876740] ocfs2: Unregistered cluster interface o2cb
[ 92.040182] ocfs2: Registered cluster interface o2cb
[ 92.050382] OCFS2 User DLM kernel interface loaded
[ 92.057580] o2hb: Heartbeat mode set to local
[ 100.101844] random: crng init done
[ 100.101849] random: 7 urandom warning(s) missed due to ratelimiting
[ 100.929650] o2dlm: Joining domain D99E3898F9364C40803F59DCC47A76B8
[ 100.929651] (
[ 100.929653] 0
[ 100.929654] ) 1 nodes
[ 100.930279] JBD2: Ignoring recovery information on journal
[ 100.932340] ocfs2: Mounting device (7,0) on (node 0, slot 0) with ordered data mode.
[ 105.036050] o2dlm: Leaving domain D99E3898F9364C40803F59DCC47A76B8
[ 106.956472] ocfs2: Unmounting device (7,0) on (node 0)
[ 126.664934] BUG: unable to handle page fault for address: ffff9be08882fcf6
[ 126.666177] #PF: supervisor read access in kernel mode
[ 126.667042] #PF: error_code(0x0000) - not-present page
[ 126.667920] PGD 9e201067 P4D 9e201067 PUD 0
[ 126.668655] Oops: 0000 [#1] SMP NOPTI
[ 126.669308] CPU: 3 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.4.3 #1
[ 126.670431] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 126.671793] RIP: 0010:__kmalloc_track_caller+0xa5/0x270
[ 126.672672] Code: 65 49 8b 50 08 65 4c 03 05 d0 b2 97 6a 4d 8b 38 4d 85 ff 0f 84 8b 01 00 00 41 8b 59 20 49 8b 39 48 8d 4a 01 4c 89 f8>
[ 126.675555] RSP: 0018:ffffbc84c0013b90 EFLAGS: 00010286
[ 126.676453] RAX: ffff9be08882fcf6 RBX: ffff9be08882fcf6 RCX: 0000000000000d25
[ 126.677627] RDX: 0000000000000d24 RSI: 0000000000000cc0 RDI: 000000000002f080
[ 126.678787] RBP: ffffbc84c0013bc8 R08: ffff9be0fbbaf080 R09: ffff9be0fb003880
[ 126.679962] R10: ffffbc84c0013d56 R11: 0000000000000000 R12: 0000000000000cc0
[ 126.681206] R13: 0000000000000016 R14: ffff9be0fb003880 R15: ffff9be08882fcf6
[ 126.682498] FS: 00007f00a3535980(0000) GS:ffff9be0fbb80000(0000) knlGS:0000000000000000
[ 126.683951] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 126.685041] CR2: ffff9be08882fcf6 CR3: 0000000139ebc000 CR4: 00000000000406e0
[ 126.686340] Call Trace:
[ 126.686849] ? kstrdup_const+0x28/0x30
[ 126.687576] kstrdup+0x36/0x70
[ 126.688153] kstrdup_const+0x28/0x30
[ 126.688790] __kernfs_new_node+0x4c/0x1b0
[ 126.689492] ? mutex_lock+0x17/0x40
[ 126.690119] ? __kernfs_iattrs.isra.0+0x39/0xc0
[ 126.690896] ? kernfs_xattr_get+0x27/0x50
[ 126.691595] kernfs_new_node+0x3b/0x60
[ 126.692280] __kernfs_create_file+0x2d/0xc0
[ 126.693019] cgroup_addrm_files+0x13e/0x330
[ 126.693754] ? __kernfs_new_node+0x159/0x1b0
[ 126.694494] ? __radix_tree_replace+0x71/0x120
[ 126.695264] ? kernfs_link_sibling+0x9e/0xf0
[ 126.696045] css_populate_dir+0x11a/0x140
[ 126.696750] cgroup_mkdir+0x35a/0x520
[ 126.697409] kernfs_iop_mkdir+0x5f/0x90
[ 126.698086] vfs_mkdir+0x10d/0x1c0
[ 126.698699] do_mkdirat+0xed/0x120
[ 126.699310] __x64_sys_mkdir+0x1f/0x30
[ 126.699991] do_syscall_64+0x55/0x180
[ 126.700639] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 126.701500] RIP: 0033:0x7f00a3880f0b
[ 126.702296] Code: ff ff c3 0f 1f 40 00 48 8b 05 81 8f 0d 00 64 c7 00 5f 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 53 00 00>
[ 126.705950] RSP: 002b:00007ffe29861958 EFLAGS: 00000206 ORIG_RAX: 0000000000000053
[ 126.707490] RAX: ffffffffffffffda RBX: 000055d4d4ac6a10 RCX: 00007f00a3880f0b
[ 126.708810] RDX: 00007ffe29861820 RSI: 00000000000001ed RDI: 000055d4d4b1d0c0
[ 126.710131] RBP: 00007f00a3704199 R08: 0000000000000001 R09: 2f73662f7379732f
[ 126.711429] R10: 702f70756f726763 R11: 0000000000000206 R12: 0000000000000000
[ 126.712758] R13: 000055d4d4ac6a10 R14: 0000000000000073 R15: 0000000000000007
[ 126.714106] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue nls_iso8859_1>
Note: I have commented lots of things from the original test, its a really basic test with mkfs and mount only.
$ cat ocfs2_reproducer.sh
#!/bin/sh
set -e
DISK=/disk
cleanup () {
umount /mnt || true
[ "$LOOP" ] && losetup -d $LOOP
rm -f $DISK $DISK.image || true
# service o2cb stop
sed -i -e 's/^O2CB_ENABLED=.*/O2CB_ENABLED=false/' /etc/default/o2cb
echo cleanup
}
trap "cleanup" 0 2 3 15
# configure cluster
HOSTNAME=$(hostname)
cat >/etc/ocfs2/cluster.conf <<EOF
cluster:
node_count = 1
name = ocfs2
node:
ip_port = 7777
ip_address = 127.0.0.1
number = 0
name = $HOSTNAME
cluster = ocfs2
EOF
# start cluster
sed -i -e 's/^O2CB_ENABLED=.*/O2CB_ENABLED=true/' /etc/default/o2cb
service o2cb restart
# check cluster
echo "=== dlmfs ==="
grep '^ocfs2_dlmfs /dlm' /proc/mounts
echo "=== lsmod ==="
lsmod | grep ocfs2_stack_o2cb
echo "=== o2hbmonitor ==="
pgrep -a o2hbmonitor
# print info
echo "=== o2cluster ==="
o2cluster -r
echo "=== o2cb_ctl ==="
o2cb_ctl -I -n $HOSTNAME
# create test disk
echo "=== losetup ==="
dd if=/dev/zero of=$DISK bs=1M count=200 2>&1
LOOP=$(losetup --find --show $DISK)
# test tools
echo "=== mkfs ==="
mkfs.ocfs2 --cluster-stack=o2cb --cluster-name=ocfs2 $LOOP 2>&1
# echo "=== o2image ==="
# o2image $LOOP $DISK.image
# ls -l $DISK.image
# echo "=== fsck ==="
# fsck.ocfs2 -f -y $LOOP 2>&1
# echo "=== o2cluster ==="
# o2cluster -o $LOOP
# echo "=== tunefs ==="
# tunefs.ocfs2 -L $DISK -N 3 -Q 'Label = %V\nNumSlots = %N\n' $LOOP
# echo "=== debugfs ==="
# debugfs.ocfs2 -R stats $LOOP
# echo "=== o2info ==="
# o2info --volinfo $LOOP
# echo "=== grow ==="
# dd if=/dev/zero of=$DISK bs=1M count=50 seek=200 2>&1
# losetup --set-capacity $LOOP
# tunefs.ocfs2 -S $LOOP
echo "=== mount ==="
mount $LOOP /mnt
df /mnt
# echo "=== mounted ==="
# mounted.ocfs2 -d
# mounted.ocfs2 -f
# echo "=== defragfs ==="
# cp -a /bin /mnt
# defragfs.ocfs2 -v /mnt
Last good test:
Kernel: 5.2.0-15-generic
First failure:
Kernel: 5.3.0-10-generic
Let me know if you want me to try to bisect it.. not sure if its straightforward as 5.2 and 5.3 trees aren't the same for our kernel (I might try with vanilla upstream if needed).
Could you please check if your kernel has the following commit? e581595ea29c ocfs: no need to check return value of debugfs_create functions If so, it has been fixed by: b73eba2a867e ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less Please have a try.
I confirm this solves the problem for me on Debian side. Tested with v5.5-rc3-519-g6327edceb62b
Well, I'm using kernel 5.4.0-9 (which contains that commit) and still got:
[ 5160.614832] BUG: unable to handle page fault for address: ffff97e78969bad6
[ 5160.616217] #PF: supervisor read access in kernel mode
[ 5160.617268] #PF: error_code(0x0000) - not-present page
[ 5160.618289] PGD b4401067 P4D b4401067 PUD 0
[ 5160.619169] Oops: 0000 [#1] SMP NOPTI
[ 5160.619931] CPU: 1 PID: 1416 Comm: modprobe Kdump: loaded Not tainted 5.4.0-9-generic #12-Ubuntu
[ 5160.621632] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 5160.623228] RIP: 0010:__kmalloc_track_caller+0xa1/0x270
[ 5160.624264] Code: 65 49 8b 50 08 65 4c 03 05 e4 15 57 4e 4d 8b 38 4d 85 ff 0f 84 8e 01 00 00 41 8b 59 20 49 8b 39 48 8d 4a 01 4c 89 f8
4c 01 fb <48> 33 1b 49 33 99 70 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[ 5160.627747] RSP: 0018:ffffb71d4048ba88 EFLAGS: 00010282
[ 5160.628792] RAX: ffff97e78969bad6 RBX: ffff97e78969bad6 RCX: 00000000000043be
[ 5160.630183] RDX: 00000000000043bd RSI: 0000000000000cc0 RDI: 000000000002f080
[ 5160.631564] RBP: ffffb71d4048bac0 R08: ffff97e7fbaaf080 R09: ffff97e7fb003880
[ 5160.632956] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000cc0
[ 5160.634337] R13: 0000000000000013 R14: ffff97e7fb003880 R15: ffff97e78969bad6
[ 5160.635718] FS: 00007f6c97fea500(0000) GS:ffff97e7fba80000(0000) knlGS:0000000000000000
[ 5160.637287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5160.638425] CR2: ffff97e78969bad6 CR3: 000000013254a000 CR4: 00000000000406e0
[ 5160.639849] Call Trace:
[ 5160.640414] ? kstrdup_const+0x24/0x30
[ 5160.641223] kstrdup+0x32/0x60
[ 5160.641892] kstrdup_const+0x24/0x30
[ 5160.642660] __kernfs_new_node+0x4c/0x1b0
[ 5160.643512] ? __kernfs_new_node+0x159/0x1b0
[ 5160.644397] kernfs_new_node+0x37/0x60
[ 5160.645197] __kernfs_create_file+0x29/0xc0
[ 5160.646076] sysfs_add_file_mode_ns+0x9f/0x180
[ 5160.647005] internal_create_group+0x115/0x370
[ 5160.647927] sysfs_create_group+0x13/0x20
[ 5160.648781] mod_sysfs_setup+0x3f5/0x620
[ 5160.649630] load_module+0xfcc/0x1210
[ 5160.650410] __do_sys_finit_module+0xbe/0x120
[ 5160.651324] ? __do_sys_finit_module+0xbe/0x120
[ 5160.652270] __x64_sys_finit_module+0x1a/0x20
[ 5160.653193] do_syscall_64+0x57/0x190
[ 5160.653972] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5160.655018] RIP: 0033:0x7f6c97bd395d
[ 5160.655780] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 03 e5 0c 00 f7 d8 64 89 01 48
[ 5160.659365] RSP: 002b:00007ffdf2ee1878 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 5160.660874] RAX: ffffffffffffffda RBX: 000055e5c686cb50 RCX: 00007f6c97bd395d
[ 5160.662299] RDX: 0000000000000000 RSI: 000055e5c4d9a348 RDI: 0000000000000005
[ 5160.663719] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
[ 5160.665144] R10: 0000000000000005 R11: 0000000000000246 R12: 000055e5c4d9a348
[ 5160.666564] R13: 0000000000000000 R14: 000055e5c686ccd0 R15: 000055e5c686cb50
I'll check with latest upstream, like vvidic did, and check. If it is good then I'll go ahead and bisect as I seem to have a good reproducer.
On Mon, Jan 13, 2020 at 09:04:48AM -0800, Rafael David Tinoco wrote:
Well, I'm using kernel 5.4.0-9 (which contains that commit) and still got:
AFAICT, the fix is in a newer version than 5.4:
$ git describe e581595ea29c v5.2-5650-ge581595ea29c
$ git describe b73eba2a867e v5.5-rc3-224-gb73eba2a867e
-- Valentin
Oh, thanks vvidic! I gave a really fast look and saw e581595ea29c only, thinking it was the referred fix. Now it all makes more sense. I'll ask for a kernel stable fix on our side.
Thanks a lot!
On Mon, Jan 13, 2020 at 11:00:13AM -0800, Rafael David Tinoco wrote:
Oh, thanks vvidic! I gave a really fast look and saw e581595ea29c only, thinking it was the referred fix. Now it all makes more sense. I'll ask for a kernel stable fix on our side.
I think 5.4 is LTS on the kernel side, please include it there too.
-- Valentin
Will do. For future reference if anyone finds this:
https://bugs.launchpad.net/ubuntu/+source/ocfs2-tools/+bug/1852122
While testing ocfs2-tools I noticed that kernel 5.3.9 enters a crash loop after some ocfs2 errors so I wonder if this is a know bug somewhere?