sfjro / aufs-standalone

29 stars 13 forks source link

Deadlock while remounting nested bound mountpoint #40

Open arko-pl opened 5 months ago

arko-pl commented 5 months ago

Hello,

I noticed a deadlock while trying to remount nested and bound mountpoint. It is reproducible with different underlying filesystems. Steps to reproduce (assuming /union is a simple aufs r/w mount point):

mkdir /union/mnt/new mount -o bind /union /union/mnt/new mount -o remount /union/mnt/new after last command console is locked.

At the time of writing last kernel that works is 5.4. Issue is observed also on 5.15 LTS kernel and 6.1 LTS kernel.

There is dump of tasks from the 6.1.92 kernel:

<4>[  335.001920][T16359] ------------[ cut here ]------------
<4>[  335.002811][T16359] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magic = 0xffff88810fde8050, owner = 0x1, curr 0xffff88800dba8000, list empty
<4>[  335.005549][T16359] WARNING: CPU: 3 PID: 16359 at kernel/locking/rwsem.c:1371 up_write+0x124/0x190
<4>[  335.007259][T16359] Modules linked in: sch_htb iscsi_scst(O) scst_vdisk(O) scst(O) dlm iptable_filter bpfilter target_core_iblock target_core_pscsi iscsi_target_mod target_core_mod bonding ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse e1000 button nls_cp437 nls_iso8859_1 virtio_scsi virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev sg virtio virtio_ring pata_acpi ata_generic vfat fat aufs scsi_transport_fc
<4>[  335.013543][T16359] CPU: 3 PID: 16359 Comm: mount Kdump: loaded Tainted: G           O       6.1.92 #16 87b4303e5ae2d4a7f6c994aa88151842d80c2918
<4>[  335.015889][T16359] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
<4>[  335.017275][T16359] RIP: 0010:up_write+0x124/0x190
<4>[  335.018216][T16359] Code: f8 38 99 82 c6 05 18 38 0c 02 01 48 39 c2 48 c7 c2 89 38 99 82 48 c7 c0 29 12 9a 82 48 0f 45 c2 48 8b 55 00 50 e8 8c 0b f9 ff <0f> 0b 59 eb 8f 48 8b 55 58 48 8d 45 58 4c 8b 45 08 48 c7 c6 8e 38
<4>[  335.021080][T16359] RSP: 0018:ffff888011857d80 EFLAGS: 00010282
<4>[  335.021968][T16359] RAX: 0000000000000000 RBX: ffff88810c958750 RCX: ffff8881549ac6c8
<4>[  335.023136][T16359] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8881549ac6c0
<4>[  335.024292][T16359] RBP: ffff88810fde8050 R08: ffffffff82fe9868 R09: 0000000000027ffb
<4>[  335.025450][T16359] R10: 0000000000000293 R11: ffffffff82f29880 R12: ffff88810cf64800
<4>[  335.026654][T16359] R13: ffff88800dba8000 R14: 000000000000000b R15: ffff88810cf6b000
<4>[  335.028031][T16359] FS:  00007f2529cc7840(0000) GS:ffff888154980000(0000) knlGS:0000000000000000
<4>[  335.029338][T16359] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  335.030503][T16359] CR2: 00007fffe6aa9f80 CR3: 000000000eec0000 CR4: 00000000000006e0
<4>[  335.031703][T16359] Call Trace:
<4>[  335.032379][T16359]  <TASK>
<4>[  335.033195][T16359]  ? __warn+0x79/0xc0
<4>[  335.033902][T16359]  ? up_write+0x124/0x190
<4>[  335.034654][T16359]  ? report_bug+0xf9/0x130
<4>[  335.035439][T16359]  ? prb_read_valid+0x17/0x20
<4>[  335.036243][T16359]  ? handle_bug+0x42/0x70
<4>[  335.037085][T16359]  ? exc_invalid_op+0x14/0x70
<4>[  335.038096][T16359]  ? asm_exc_invalid_op+0x16/0x20
<4>[  335.038976][T16359]  ? up_write+0x124/0x190
<4>[  335.039735][T16359]  di_write_unlock+0x2b/0x40 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<4>[  335.041177][T16359]  au_remount_refresh+0xe4/0x500 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<4>[  335.042754][T16359]  au_fsctx_reconfigure+0x11e/0x130 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<4>[  335.044362][T16359]  reconfigure_super+0xcb/0x280
<4>[  335.045193][T16359]  path_mount+0x986/0xb70
<4>[  335.045936][T16359]  __x64_sys_mount+0x107/0x140
<4>[  335.046805][T16359]  do_syscall_64+0x3b/0x90
<4>[  335.047821][T16359]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<4>[  335.048966][T16359] RIP: 0033:0x7f2529390d8a
<4>[  335.049722][T16359] Code: 48 8b 0d 01 c1 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c0 2b 00 f7 d8 64 89 01 48
<4>[  335.052880][T16359] RSP: 002b:00007ffdca417ac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
<4>[  335.054215][T16359] RAX: ffffffffffffffda RBX: 00000000017b9060 RCX: 00007f2529390d8a
<4>[  335.055709][T16359] RDX: 00000000017b92d0 RSI: 00000000017c6030 RDI: 00000000017c6010
<4>[  335.057014][T16359] RBP: 0000000000000000 R08: 00000000017c1a80 R09: 00007f25292f399a
<4>[  335.058483][T16359] R10: 0000000000200030 R11: 0000000000000202 R12: 00000000017c6010
<4>[  335.059807][T16359] R13: 00000000017b92d0 R14: 0000000000000000 R15: 0000000000000003
<4>[  335.061067][T16359]  </TASK>
<4>[  335.061616][T16359] irq event stamp: 0
<4>[  335.062445][T16359] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
<4>[  335.063590][T16359] hardirqs last disabled at (0): [<ffffffff8108a7cb>] copy_process+0x90b/0x1eb0
<4>[  335.065005][T16359] softirqs last  enabled at (0): [<ffffffff8108a7cb>] copy_process+0x90b/0x1eb0
<4>[  335.066414][T16359] softirqs last disabled at (0): [<0000000000000000>] 0x0
<4>[  335.067811][T16359] ---[ end trace 0000000000000000 ]---
<3>[  365.386687][   T36] INFO: task snmpd:21963 blocked for more than 30 seconds.
<3>[  365.391150][   T36]       Tainted: G        W  O       6.1.92 #16
<3>[  365.394968][   T36] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  365.398658][   T36] task:snmpd           state:D stack:0     pid:21963 ppid:1      flags:0x00000000
<6>[  365.401369][   T36] Call Trace:
<6>[  365.401936][   T36]  <TASK>
<6>[  365.402466][   T36]  __schedule+0x2d7/0xad0
<6>[  365.403209][   T36]  schedule+0x57/0xa0
<6>[  365.403852][   T36]  schedule_preempt_disabled+0x11/0x20
<6>[  365.404700][   T36]  rwsem_down_read_slowpath+0x28d/0x580
<6>[  365.405555][   T36]  down_read+0x63/0x140
<6>[  365.406231][   T36]  si_read_lock+0x40/0x100 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.407586][   T36]  ? lock_acquire+0x138/0x2b0
<6>[  365.408623][   T36]  ? lock_acquire+0x138/0x2b0
<6>[  365.409407][   T36]  aufs_permission+0x44/0x2f0 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.410791][   T36]  ? lock_release+0x1ad/0x2d0
<6>[  365.411521][   T36]  inode_permission.part.0+0x113/0x1b0
<6>[  365.412370][   T36]  link_path_walk.part.0+0x2a9/0x3d0
<6>[  365.413209][   T36]  ? path_init+0x3b9/0x4b0
<6>[  365.413906][   T36]  path_openat+0xb2/0xaa0
<6>[  365.414601][   T36]  ? _raw_spin_unlock+0x1f/0x30
<6>[  365.415351][   T36]  do_filp_open+0xa9/0x150
<6>[  365.416042][   T36]  ? _raw_spin_unlock+0x1f/0x30
<6>[  365.416819][   T36]  ? alloc_fd+0x12d/0x200
<6>[  365.417800][   T36]  do_sys_openat2+0x9b/0x160
<6>[  365.418522][   T36]  __x64_sys_open+0x55/0xa0
<6>[  365.419243][   T36]  do_syscall_64+0x3b/0x90
<6>[  365.419923][   T36]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6>[  365.420805][   T36] RIP: 0033:0x7f1aa3fb3960
<6>[  365.421484][   T36] RSP: 002b:00007ffef1f9bac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
<6>[  365.422730][   T36] RAX: ffffffffffffffda RBX: 0000000001f91060 RCX: 00007f1aa3fb3960
<6>[  365.423890][   T36] RDX: 00000000000001b6 RSI: 0000000000000000 RDI: 00007f1aa4968a52
<6>[  365.425057][   T36] RBP: 00007ffef1f9bb50 R08: 0000000000000008 R09: 0000000000000001
<6>[  365.426225][   T36] R10: 00007f1aa4970c93 R11: 0000000000000246 R12: 0000000000000008
<6>[  365.427446][   T36] R13: 0000000000000001 R14: 00007ffef1f9c140 R15: 00007f1aa4f64b00
<6>[  365.428837][   T36]  </TASK>
<3>[  365.429398][   T36] INFO: task kbtest:9102 blocked for more than 30 seconds.
<3>[  365.430472][   T36]       Tainted: G        W  O       6.1.92 #16
<3>[  365.431759][   T36] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  365.433029][   T36] task:kbtest          state:D stack:0     pid:9102  ppid:9083   flags:0x00000000
<6>[  365.434443][   T36] Call Trace:
<6>[  365.434975][   T36]  <TASK>
<6>[  365.435459][   T36]  __schedule+0x2d7/0xad0
<6>[  365.436130][   T36]  schedule+0x57/0xa0
<6>[  365.436818][   T36]  schedule_preempt_disabled+0x11/0x20
<6>[  365.437840][   T36]  rwsem_down_read_slowpath+0x28d/0x580
<6>[  365.438717][   T36]  down_read+0x63/0x140
<6>[  365.439362][   T36]  si_read_lock+0x40/0x100 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.440613][   T36]  ? lock_acquire+0x138/0x2b0
<6>[  365.441332][   T36]  ? lock_acquire+0x138/0x2b0
<6>[  365.442046][   T36]  aufs_permission+0x44/0x2f0 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.443367][   T36]  ? lock_release+0x1ad/0x2d0
<6>[  365.444083][   T36]  inode_permission.part.0+0x113/0x1b0
<6>[  365.444906][   T36]  link_path_walk.part.0+0x2a9/0x3d0
<6>[  365.445703][   T36]  ? path_init+0x3b9/0x4b0
<6>[  365.446397][   T36]  path_lookupat+0x43/0x1d0
<6>[  365.447310][   T36]  filename_lookup+0xcb/0x1d0
<6>[  365.448057][   T36]  ? lock_acquire+0x138/0x2b0
<6>[  365.448775][   T36]  ? kmem_cache_alloc+0x43a/0x530
<6>[  365.449548][   T36]  ? getname_flags.part.0+0x29/0x1b0
<6>[  365.453661][   T36]  ? strncpy_from_user+0x1e/0x140
<6>[  365.454478][   T36]  ? getname_flags.part.0+0x45/0x1b0
<6>[  365.455278][   T36]  user_path_at_empty+0x3a/0x60
<6>[  365.456020][   T36]  do_faccessat+0x7c/0x270
<6>[  365.456761][   T36]  do_syscall_64+0x3b/0x90
<6>[  365.457687][   T36]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6>[  365.458607][   T36] RIP: 0033:0x7f2b83cdfc57
<6>[  365.459376][   T36] RSP: 002b:00007ffda0325398 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
<6>[  365.460642][   T36] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b83cdfc57
<6>[  365.461831][   T36] RDX: 00000000006eb980 RSI: 0000000000000000 RDI: 00000000004bfb00
<6>[  365.463100][   T36] RBP: 00007ffda03253a0 R08: 0000000000000000 R09: 0000000000000000
<6>[  365.464288][   T36] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000407c5c
<6>[  365.465475][   T36] R13: 00007ffda03254b0 R14: 0000000000000000 R15: 0000000000000000
<6>[  365.466739][   T36]  </TASK>
<3>[  365.467482][   T36] INFO: task mount:16359 blocked for more than 30 seconds.
<3>[  365.468564][   T36]       Tainted: G        W  O       6.1.92 #16
<3>[  365.469877][   T36] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  365.471146][   T36] task:mount           state:D stack:0     pid:16359 ppid:13291  flags:0x00004000
<6>[  365.472560][   T36] Call Trace:
<6>[  365.473106][   T36]  <TASK>
<6>[  365.473590][   T36]  __schedule+0x2d7/0xad0
<6>[  365.474274][   T36]  schedule+0x57/0xa0
<6>[  365.474894][   T36]  schedule_preempt_disabled+0x11/0x20
<6>[  365.475714][   T36]  rwsem_down_read_slowpath+0x28d/0x580
<6>[  365.476588][   T36]  down_read_nested+0x61/0x140
<6>[  365.477607][   T36]  di_read_lock+0x25/0x130 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.478878][   T36]  au_do_refresh+0x2e/0x80 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.480157][   T36]  au_remount_refresh+0x205/0x500 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.481566][   T36]  au_fsctx_reconfigure+0x11e/0x130 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.483030][   T36]  reconfigure_super+0xcb/0x280
<6>[  365.483774][   T36]  path_mount+0x986/0xb70
<6>[  365.484448][   T36]  __x64_sys_mount+0x107/0x140
<6>[  365.485214][   T36]  do_syscall_64+0x3b/0x90
<6>[  365.485903][   T36]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6>[  365.486863][   T36] RIP: 0033:0x7f2529390d8a
<6>[  365.487801][   T36] RSP: 002b:00007ffdca417ac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
<6>[  365.489044][   T36] RAX: ffffffffffffffda RBX: 00000000017b9060 RCX: 00007f2529390d8a
<6>[  365.490242][   T36] RDX: 00000000017b92d0 RSI: 00000000017c6030 RDI: 00000000017c6010
<6>[  365.491414][   T36] RBP: 0000000000000000 R08: 00000000017c1a80 R09: 00007f25292f399a
<6>[  365.492607][   T36] R10: 0000000000200030 R11: 0000000000000202 R12: 00000000017c6010
<6>[  365.493778][   T36] R13: 00000000017b92d0 R14: 0000000000000000 R15: 0000000000000003
<6>[  365.494966][   T36]  </TASK>
<3>[  365.495472][   T36] INFO: task modprobe:16385 blocked for more than 30 seconds.
<3>[  365.496624][   T36]       Tainted: G        W  O       6.1.92 #16
<3>[  365.497976][   T36] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  365.499280][   T36] task:modprobe        state:D stack:0     pid:16385 ppid:16123  flags:0x00000000
<6>[  365.500700][   T36] Call Trace:
<6>[  365.501246][   T36]  <TASK>
<6>[  365.501738][   T36]  __schedule+0x2d7/0xad0
<6>[  365.502448][   T36]  schedule+0x57/0xa0
<6>[  365.503075][   T36]  schedule_preempt_disabled+0x11/0x20
<6>[  365.503900][   T36]  rwsem_down_read_slowpath+0x28d/0x580
<6>[  365.504743][   T36]  down_read+0x63/0x140
<6>[  365.505393][   T36]  si_read_lock+0x40/0x100 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.506736][   T36]  aufs_lookup.part.0.isra.0+0x1b/0x230 [aufs 02e4ef5b7c7da454108981099dd5cc5175c944e7]
<6>[  365.508248][   T36]  __lookup_slow+0xff/0x1b0
<6>[  365.508962][   T36]  walk_component+0xe5/0x160
<6>[  365.509691][   T36]  path_lookupat+0x73/0x1d0
<6>[  365.510430][   T36]  filename_lookup+0xcb/0x1d0
<6>[  365.511154][   T36]  ? lock_acquire+0x138/0x2b0
<6>[  365.511874][   T36]  ? lock_release+0x1ad/0x2d0
<6>[  365.512593][   T36]  ? lock_release+0x1ad/0x2d0
<6>[  365.513310][   T36]  ? fs_reclaim_acquire+0x51/0xe0
<6>[  365.514081][   T36]  vfs_statx+0x8d/0x170
<6>[  365.514745][   T36]  vfs_fstatat+0x54/0x70
<6>[  365.515403][   T36]  __do_sys_newfstatat+0x26/0x60
<6>[  365.516152][   T36]  ? trace_hardirqs_on+0x2c/0xd0
<6>[  365.516925][   T36]  ? syscall_enter_from_user_mode+0x1d/0x50
<6>[  365.517864][   T36]  do_syscall_64+0x3b/0x90
<6>[  365.518556][   T36]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6>[  365.519467][   T36] RIP: 0033:0x7ff5b5ae167a
<6>[  365.520144][   T36] RSP: 002b:00007ffd21dde028 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
<6>[  365.521362][   T36] RAX: ffffffffffffffda RBX: 000055696e6a921b RCX: 00007ff5b5ae167a
<6>[  365.522528][   T36] RDX: 00007ffd21dde180 RSI: 000055696e6a921b RDI: 0000000000000003
<6>[  365.523684][   T36] RBP: 000055696d21e054 R08: 0000000000000000 R09: 000000000000000e
<6>[  365.524843][   T36] R10: 0000000000000000 R11: 0000000000000246 R12: 000055696d426368
<6>[  365.526001][   T36] R13: 000055696e6a9120 R14: 000055696e6a9010 R15: 000055696e6a9208
<6>[  365.527230][   T36]  </TASK>
sfjro commented 5 months ago

Hello Arkadiusz,

"Arkadiusz B.":

mkdir /union/mnt/new mount -o bind /union /union/mnt/new mount -o remount /union/mnt/new after last command console is locked.

I tried the same command sequnece on vanilla 6.1.0 + my current aufs6.1, but could not reproduce. I will try again with closer environtmen to yours later.

And did you install aufs-util.git on your system?

J. R. Okajima

arko-pl commented 5 months ago

And did you install aufs-util.git on your system?

I got outdated version, but after update I'm also reproducing this issue.

sfjro commented 5 months ago

"Arkadiusz B.":

And did you install aufs-util.git on your system?

I got outdated version, but after update I'm also reproducing this issue.

I tried reproducing again on plain linux-v6.1.92 + aufs6.1, but could not reproduce the problem.

The kernel log in your first mail shows

For snmpd[21963] and kbtest[9102], it is understandable that they stopped working since mount[16359] is still running. Then why mount[16359] stopped working? It was warned as the semaphore owner is different, and then stopped working. But the stopped point is ahead from the first warning, which means after the first warning mount[16359] kept running and moved ahead. And then stopped with holding a semaphore. Is this the scenario? Really? I don't understand.

Hmm, I don't know why the semaphore owner is different. It has to be same. In the remount process, aufs au_fsctx_reconfigure() function acquires the semaphore and au_remount_refresh() (in the same process) releases the semaphore temporary. I guess LOCKDEP produces the warning here. It is weird.

Did you apply some other patches to your kernel? I guess you applied lockdep-debug.patch in aufs-standalone.git. If you didn't, the kernel build should fail, or another warning would be produced much earlier and LOCKDEP would be off.

J. R. Okajima

arko-pl commented 5 months ago

Did you apply some other patches to your kernel?

No, only the standard set of patches was applied. Here is how I'm reproducing this from scratch:

mkdir /union
mkdir /mnt/image
dd if=/dev/zero of=/tmp/image bs=1M count=8
mkfs.ext4 /tmp/image
mount -o loop /tmp/image /mnt/image
mount -t aufs -o br:/mnt/image=rw none /union
mkidr /union/usr
cp /bin/bash /union/usr/
mkdir -p /union/mnt/test
mount -o remount /union/mnt/test
sfjro commented 5 months ago

"Arkadiusz B.":

Here is how I'm reproducing this from scratch:

mkdir /union

dd if=/dev/zero of=/tmp/image bs=1M count=8
mkfs.ext4 /tmp/image
mount -o loop /tmp/image /mnt/image
mount -t aufs -o br:/mnt/image=rw none /union
mkidr /union/usr
cp /bin/bash /union/usr/
mkdir -p /union/mnt/test
mount -o remount /union/mnt/test

Hmm, that is really strange. The last "remount" should return an error saying "that is not a mount point" or something. Did you forget mount -o bind /union /union/mnt/test just before the last "remount"?

Anyway I tried using loopback mounted ext4 on v6.1.92 again, but could not reproduce.

Something is totally broken. I don't think it sane that the semaphore onwer changes silently. I'm not sure whether this would help us or not, what will happen if you try mount -o move /union /union/mnt/test and "remount" instead of "bind"?

J. R. Okajima

arko-pl commented 5 months ago

Did you forget mount -o bind /union /union/mnt/test just before the last "remount"?

Sorry, my "enter" key got stuck, sent the message and closed the ticket... This is correct set of commands:

mkdir -p /mnt/image
mkdir /union
dd if=/dev/zero of=/tmp/image bs=1M count=8
mkfs.ext4 /tmp/image
mount -o loop /tmp/image /mnt/image
mount -t aufs -o br:/mnt/image=rw none /union
mkidr /union/usr
cp /bin/bash /union/usr/
mkdir -p /union/mnt/test
mount -o bind /union/usr /union/mnt/test
mount -o remount /union/mnt/test

mount -o move doesn't work it ends with "invalid argument".

There is one more thing to add. The aufs is mounted at the initrd stage where busybox is used. But I'm also reproducing on running system without busybox.

sfjro commented 5 months ago

"Arkadiusz B.":

Sorry, my "enter" key got stuck, sent the message and closed the ticket... This is correct set of commands:

I see. I guess "bind"ing a subdir onto another (sub)subdir is the key. I was truing binding the ROOT dir onto a subdir (and failed reproducing). Now I will think about why older aufs could handle it. Give me some time.

J. R. Okajima

arko-pl commented 5 months ago

Great, thank you.

sfjro commented 5 months ago

I guess "bind"ing a subdir onto another (sub)subdir is the key. I was truing binding the ROOT dir onto a subdir (and failed reproducing). Now I will think about why older aufs could handle it.

In aufs5.10, aufs hired fs_context in mainline and it passes a different dentry (the command line parameter, /union/mnt/test in your case). Aufs should not trust it is the root dentry in aufs super_block, I think. Would you test this patch?

J. R. Okajima

diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c index 43b21910bc67..73d0cbe5b2c9 100644 --- a/fs/aufs/fsctx.c +++ b/fs/aufs/fsctx.c @@ -47,6 +47,7 @@ static int au_fsctx_reconfigure(struct fs_context *fc)

root = fc->root;
sb = root->d_sb;
arko-pl commented 5 months ago

Would you test this patch? I tested it and could not reproduce issue anymore. Thank you :+1:

sfjro commented 5 months ago

"Arkadiusz B.":

I tested it and could not reproduce issue anymore. Thank you :+1:

Thanx for testing. The patch will be merged after a few weeks. I'm testing several kernel versions now to release on next Monday. It is not for this "root" dentry bug, but another AIO bug. The fix for this "root" dentry bug will be released after that.

J. R. Okajima

arko-pl commented 5 months ago

Great news. Thank you.

sfjro commented 5 months ago

------- Blind-Carbon-Copy

From: "J. R. Okajima" @.> To: @. Subject: aufs6 GIT release (v6.10-rc5), aufs6.1--aufs6.5 will end MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: @.> Date: Mon, 01 Jul 2024 06:20:39 +0900 Message-ID: @.>

o Bugfix

o News

J. R. Okajima


------- End of Blind-Carbon-Copy