kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

btrfs replace start failed #362

Open icebluey opened 3 years ago

icebluey commented 3 years ago

btrfs-progs: v5.11.1, Linux kernel: 5.10.19

I tried several times:

# btrfs replace start -B 1 /dev/sde /mnt
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": No space left on device

or

# btrfs replace start -B 1 /dev/sde /mnt
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt": Read-only file system

But there was still space left.

# btrfs fi usa /mnt
Overall:
    Device size:                   4.00GiB
    Device allocated:              3.96GiB
    Device unallocated:           42.00MiB
    Device missing:                  0.00B
    Used:                          2.94GiB
    Free (estimated):            283.62MiB      (min: 283.62MiB)
    Free (statfs, df):           282.62MiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

dmesg output:

[  751.615411] BTRFS warning (device sdc): devid 1 uuid a8d08928-1048-40c3-a41d-e1027b00c126 is missing
[  751.616556] BTRFS warning (device sdc): devid 1 uuid a8d08928-1048-40c3-a41d-e1027b00c126 is missing
[  764.331884] BTRFS info (device sdc): dev_replace from <missing disk> (devid 1) to /dev/sde started
[  764.332591] ------------[ cut here ]------------
[  764.332594] BTRFS: Transaction aborted (error -28)
[  764.332675] RIP: 0010:btrfs_create_pending_block_groups+0x23e/0x2e0
[  764.332677] Code: 48 0f ba a8 40 0a 00 00 02 72 21 41 83 fd fb 0f 84 85 00 00 00 41 83 fd e2 74 7f 44 89 ee 48 c7 c7 88 fb a3 97 e8 d9 ee 72 00 <0f> 0b 44 89 e9 ba 4c 08 00 00 48 c7 c6 80 9f 66 97 48 89 ef e8 9f
[  764.332679] RSP: 0018:ffffab7142127bd0 EFLAGS: 00010286
[  764.332681] RAX: 0000000000000000 RBX: ffff9c7e871eed08 RCX: ffff9c7ebce18a88
[  764.332682] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9c7ebce18a80
[  764.332682] RBP: ffff9c7e849ba6e8 R08: ffff9c7ebffc7da8 R09: 0000000000027ffb
[  764.332683] R10: 00000000ffff8000 R11: 3fffffffffffffff R12: 0000000000000004
[  764.332684] R13: 00000000ffffffe4 R14: ffff9c7e849ba740 R15: ffff9c7e84fc2000
[  764.332687] FS:  00007f25fe9078c0(0000) GS:ffff9c7ebce00000(0000) knlGS:0000000000000000
[  764.332688] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  764.332689] CR2: 00007fafbf61fbd0 CR3: 0000000008f12000 CR4: 00000000000006f0
[  764.332696] Call Trace:
[  764.332706]  btrfs_run_delayed_refs+0x8f/0x200
[  764.332710]  commit_cowonly_roots+0xa9/0x2d0
[  764.332714]  ? btrfs_qgroup_account_extents+0xbe/0x220
[  764.332715]  btrfs_commit_transaction+0x56b/0xb00
[  764.332717]  ? start_transaction+0xe0/0x590
[  764.332797]  btrfs_dev_replace_by_ioctl.cold+0x22d/0x276
[  764.332804]  btrfs_ioctl+0x28a9/0x3040
[  764.332811]  ? do_sigaction+0x1c6/0x240
[  764.332815]  ? __x64_sys_ioctl+0x83/0xb0
[  764.332817]  __x64_sys_ioctl+0x83/0xb0
[  764.332823]  do_syscall_64+0x33/0x80
[  764.332827]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  764.332831] RIP: 0033:0x7f25fdc9a307
[  764.332833] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
[  764.332834] RSP: 002b:00007ffcfbed9688 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  764.332836] RAX: ffffffffffffffda RBX: 00007ffcfbedb612 RCX: 00007f25fdc9a307
[  764.332837] RDX: 00007ffcfbed9ac0 RSI: 00000000ca289435 RDI: 0000000000000003
[  764.332838] RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffcfbed95e0
[  764.332839] R10: 0000000000000008 R11: 0000000000000246 R12: 0000561754e99050
[  764.332839] R13: 0000561754e99050 R14: 0000000000000001 R15: 0000000000000003
[  764.332842] ---[ end trace fa99b89d8fe292ea ]---
[  764.332846] BTRFS: error (device sdc) in btrfs_create_pending_block_groups:2124: errno=-28 No space left
[  764.332904] BTRFS info (device sdc): forced readonly
[  764.332953] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[  764.332955] BTRFS: error (device sdc) in cleanup_transaction:1941: errno=-28 No space left
[  764.333023] ------------[ cut here ]------------
[  764.333063] RIP: 0010:btrfs_dev_replace_by_ioctl.cold+0x231/0x276
[  764.333065] Code: 93 36 ff 4c 89 ff e8 1d c3 84 ff 8b 04 24 e9 bb 12 8a ff 49 8b 54 24 40 48 83 c2 10 e9 d7 fe ff ff e8 32 d3 81 ff 85 c0 74 02 <0f> 0b 49 8b 4c 24 78 49 8b 74 24 70 6a 01 31 d2 45 31 c9 4c 8d 85
[  764.333066] RSP: 0018:ffffab7142127d78 EFLAGS: 00010286
[  764.333067] RAX: 00000000ffffffe4 RBX: ffff9c7e84fc5000 RCX: 000000000000595f
[  764.333068] RDX: 000000000000595e RSI: ffffffff971872f2 RDI: 0000000000032370
[  764.333069] RBP: ffff9c7e84fc2000 R08: 0000000000000000 R09: ffffab7142127c98
[  764.333070] R10: 0000000000000001 R11: ffffffffffffc000 R12: ffff9c7e854d8a00
[  764.333071] R13: ffff9c7e81418d00 R14: ffff9c7e84fc2b10 R15: ffff9c7e854d9a00
[  764.333073] FS:  00007f25fe9078c0(0000) GS:ffff9c7ebce00000(0000) knlGS:0000000000000000
[  764.333074] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  764.333075] CR2: 00007fafbf61fbd0 CR3: 0000000008f12000 CR4: 00000000000006f0
[  764.333080] Call Trace:
[  764.333084]  btrfs_ioctl+0x28a9/0x3040
[  764.333088]  ? do_sigaction+0x1c6/0x240
[  764.333091]  ? __x64_sys_ioctl+0x83/0xb0
[  764.333092]  __x64_sys_ioctl+0x83/0xb0
[  764.333095]  do_syscall_64+0x33/0x80
[  764.333097]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  764.333098] RIP: 0033:0x7f25fdc9a307
[  764.333100] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
[  764.333101] RSP: 002b:00007ffcfbed9688 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  764.333103] RAX: ffffffffffffffda RBX: 00007ffcfbedb612 RCX: 00007f25fdc9a307
[  764.333104] RDX: 00007ffcfbed9ac0 RSI: 00000000ca289435 RDI: 0000000000000003
[  764.333105] RBP: 0000000000000004 R08: 0000000000000000 R09: 00007ffcfbed95e0
[  764.333106] R10: 0000000000000008 R11: 0000000000000246 R12: 0000561754e99050
[  764.333106] R13: 0000561754e99050 R14: 0000000000000001 R15: 0000000000000003
[  764.333108] ---[ end trace fa99b89d8fe292eb ]---
[  764.333146] BTRFS warning (device sdc): failed setting block group ro: -30

Zygo commented 3 years ago

If there is only one metadata block group (BG), or only one that still has free space, and no more metadata chunks can be allocated, then replace is not possible. (This is not clear from the truncated btrfs fi usage output, so I'm guessing.) Replace will lock the only writable metadata BG, no additional writable BGs can be allocated, and after that the filesystem cannot be modified any more, not even to do the metadata updates needed to replace a disk.

As a workaround, increasing the size of the device may suffice, or you can add at least 2 devices for more space before replacing the missing disk.

The kernel might need to ensure that there is enough space for the global reserve without counting the contribution from the N metadata block groups that have the most free space, where N = 1 (balance) + number_of_disks (for scrub or replace to lock one BG on each disk). That would mean more space is dedicated to metadata on such a tiny filesystem, but it would allow RAID profiles to work at small filesystem sizes.
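
The accounting suggested above can be sketched roughly as follows. This is a toy user-space model in Python, not kernel code: the function name `reserve_is_safe`, its parameters, and the example block group sizes and disk count are assumptions made for illustration. Only the 3.25MiB global reserve comes from the usage output earlier in this thread.

```python
from typing import Sequence

MiB = 1024 * 1024


def reserve_is_safe(bg_free_bytes: Sequence[int],
                    num_disks: int,
                    global_reserve_bytes: int) -> bool:
    """Return True if the global reserve still fits in metadata free space
    after discounting the N block groups with the most free space.

    bg_free_bytes: free bytes in each metadata block group (hypothetical input).
    """
    # N = 1 (balance) + number_of_disks (scrub or replace may lock one BG per disk)
    n_lockable = 1 + num_disks

    # Sort ascending and keep everything except the N largest entries.
    keep = max(len(bg_free_bytes) - n_lockable, 0)
    remaining = sorted(bg_free_bytes)[:keep]

    return sum(remaining) >= global_reserve_bytes


# Global reserve as reported by "btrfs fi usage" in this report.
RESERVE = int(3.25 * MiB)

# A single metadata BG (assumed): it may be locked, so nothing is left over.
print(reserve_is_safe([100 * MiB], num_disks=2, global_reserve_bytes=RESERVE))

# Five metadata BGs with 10 MiB free each (assumed): two remain after
# discounting N = 3, which is enough to cover the reserve.
print(reserve_is_safe([10 * MiB] * 5, num_disks=2, global_reserve_bytes=RESERVE))
```

With a single metadata BG the check fails no matter how much free space that BG has, which matches the situation described above: the only writable BG gets locked and nothing else can take metadata writes.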

icebluey commented 3 years ago

@Zygo I found another problem: #363