koverstreet / bcachefs


BUG on `device remove` [f96d03f0e1e3] #560

Closed (ojab closed this issue 1 year ago)

ojab commented 1 year ago

Version

f96d03f0e1e3c500e4ea295d07b6a817f878999d
bcachefs tool version v0.1-692-gcfa816b

I tried to run `bcachefs device remove /dev/sdd1`. The first time, it waited a long time (10+ minutes) and then failed with `Remove failed: error flushing journal: ERESTARTSYS`. The second time, I encountered a BUG.

Generic info

$ bcachefs fs usage
TBD: the command hangs until reboot
$ bcachefs show-super /dev/sdc1 
External UUID:                              360fc60c-8c44-4f3e-9cc4-fbaeee9e7c3b
Internal UUID:                              bc05affd-9fd1-4eb5-b497-3f7956ac57d2
Device index:                               2
Label:                                      
Version:                                    snapshot_trees
Oldest version on disk:                     snapshot_trees
Created:                                    Fri Jun 16 22:38:16 2023
Sequence number:                            160
Superblock size:                            5608
Clean:                                      0
Devices:                                    4
Sections:                                   members,replicas_v0,quota,disk_groups,clean,journal_seq_blacklist,journal_v2,counters
Features:                                   zstd,journal_seq_blacklist_v3,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        2
  data_replicas:                            1
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none crc32c [crc64] xxhash 
  data_checksum:                            none [crc32c] crc64 xxhash 
  compression:                              [none] lz4 gzip zstd 
  background_compression:                   none lz4 gzip [zstd] 
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          ssd
  foreground_target:                        ssd
  background_target:                        hdd
  promote_target:                           ssd
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       5
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 1
  grpquota:                                 1
  prjquota:                                 1
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  nocow:                                    0

members (size 232):
  Device:                                   0
    UUID:                                   62f3139c-4515-4e6a-9aa3-24f598263ece
    Size:                                   18.1 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                37881853
    Last mount:                             Fri Jul  7 20:39:05 2023
    State:                                  rw
    Label:                                  hdd1 (1)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   1
    UUID:                                   4c1c7eff-f1e9-44b8-bcac-186fb4aa2367
    Size:                                   18.1 TiB
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                18940926
    Last mount:                             Fri Jul  7 20:39:05 2023
    State:                                  rw
    Label:                                  hdd2 (2)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   2
    UUID:                                   0f63e7ee-528b-4374-8a2c-99c0226ba4ff
    Size:                                   233 GiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                476948
    Last mount:                             Fri Jul  7 20:39:05 2023
    State:                                  ro
    Label:                                  ssd1 (4)
    Data allowed:                           journal,btree,user
    Has data:                               (none)
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   3
    UUID:                                   b5e3a7e3-00df-4a57-8339-b4d6752ca5f5
    Size:                                   233 GiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                476948
    Last mount:                             Fri Jul  7 20:39:05 2023
    State:                                  ro
    Label:                                  ssd2 (5)
    Data allowed:                           journal,btree,user
    Has data:                               (none)
    Discard:                                1
    Freespace initialized:                  1

Kernel bugs Backtrace:

[First `bcachefs device remove /dev/sdd1`]
bcachefs (sdd1): Remove failed: error flushing journal: ERESTARTSYS
perf: interrupt took too long (3178 > 3143), lowering kernel.perf_event_max_sample_rate to 62000
[Second `bcachefs device remove /dev/sdd1`]
------------[ cut here ]------------
kernel BUG at fs/bcachefs/replicas.c:495!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 8396 Comm: bcachefs Tainted: G        W          6.4.0-ojab-02596-gf96d03f0e1e3 #9 51367348616d463f59e04f68177265a5049c28b2
RIP: 0010:bch2_replicas_gc_start+0x185/0x1b0 [bcachefs]
Code: 4c 89 e7 e8 fd 62 d8 dc 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 e9 1b d5 d8 dc <0f> 0b 31 ff e9 15 ff ff ff 4c 89 e7 e8 ca 62 d8 dc 49 8d b5 a4 01
RSP: 0018:ffffa6e8d0747c28 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffff89abb4300000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff89abb4300670
R13: ffff89abb4300000 R14: ffff89abb43004a0 R15: 0000000000000003
FS:  00007f4b85834a00(0000) GS:ffff89b010680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe404523000 CR3: 00000003bf0de000 CR4: 00000000003506e0
Call Trace:
<TASK>
? die+0x43/0xb0
? do_trap+0x158/0x160
? bch2_replicas_gc_start+0x185/0x1b0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_replicas_gc_start+0x185/0x1b0:
bch2_replicas_gc_start at /home/ojab/src/linux-stable/fs/bcachefs/replicas.c:495 (discriminator 1)
? bch2_replicas_gc_start+0x185/0x1b0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_replicas_gc_start+0x185/0x1b0:
bch2_replicas_gc_start at /home/ojab/src/linux-stable/fs/bcachefs/replicas.c:495 (discriminator 1)
? do_error_trap+0x83/0x100
? bch2_replicas_gc_start+0x185/0x1b0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_replicas_gc_start+0x185/0x1b0:
bch2_replicas_gc_start at /home/ojab/src/linux-stable/fs/bcachefs/replicas.c:495 (discriminator 1)
? exc_invalid_op+0x53/0x80
? bch2_replicas_gc_start+0x185/0x1b0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_replicas_gc_start+0x185/0x1b0:
bch2_replicas_gc_start at /home/ojab/src/linux-stable/fs/bcachefs/replicas.c:495 (discriminator 1)
? asm_exc_invalid_op+0x16/0x20
? bch2_replicas_gc_start+0x185/0x1b0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_replicas_gc_start+0x185/0x1b0:
bch2_replicas_gc_start at /home/ojab/src/linux-stable/fs/bcachefs/replicas.c:495 (discriminator 1)
? bch2_replicas_gc_start+0x23/0x1b0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_replicas_gc_start+0x23/0x1b0:
bch2_replicas_gc_start at /home/ojab/src/linux-stable/fs/bcachefs/replicas.c:495
bch2_journal_flush_device_pins+0x122/0x250 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_journal_flush_device_pins+0x122/0x250:
bch2_journal_flush_device_pins at /home/ojab/src/linux-stable/fs/bcachefs/journal_reclaim.c:847
bch2_dev_remove+0xf7/0x3e0 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_dev_remove+0xf7/0x3e0:
bch2_dev_remove at /home/ojab/src/linux-stable/fs/bcachefs/super.c:1482
bch2_fs_ioctl+0x4b7/0xa90 [bcachefs 4a84a6a9e98c892a425599d3665ee8044e6e46f2]
bch2_fs_ioctl+0x4b7/0xa90:
bch2_ioctl_disk_remove at /home/ojab/src/linux-stable/fs/bcachefs/chardev.c:218
(inlined by) bch2_fs_ioctl at /home/ojab/src/linux-stable/fs/bcachefs/chardev.c:670
__x64_sys_ioctl+0xbe/0xe0
do_syscall_64+0x5b/0x90
entry_SYSCALL_64_after_hwframe+0x4b/0xb5
RIP: 0033:0x7f4b832ed677
Code: 00 00 90 48 8b 05 19 c8 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 c7 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc169eb598 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4b832ed677
RDX: 00007ffc169eb5b0 RSI: 000000004010bc05 RDI: 0000000000000005
RBP: 00007ffc169ebcea R08: 000055dede4fd470 R09: 000055dede4fd450
R10: 0000000000000007 R11: 0000000000000246 R12: 00007ffc169eb748
R13: 000055dedd1097c0 R14: 000000000000001c R15: 0000000000000000
</TASK>
Modules linked in: nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter tun overlay bcachefs lz4_compress mean_and_variance lz4_decompress pps_ldisc pps_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel curve25519_x86_64 libcurve25519_generic libchacha sit tunnel4 ip_tunnel af_packet bridge stp llc ip6table_nat ip6table_filter ip6_tables xt_MASQUERADE xt_conntrack iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bpfilter tcp_bbr sch_fq_codel efivarfs nls_iso8859_1 nls_cp437 vfat fat btrfs blake2b_generic libcrc32c xor lzo_compress zlib_deflate raid6_pq zlib_inflate lzo_decompress cdc_mbim cdc_wdm cdc_ncm cdc_ether input_leds joydev r8152 mousedev ax88796b amdgpu hid_generic asix phylink usbhid selftests usbnet hid mii ath10k_pci ath10k_core edac_mce_amd kvm_amd ath bfq kvm mac80211 mfd_core irqbypass crc32_pclmul iommu_v2 crc32c_intel polyval_clmulni gpu_sched
polyval_generic gf128mul i2c_algo_bit r8169 libarc4 sha512_ssse3 drm_buddy snd_hda_codec_generic snd_hda_codec_hdmi wmi_bmof evdev snd_hda_intel realtek cfg80211 xhci_pci snd_intel_dspcfg aesni_intel drm_suballoc_helper crypto_simd snd_hda_codec mdio_devres cryptd xhci_hcd snd_hwdep snd_hda_core libphy drm_display_helper rapl rfkill efi_pstore usbcore snd_pcm sp5100_tco cec snd_timer drm_ttm_helper mpt3sas ccp acpi_cpufreq watchdog snd ttm ahci k10temp raid_class sha1_generic soundcore usb_common libahci scsi_transport_sas hwmon i2c_piix4 8250 tpm_crb 8250_base tpm_tis video serial_mctrl_gpio tpm_tis_core serial_core rtc_cmos wmi tpm backlight gpio_amdpt rng_core gpio_generic button unix
---[ end trace 0000000000000000 ]---
RIP: 0010:bch2_replicas_gc_start+0x185/0x1b0 [bcachefs]
Code: 4c 89 e7 e8 fd 62 d8 dc 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 e9 1b d5 d8 dc <0f> 0b 31 ff e9 15 ff ff ff 4c 89 e7 e8 ca 62 d8 dc 49 8d b5 a4 01
RSP: 0018:ffffa6e8d0747c28 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffff89abb4300000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff89abb4300670
R13: ffff89abb4300000 R14: ffff89abb43004a0 R15: 0000000000000003
FS:  00007f4b85834a00(0000) GS:ffff89b010680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe404523000 CR3: 00000003bf0de000 CR4: 00000000003506e0
koverstreet commented 1 year ago

I just pushed a fix for the BUG() to the testing branch; we'll have to do more work to debug the hang.