koverstreet / bcachefs

Other
643 stars 71 forks source link

[6.7-rc4] Emergency RO (about to insert invalid key in data update path) #618

Open Madouura opened 7 months ago

Madouura commented 7 months ago

Not really sure what caused this. bcachefs-tools is on 1.3.3.

bcachefs show-super

External UUID:                              7b12f60f-5a7d-4e06-a915-60ff5e794ef2
Internal UUID:                              44e9d681-d028-4320-8191-2a5f2ef5e2ae
Device index:                               0
Label:                                      root
Version:                                    1.3: rebalance_work
Version upgrade complete:                   1.3: rebalance_work
Oldest version on disk:                     1.3: rebalance_work
Created:                                    Thu Nov 16 13:53:08 2023

Sequence number:                            276
Superblock size:                            6272
Clean:                                      0
Devices:                                    3
Sections:                                   members_v1,crypt,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors
Features:                                   zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        2
  data_replicas:                            2
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none crc32c crc64 [xxhash] 
  data_checksum:                            none crc32c crc64 [xxhash] 
  compression:                              zstd
  background_compression:                   zstd
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          ssd
  foreground_target:                        ssd
  background_target:                        hdd
  promote_target:                           ssd
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     2
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none 
  nocow:                                    0

members_v2 (size 376):
  Device:                                   0
    Label:                                  ssd1 (1)
    UUID:                                   dc54fd83-c9d9-4f45-bf95-8a3875149a6f
    Size:                                   928 GiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1899544
    Last mount:                             Mon Dec 11 02:08:46 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   1
    Label:                                  ssd2 (2)
    UUID:                                   375e0964-5443-49ab-be76-668b6934b78b
    Size:                                   868 GiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1776664
    Last mount:                             Mon Dec 11 02:08:46 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   2
    Label:                                  hdd1 (4)
    UUID:                                   e60c9ec1-981c-46b4-a4fb-188cad1811cf
    Size:                                   7.28 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                15261768
    Last mount:                             Mon Dec 11 02:08:46 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,user
    Discard:                                0
    Freespace initialized:                  1

crypt (size 64):
  KFD:               0
  scrypt n:          14
  scrypt r:          3
  scrypt p:          4

replicas_v0 (size 40):
  user: 2 [0 2] btree: 2 [0 1] cached: 1 [0] journal: 2 [0 2] user: 2 [0 1] user: 2 [1 2] cached: 1 [1] journal: 2 [0 1]

dmesg

[39960.622415] about to insert invalid key in data update path
[39960.622417] old: u64s 10 type extent 604066652:3912:U32_MAX len 72 ver 37785646: durability: 2 crc: c_size 64 size 72 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 0:1543187:480 gen 70 ptr: 1:48257:480 gen 79 rebalance: target hdd compression zstd
[39960.622418] k:   u64s 10 type extent 604066652:4032:U32_MAX len 128 ver 37785647: durability: 2 crc: c_size 56 size 128 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 0:1543187:544 gen 70 ptr: 1:48257:544 gen 79 rebalance: target hdd compression zstd
[39960.622419] new: u64s 13 type extent 604066652:3912:U32_MAX len 8 ver 37785647: durability: 2 crc: c_size 56 size 128 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 0:1543187:544 gen 70 cached rebalance: target hdd compression zstd crc: c_size 64 size 72 offset 64 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:81013:672 gen 76 ptr: 2:161991:856 gen 7
[39960.622428] bcachefs (7b12f60f-5a7d-4e06-a915-60ff5e794ef2): fatal error - emergency read only
[39960.622478] ------------[ cut here ]------------
[39960.622480] kernel BUG at fs/bcachefs/io_write.c:529!
[39960.622486] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[39960.622489] CPU: 21 PID: 47460 Comm: kworker/u65:9 Tainted: G        W  O       6.7.0-rc4 #1-NixOS
[39960.622492] Hardware name: ASUS System Product Name/ROG CROSSHAIR VIII DARK HERO, BIOS 4702 10/20/2023
[39960.622494] Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
[39960.622535] RIP: 0010:__bch2_write_index+0x261/0x270 [bcachefs]
[39960.622562] Code: c1 48 c1 e1 09 e8 1f bf 60 c3 eb 93 be 1e 00 00 00 44 89 e7 e8 90 8b fc ff 84 c0 75 82 4d 8b b5 38 01 00 00 e9 5f ff ff ff 90 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90
[39960.622564] RSP: 0018:ffffc90029c77e00 EFLAGS: 00010246
[39960.622567] RAX: ffff88815ee33420 RBX: ffff88811d200000 RCX: 0000000000000048
[39960.622569] RDX: ffff88815ee33420 RSI: 0000000000000009 RDI: 0038001a8188ffff
[39960.622570] RBP: 0000000000000048 R08: 0000000000000000 R09: ffffc90029c77958
[39960.622572] R10: ffffc90029c77de8 R11: ffffffff86338a68 R12: 0000000000000000
[39960.622573] R13: ffff88815ee33290 R14: ffff88815ee333d8 R15: dead000000000100
[39960.622575] FS:  0000000000000000(0000) GS:ffff888feec80000(0000) knlGS:0000000000000000
[39960.622577] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39960.622579] CR2: 000010c553df5000 CR3: 0000000383e92000 CR4: 0000000000f50ef0
[39960.622580] PKRU: 55555554
[39960.622582] Call Trace:
[39960.622584]  <TASK>
[39960.622588]  ? die+0x36/0x90
[39960.622592]  ? do_trap+0xda/0x100
[39960.622595]  ? __bch2_write_index+0x261/0x270 [bcachefs]
[39960.622621]  ? do_error_trap+0x6a/0x90
[39960.622623]  ? __bch2_write_index+0x261/0x270 [bcachefs]
[39960.622648]  ? exc_invalid_op+0x51/0x70
[39960.622651]  ? __bch2_write_index+0x261/0x270 [bcachefs]
[39960.622675]  ? asm_exc_invalid_op+0x1a/0x20
[39960.622681]  ? __bch2_write_index+0x261/0x270 [bcachefs]
[39960.622706]  bch2_write_point_do_index_updates+0xb1/0x160 [bcachefs]
[39960.622731]  process_one_work+0x176/0x340
[39960.622735]  worker_thread+0x27b/0x3a0
[39960.622738]  ? __pfx_worker_thread+0x10/0x10
[39960.622740]  kthread+0xd7/0x100
[39960.622743]  ? __pfx_kthread+0x10/0x10
[39960.622746]  ret_from_fork+0x34/0x50
[39960.622749]  ? __pfx_kthread+0x10/0x10
[39960.622751] bcachefs (7b12f60f-5a7d-4e06-a915-60ff5e794ef2): __bch2_btree_write_buffer_flush: insert error EIO
[39960.622752]  ret_from_fork_asm+0x1b/0x30
[39960.622759]  </TASK>
[39960.622760] Modules linked in: sch_ingress tcp_diag af_key nfnetlink_log xfrm_user xfrm_algo xfrm_interface xfrm6_tunnel tunnel4 udp_diag tunnel6 inet_diag vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci veth overlay ext4 mbcache jbd2 iptable_mangle xt_CHECKSUM iptable_nat xt_MASQUERADE nf_nat iptable_filter qrtr snd_seq_dummy snd_hrtimer snd_seq rfcomm af_packet cmac algif_hash algif_skcipher af_alg bnep msr iwlmvm mac80211 libarc4 nls_iso8859_1 xone_dongle(O) nls_cp437 vfat snd_hda_codec_realtek fat iwlwifi xone_gip(O) snd_hda_codec_generic snd_usb_audio snd_hda_codec_hdmi btusb btrtl snd_hda_intel btintel snd_usbmidi_lib snd_intel_dspcfg edac_mce_amd btbcm snd_intel_sdw_acpi snd_rawmidi ucsi_ccg btmtk snd_hda_codec intel_rapl_msr edac_core typec_ucsi snd_seq_device snd_hda_core intel_rapl_common typec crc32_pclmul joydev mousedev evdev mc polyval_clmulni r8169 eeepc_wmi snd_hwdep cfg80211 bluetooth polyval_generic igb roles mac_hid uas cdc_acm gf128mul asus_ec_sensors
[39960.622823]  ghash_clmulni_intel asus_wmi snd_pcm battery realtek sp5100_tco ledtrig_audio ptp mdio_devres ecdh_generic sparse_keymap snd_timer pps_core platform_profile tpm_crb ecc watchdog crc16 libphy rfkill dca mxm_wmi wmi_bmof rapl i2c_designware_pci snd i2c_ccgx_ucsi k10temp i2c_piix4 soundcore tpm_tis tpm_tis_core tiny_power_button button xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables nfnetlink sch_fq_codel uinput ctr atkbd libps2 serio vivaldi_fmap loop tun tap macvlan bridge stp llc kvm_amd ccp kvm irqbypass nct6775 nct6775_core hwmon_vid fuse efi_pstore configfs zstd zram efivarfs dmi_sysfs ip_tables x_tables autofs4 poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs libcrc32c crc32c_generic xor lz4hc_compress lz4_compress raid6_pq dm_crypt aes_generic cbc encrypted_keys trusted asn1_encoder tee tpm rng_core hid_logitech_hidpp hid_logitech_dj usb_storage input_leds
[39960.622894]  led_class hid_generic usbhid hid sd_mod ahci xhci_pci libahci xhci_pci_renesas xhci_hcd libata crc32c_intel sha512_ssse3 sha512_generic nvme sha256_ssse3 sha1_ssse3 usbcore aesni_intel nvme_core scsi_mod libaes crypto_simd cryptd t10_pi crc64_rocksoft crc_t10dif crct10dif_generic crct10dif_pclmul scsi_common usb_common crc64 crct10dif_common rtc_cmos dm_mod dax amdgpu i2c_algo_bit drm_ttm_helper ttm agpgart video wmi drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper drm_kms_helper drm backlight firmware_class
[39960.622942] ---[ end trace 0000000000000000 ]---
[39961.924499] RIP: 0010:__bch2_write_index+0x261/0x270 [bcachefs]
[39961.924563] Code: c1 48 c1 e1 09 e8 1f bf 60 c3 eb 93 be 1e 00 00 00 44 89 e7 e8 90 8b fc ff 84 c0 75 82 4d 8b b5 38 01 00 00 e9 5f ff ff ff 90 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90
[39961.924568] RSP: 0018:ffffc90029c77e00 EFLAGS: 00010246
[39961.924572] RAX: ffff88815ee33420 RBX: ffff88811d200000 RCX: 0000000000000048
[39961.924576] RDX: ffff88815ee33420 RSI: 0000000000000009 RDI: 0038001a8188ffff
[39961.924578] RBP: 0000000000000048 R08: 0000000000000000 R09: ffffc90029c77958
[39961.924581] R10: ffffc90029c77de8 R11: ffffffff86338a68 R12: 0000000000000000
[39961.924583] R13: ffff88815ee33290 R14: ffff88815ee333d8 R15: dead000000000100
[39961.924585] FS:  0000000000000000(0000) GS:ffff888feec80000(0000) knlGS:0000000000000000
[39961.924589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39961.924591] CR2: 000010c553df5000 CR3: 0000000383e92000 CR4: 0000000000f50ef0
[39961.924594] PKRU: 55555554
koverstreet commented 7 months ago

Your subject refers to an error that isn't in the log - do you still have that log too?

What kernel version are you on?

Madouura commented 7 months ago

What kernel version are you on?

It's in the title, Linux 6.7-rc4.

do you still have that log too?

I've rebooted (and fsck'd) since then, do I just get it from journalctl or dmesg?

koverstreet commented 7 months ago

Well, since it went emergency RO, systemd journal wouldn't have been able to save it, alas.

You'll have to wait for it to happen again, then grab it from dmesg (without rebooting!) and save it manually - e.g. by scping it to another machine.

RX14 commented 3 weeks ago

I just got basically the same thing on 6.9.7

[449036.160447] about to insert invalid key in data update path
[449036.160454] old: u64s 10 type extent 134514883:1776:U32_MAX len 16 ver 64804986: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress lz4 ptr: 6:567152:528 gen 1 ptr: 7:668446:520 gen 1 rebalance: target hdd compression zstd:15
[449036.160458] k:   u64s 10 type extent 134514883:1784:U32_MAX len 16 ver 64805228: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress lz4 ptr: 6:567152:536 gen 1 ptr: 7:668446:528 gen 1 rebalance: target hdd compression zstd:15
[449036.160462] new: u64s 14 type extent 134514883:1776:U32_MAX len 8 ver 64805228: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress lz4 ptr: 6:567152:536 gen 1 cached ptr: 7:668446:528 gen 1 cached rebalance: target hdd compression zstd:15 crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:182418:40 gen 6 ptr: 2:1286464:424 gen 2
[449036.163824] bcachefs (f7fa14ed-5a8e-4b14-b39a-8b5c21f8bc25): fatal error - emergency read only
[449036.164289] ------------[ cut here ]------------
[449036.164292] kernel BUG at fs/bcachefs/io_write.c:532!
[449036.164718] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[449036.165141] CPU: 8 PID: 562219 Comm: kworker/u80:0 Not tainted 6.9.7-arch1-1 #1 44783200744f92500e6484c6d93590bc19db4a83
[449036.165620] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[449036.166077] Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
[449036.166633] RIP: 0010:__bch2_write_index+0x26d/0x270 [bcachefs]
[449036.167188] Code: 09 e8 97 d0 03 00 5a eb 88 be 1e 00 00 00 44 89 e7 e8 07 5b fc ff 84 c0 0f 85 73 ff ff ff 4d 8b b5 38 01 00 00 e9 50 ff ff ff <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[449036.168192] RSP: 0018:ffff9ffae3643e00 EFLAGS: 00010246
[449036.168699] RAX: ffff93c80df58428 RBX: 0000000000000010 RCX: 0000000000000010
[449036.169218] RDX: ffff93c80df58428 RSI: 0000000000000009 RDI: 0098093cc893ffff
[449036.169747] RBP: ffff93c815200000 R08: 0000000000000000 R09: 00000000113d99fc
[449036.170280] R10: 047b703c2f843f00 R11: 00047b703c2f843f R12: 0000000000000000
[449036.170835] R13: ffff93c80df58298 R14: ffff93c80df583e0 R15: dead000000000100
[449036.171383] FS:  0000000000000000(0000) GS:ffff93cf5fa00000(0000) knlGS:0000000000000000
[449036.171940] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[449036.172502] CR2: 000064e240a4dfe0 CR3: 000000011e37c000 CR4: 0000000000350ef0
[449036.173078] Call Trace:
[449036.173653]  <TASK>
[449036.174226]  ? __die_body.cold+0x19/0x27
[449036.174819]  ? die+0x2e/0x50
[449036.175404]  ? do_trap+0xca/0x110
[449036.175991]  ? do_error_trap+0x6a/0x90
[449036.176596]  ? __bch2_write_index+0x26d/0x270 [bcachefs aa0637810d467c8e6cf072acf6a70476543ba202]
[449036.177281]  ? exc_invalid_op+0x50/0x70
[449036.177933]  ? __bch2_write_index+0x26d/0x270 [bcachefs aa0637810d467c8e6cf072acf6a70476543ba202]
[449036.178798]  ? asm_exc_invalid_op+0x1a/0x20
[449036.179485]  ? __bch2_write_index+0x26d/0x270 [bcachefs aa0637810d467c8e6cf072acf6a70476543ba202]
[449036.180215]  bch2_write_point_do_index_updates+0xb9/0x160 [bcachefs aa0637810d467c8e6cf072acf6a70476543ba202]
[449036.180964]  process_one_work+0x18e/0x350
[449036.181631]  worker_thread+0x2eb/0x410
[449036.182314]  ? __pfx_worker_thread+0x10/0x10
[449036.183011]  kthread+0xd2/0x100
[449036.183720]  ? __pfx_kthread+0x10/0x10
[449036.184412]  ret_from_fork+0x34/0x50
[449036.185095]  ? __pfx_kthread+0x10/0x10
[449036.185781]  ret_from_fork_asm+0x1a/0x30
[449036.186477]  </TASK>
[449036.187215] Modules linked in: tcp_diag udp_diag inet_diag xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables overlay nf_conntrack_netlink wireguard curve25519_x86_64 libchacha20poly1305 nf_conntrack libcurve25519_generic ip6_udp_tunnel nf_defrag_ipv6 udp_tunnel nf_defrag_ipv4 nfnetlink_acct tls nfsv4 dns_resolver cbc cts rpcsec_gss_krb5 nfs netfs rpcrdma rdma_cm iw_cm ib_cm ib_core poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs lz4hc_compress lz4_compress xor raid6_pq libcrc32c intel_rapl_msr intel_rapl_common snd_hda_codec_generic snd_hda_intel crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_intel_sdw_acpi polyval_clmulni polyval_generic snd_hda_codec gf128mul ghash_clmulni_intel snd_hda_core sha512_ssse3 snd_hwdep iTCO_wdt sha256_ssse3 intel_pmc_bxt snd_pcm sha1_ssse3 iTCO_vendor_support snd_timer aesni_intel joydev cfg80211 snd mousedev crypto_simd i2c_i801 psmouse pcspkr cryptd soundcore rfkill
[449036.187360]  i2c_smbus lpc_ich mac_hid nfsd nfs_acl tcp_bbr lockd auth_rpcgss grace ptp_kvm ptp pps_core sunrpc dm_mod tun loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci qemu_fw_cfg ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid virtio_net serio_raw virtio_scsi net_failover virtio_console failover atkbd virtio_balloon virtio_rng virtio_blk libps2 vivaldi_fmap bochs nvme drm_vram_helper crc32c_intel nvme_core drm_ttm_helper intel_agp nvme_auth virtio_pci xhci_pci ttm i8042 virtio_pci_legacy_dev intel_gtt xhci_pci_renesas virtio_pci_modern_dev serio
[449036.196839] ---[ end trace 0000000000000000 ]---

I have older kernel logs since my rootfs isn't bcachefs, however I can't see anything relevent to this.

Unsure if this is relevant but I get a strange "compressed and incompressible extents" listing in

$ cat /sys/fs/bcachefs/f7fa14ed-5a8e-4b14-b39a-8b5c21f8bc25/compression_stats
type              compressed    uncompressed     average extent size
none                1.24 GiB        1.24 GiB                5.18 KiB
lz4_old                  0 B             0 B                     0 B
gzip                     0 B             0 B                     0 B
lz4                 5.25 GiB        6.80 GiB                49.5 KiB
zstd                 148 GiB         251 GiB                46.1 KiB
incompressible      1.07 TiB        1.07 TiB                34.5 KiB
7066 compressed & incompressible extents

After a reboot, the filesystem mounted without issue, the compressed and incompressible extents stayed (6859 of them).