koverstreet / bcachefs

Other
675 stars 70 forks source link

Bcachefs crash (Oops) in LZ4HC_compress_generic+0x3b3/0x1b90 #753

Open thememika opened 3 hours ago

thememika commented 3 hours ago

🥝Kernel: 6.10.0 Hello, developers. Ive just decided to enable compression on a production filesystem because it was critically running out of space. I did it by:

echo "lz4:15" > /sys/fs/bcachefs/MY_UID/options/compression

The filesystem was under heavy use at that moment. bch-rebalance thread became 100% CPU ,I think it means that compression has started.
But after less than a minute I got a crash of a kworker process in a workqueue.

[509621.528103] [T10653] BUG: unable to handle page fault for address: ffffc9002a03d000
[509621.528109] [T10653] #PF: supervisor write access in kernel mode
[509621.528110] [T10653] #PF: error_code(0x0002) - not-present page
[509621.528111] [T10653] PGD 100000067 P4D 100000067 PUD 1002f2067 PMD 9b9af8067 PTE 0
[509621.528116] [T10653] Oops: Oops: 0002 [#1] PREEMPT_RT SMP
[509621.528118] [T10653] CPU: 7 PID: 10653 Comm: kworker/u80:4 Kdump: loaded Tainted: G        W  OE      6.10.0-rc6-blahaj-rt+ #14
[509621.528121] [T10653] Hardware name: OEM X79G/X79G, BIOS 4.6.5 08/02/2022
[509621.528122] [T10653] Workqueue: writeback wb_workfn (flush-bcachefs-2)
[509621.528127] [T10653] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528131] [T10653] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[509621.528133] [T10653] RSP: 0018:ffffc9002a4cb430 EFLAGS: 00010296
[509621.528135] [T10653] RAX: ffffc9002a03d001 RBX: ffffc9002a02d000 RCX: 2704138e8569e88d
[509621.528136] [T10653] RDX: ffffc9002a03cffa RSI: 000000000000fef9 RDI: ffffc90029f7cefd
[509621.528137] [T10653] RBP: ffffc9002a4cb540 R08: 0000000000000004 R09: 0000000000000001
[509621.528138] [T10653] R10: 0000000000000002 R11: ffffc90029f7cf00 R12: ffff888211da0000
[509621.528139] [T10653] R13: ffffc9002a02d000 R14: 0000000000000000 R15: 0000000000010000
[509621.528140] [T10653] FS:  0000000000000000(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[509621.528141] [T10653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[509621.528143] [T10653] CR2: ffffc9002a03d000 CR3: 00000007e58b7003 CR4: 00000000001706f0
[509621.528144] [T10653] Call Trace:
[509621.528146] [T10653]  <TASK>
[509621.528147] [T10653]  ? show_regs.part.0+0x22/0x24
[509621.528152] [T10653]  ? __die_body.cold+0x8/0x1c
[509621.528155] [T10653]  ? __die+0x2e/0x40
[509621.528157] [T10653]  ? page_fault_oops+0x102/0x270
[509621.528159] [T10653]  ? search_bpf_extables+0xdb/0xf0
[509621.528161] [T10653]  ? search_exception_tables+0x60/0x70
[509621.528164] [T10653]  ? fixup_exception+0x32/0x450
[509621.528166] [T10653]  ? kernelmode_fixup_or_oops.isra.0+0x45/0x50
[509621.528167] [T10653]  ? bad_area_nosemaphore+0x119/0x160
[509621.528169] [T10653]  ? do_kern_addr_fault+0x7b/0x90
[509621.528171] [T10653]  ? exc_page_fault+0x194/0x1a0
[509621.528175] [T10653]  ? asm_exc_page_fault+0x2b/0x30
[509621.528179] [T10653]  ? LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528183] [T10653]  LZ4_compress_HC+0x94/0xb0
[509621.528185] [T10653]  attempt_compress+0x205/0x270
[509621.528190] [T10653]  ? kvmalloc_node_noprof+0x3f/0xe0
[509621.528193] [T10653]  ? mempool_kvmalloc+0x1e/0x20
[509621.528196] [T10653]  ? mempool_alloc_noprof+0x45/0x140
[509621.528199] [T10653]  bch2_bio_compress+0x21a/0x5a0
[509621.528201] [T10653]  __bch2_write+0x1791/0x1ba0
[509621.528205] [T10653]  bch2_write+0x19e/0x440
[509621.528208] [T10653]  ? bch2_write+0x19e/0x440
[509621.528211] [T10653]  __bch2_writepage+0x452/0x910
[509621.528213] [T10653]  ? bchfs_read.isra.0+0xd00/0xd00
[509621.528214] [T10653]  write_cache_pages+0x54/0x90
[509621.528218] [T10653]  bch2_writepages+0x59/0xe0
[509621.528219] [T10653]  ? rt_spin_unlock+0x1b/0x50
[509621.528222] [T10653]  do_writepages+0xd1/0x230
[509621.528223] [T10653]  ? psi_group_change+0x1c5/0x410
[509621.528227] [T10653]  __writeback_single_inode+0x44/0x340
[509621.528229] [T10653]  writeback_sb_inodes+0x222/0x5d0
[509621.528233] [T10653]  __writeback_inodes_wb+0x54/0x160
[509621.528234] [T10653]  wb_writeback+0x1ff/0x370
[509621.528236] [T10653]  ? get_nr_inodes+0x56/0xa0
[509621.528239] [T10653]  wb_workfn+0x1e6/0x480
[509621.528241] [T10653]  ? finish_task_switch.isra.0+0x8f/0x2b0
[509621.528244] [T10653]  process_one_work+0x188/0x360
[509621.528247] [T10653]  worker_thread+0x350/0x4e0
[509621.528249] [T10653]  ? max_active_store+0xe0/0xe0
[509621.528252] [T10653]  kthread+0xe5/0x110
[509621.528254] [T10653]  ? kthread_park+0x90/0x90
[509621.528256] [T10653]  ret_from_fork+0x3a/0x60
[509621.528258] [T10653]  ? kthread_park+0x90/0x90
[509621.528260] [T10653]  ret_from_fork_asm+0x11/0x20
[509621.528263] [T10653]  </TASK>
[509621.528264] [T10653] Modules linked in: quota_v2 nls_iso8859_1 tls 88x2bu(OE) option usb_wwan usbserial cdc_acm netlink_diag yellowfb(OE) whitefb(OE) redfb(OE) purplefb(OE) pinkfb(OE) peachfb(OE) orangefb(OE) magentafb(OE) indigofb(OE) greenfb(OE) cyanfb(OE) brownfb(OE) bluefb(OE) blackfb(OE) missmikatools(OE) mikakernelm cfg80211 8021q garp mrp stp llc nft_chain_nat xt_MASQUERADE xt_nat nf_nat nft_limit ipt_REJECT nf_reject_ipv4 xt_recent xt_limit xt_pkttype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype xt_tcpudp nft_compat nf_tables x_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel uvcvideo cdc_ether usbnet snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc kvm snd_usb_audio snd_hda_codec_hdmi snd_hda_scodec_component uvc snd_hda_intel videobuf2_memops snd_intel_dspcfg videobuf2_v4l2 crct10dif_pclmul snd_intel_sdw_acpi videodev crc32_pclmul snd_hda_codec ghash_clmulni_intel sha512_ssse3 sha256_ssse3 iTCO_wdt intel_pmc_bxt
[509621.528306] [T10653]  missmikafs sha1_ssse3 iTCO_vendor_support aesni_intel snd_hda_core crypto_simd cryptd snd_usbmidi_lib rapl snd_rawmidi psmouse videobuf2_common r8152 intel_cstate snd_seq_device snd_hwdep mc snd_pcm r8169 mii pcspkr serio_raw snd_timer snd i2c_i801 ioatdma realtek i2c_mux lpc_ich i2c_smbus dca soundcore mac_hid
[509621.528321] [T10653] CR2: ffffc9002a03d000
[509621.528323] [T10653] ---[ end trace 0000000000000000 ]---
[509621.528410] [T10653] pstore: backend (erst) writing error (-28)
[509621.528410] [T10653] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528410] [T10653] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[509621.528410] [T10653] RSP: 0018:ffffc9002a4cb430 EFLAGS: 00010296
[509621.528410] [T10653] RAX: ffffc9002a03d001 RBX: ffffc9002a02d000 RCX: 2704138e8569e88d
[509621.528410] [T10653] RDX: ffffc9002a03cffa RSI: 000000000000fef9 RDI: ffffc90029f7cefd
[509621.528410] [T10653] RBP: ffffc9002a4cb540 R08: 0000000000000004 R09: 0000000000000001
[509621.528410] [T10653] R10: 0000000000000002 R11: ffffc90029f7cf00 R12: ffff888211da0000
[509621.528410] [T10653] R13: ffffc9002a02d000 R14: 0000000000000000 R15: 0000000000010000
[509621.528410] [T10653] FS:  0000000000000000(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[509621.528410] [T10653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[509621.528410] [T10653] CR2: ffffc9002a03d000 CR3: 00000007e58b7003 CR4: 00000000001706f0
[509621.528410] [T10653] note: kworker/u80:4[10653] exited with irqs disabled

I believe that last your function was attempt_compress+0x205/0x270. Then crash in the lz4 module/code.
I don't know if it is your bug, or an issue with configuration. My configuration is this:

Device:                                     (unknown device)
External UUID:                             <uuid>
Internal UUID:                             <uuid>
Magic number:                              <uuid>
Device index:                              0
Label:                                     <name>
Version:                                   1.7: mi_btree_bitmap
Version upgrade complete:                  1.7: mi_btree_bitmap
Oldest version on disk:                    1.7: mi_btree_bitmap
Created:                                   <date>
Sequence number:                           34
Time of last write:                        Sat Sep 28 12:12:12 2024
Superblock size:                           4.66 KiB/1.00 MiB
Clean:                                     0
Devices:                                   1
Sections:                                  members_v1,replicas_v0,quota,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  lz4,zstd,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue fix_safe panic [ro] 
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             lz4:15
  background_compression:                  none
  str_hash:                                [crc32c] crc64 siphash 
  metadata_target:                         none
  foreground_target:                       none
  background_target:                       none
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  acl:                                     1
  usrquota:                                1
  grpquota:                                1
  prjquota:                                1
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 160):
Device:                                    0
  Label:                                   (none)
  UUID:                                    <uuid>
  Size:                                    119 GiB
  read errors:                             0
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 488386
  Last mount:                              Sun Sep 22 14:53:58 2024
  Last superblock write:                   34
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Btree allocated bitmap blocksize:        4.00 MiB
  Btree allocated bitmap:                  0000000000000000000000000000000100000000000000000000000100011011
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1

errors (size 8):

Thank you very much!

The filesystem is still up and it accepts reads writes. bch-rebalance is also still running at 100%. I don't know if a reboot is required anytime soon. Just the manual sync() calls don't return.

thememika commented 2 hours ago

UPD: bch-rebalance process has now just crashed in the same function (LZ4HC_compress_generic). I decided to find out about this function in source code

# scripts/faddr2line ./vmlinux LZ4HC_compress_generic+0x3b3/0x1b90
LZ4HC_compress_generic+0x3b3/0x1b90:
LZ4_copy8 at lib/lz4/lz4defs.h:158
(inlined by) LZ4_wildCopy at lib/lz4/lz4defs.h:180
(inlined by) LZ4HC_encodeSequence at lib/lz4/lz4hc_compress.c:296                               (inlined by) LZ4HC_compress_generic at lib/lz4/lz4hc_compress.c:402

Also for attempt_compress:

attempt_compress+0x205/0x270:
attempt_compress at fs/bcachefs/compress.c:323 (discriminator 2)

I find you make call to the lz4 function here:

int ret = LZ4_compress_HC(
   src,  dst,   src_len, dst_len, 
   compression.level,workspace
);

And that function crashes on this code:

static FORCE_INLINE void LZ4_copy8(void *dst, const void *src) {
  #if LZ4_ARCH64  // this was True during kernel build
    U64 a = get_unaligned((const U64 *)src);
    /* THE CRASH IS HERE -> */      put_unaligned(a, (U64 *)dst);
  #else
    U32 a = get_unaligned((const U32 *)src);
    U32 b = get_unaligned((const U32 *)src + 1);

    ...

Respectively this is put_unaligned which is a macro in include/asm-generic/unaligned.h:

#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))

Respectively, macro "__put_unaligned_t" is finally defined in the same file:

#define __put_unaligned_t(type, val, ptr) do {                  \
    struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);        \
    __pptr->x = (val);                          \
} while (0)