koverstreet / bcachefs


Compression will cause crash! Crash in LZ4HC_compress_generic — NOT fixed #775

Open thememika opened 2 weeks ago

thememika commented 2 weeks ago

Hello. This issue has been reported before; my original post was https://github.com/koverstreet/bcachefs/issues/753. It seemed to be completely fixed after I backported the many commits that followed that issue. The commit I am currently using is 0f25eb4b60771f08fbcca878a8f7f88086d0c885 ("Rework logged op handling", branch bcachefs-for-upstream, 2024-10-04 20:25:32 -0400). Recently I decided to compress some files on my filesystem and chose lz4:6, setting the background_compression attribute on each of these files. I did this in the evening and expected the files to be compressed overnight. But at 5:00 in the morning this happened:

[217141.096281] [  T971] BUG: unable to handle page fault for address: ffffc9006d892000
[217141.096287] [  T971] #PF: supervisor write access in kernel mode
[217141.096289] [  T971] #PF: error_code(0x0002) - not-present page
[217141.096291] [  T971] PGD 100000067 P4D 100000067 PUD 42a5c6067 PMD 4cd47b067 PTE 0
[217141.096296] [  T971] Oops: Oops: 0002 [#1] PREEMPT_RT SMP
[217141.096299] [  T971] CPU: 9 UID: 0 PID: 971 Comm: bch-rebalance/n Kdump: loaded Tainted: G        W          6.12.0-missmika-lts-rt-2+ #33
[217141.096303] [  T971] Tainted: [W]=WARN
[217141.096305] [  T971] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[217141.096310] [  T971] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[217141.096312] [  T971] RSP: 0018:ffffc9005accf2d0 EFLAGS: 00010296
[217141.096314] [  T971] RAX: ffffc9006d892001 RBX: ffffc9006d882000 RCX: 366d7292583855a2
[217141.096316] [  T971] RDX: ffffc9006d891ffa RSI: 000000000000fef9 RDI: ffffc9006d880efd
[217141.096317] [  T971] RBP: ffffc9005accf3e0 R08: 0000000000000004 R09: 0000000000000001
[217141.096318] [  T971] R10: 0000000000000002 R11: ffffc9006d880f00 R12: ffff888d49220000
[217141.096319] [  T971] R13: ffffc9006d882000 R14: 0000000000000000 R15: 0000000000010000
[217141.096321] [  T971] FS:  0000000000000000(0000) GS:ffff888ffd200000(0000) knlGS:0000000000000000
[217141.096323] [  T971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[217141.096324] [  T971] CR2: ffffc9006d892000 CR3: 0000000007522005 CR4: 00000000001726f0
[217141.096326] [  T971] Call Trace:
[217141.096328] [  T971]  <TASK>
[217141.096330] [  T971]  ? show_regs.part.0+0x22/0x24
[217141.096336] [  T971]  ? __die_body.cold+0x8/0x1c
[217141.096340] [  T971]  ? __die+0x2e/0x40
[217141.096343] [  T971]  ? page_fault_oops+0x102/0x280
[217141.096347] [  T971]  ? search_bpf_extables+0x119/0x200
[217141.096351] [  T971]  ? search_exception_tables+0x60/0x70
[217141.096356] [  T971]  ? fixup_exception+0x32/0x450
[217141.096358] [  T971]  ? kernelmode_fixup_or_oops.isra.0+0x45/0x50
[217141.096358] [  T971]  ? __bad_area_nosemaphore+0x169/0x1a0
[217141.096358] [  T971]  ? bad_area_nosemaphore+0x16/0x20
[217141.096358] [  T971]  ? do_kern_addr_fault+0x7b/0x90
[217141.096358] [  T971]  ? exc_page_fault+0x28e/0x2c0
[217141.096358] [  T971]  ? asm_exc_page_fault+0x2b/0x30
[217141.096358] [  T971]  ? LZ4HC_compress_generic+0x3b3/0x1b90
[217141.096358] [  T971]  LZ4_compress_HC+0x94/0xb0
[217141.096358] [  T971]  attempt_compress+0x205/0x270
[217141.096358] [  T971]  ? __kvmalloc_node_noprof+0x3c/0xd0
[217141.096358] [  T971]  ? mempool_kvmalloc+0x1e/0x20
[217141.096358] [  T971]  ? mempool_alloc_noprof+0x5b/0x170
[217141.096358] [  T971]  bch2_bio_compress+0x220/0x5b0
[217141.096358] [  T971]  __bch2_write+0x17ad/0x1bd0
[217141.096358] [  T971]  bch2_write+0x1b7/0x460
[217141.096358] [  T971]  ? bch2_write+0x1b7/0x460
[217141.096358] [  T971]  ? bch2_trans_unlock_long+0x29/0x1c0
[217141.096358] [  T971]  bch2_data_update_read_done+0x88/0x90
[217141.096358] [  T971]  bch2_moving_ctxt_do_pending_writes+0x101/0x2c0
[217141.096358] [  T971]  bch2_move_ratelimit+0x1e0/0x5e0
[217141.096358] [  T971]  ? bch2_move_data_btree+0x3bd/0x570
[217141.096358] [  T971]  bch2_move_data_btree+0x173/0x570
[217141.096358] [  T971]  ? bch2_fs_quota_read+0x6f0/0x6f0
[217141.096358] [  T971]  ? __bch2_btree_path_set_pos+0x278/0x630
[217141.096358] [  T971]  ? bch2_move_data_btree+0x154/0x570
[217141.096358] [  T971]  __bch2_move_data+0xd9/0x1f0
[217141.096358] [  T971]  ? __bch2_move_data+0xd9/0x1f0
[217141.096358] [  T971]  ? bch2_fs_quota_read+0x6f0/0x6f0
[217141.096358] [  T971]  do_rebalance+0x4f0/0x8a0
[217141.096358] [  T971]  ? do_rebalance+0x82/0x8a0
[217141.096358] [  T971]  ? do_rebalance+0x517/0x8a0
[217141.096358] [  T971]  ? do_rebalance+0x8a0/0x8a0
[217141.096358] [  T971]  bch2_rebalance_thread+0x53/0x80
[217141.096358] [  T971]  ? bch2_rebalance_thread+0x49/0x80
[217141.096358] [  T971]  ? irq_cpu_rmap_add+0x140/0x140
[217141.096358] [  T971]  kthread+0xe8/0x120
[217141.096358] [  T971]  ? kthread_park+0x90/0x90
[217141.096358] [  T971]  ret_from_fork+0x3a/0x60
[217141.096358] [  T971]  ? kthread_park+0x90/0x90
[217141.096358] [  T971]  ret_from_fork_asm+0x11/0x20
[217141.096358] [  T971]  </TASK>
[217141.096358] [  T971] Modules linked in: cfg80211 missmikatoolsintree mikakernelm 8021q garp mrp stp llc nft_chain_nat xt_MASQUERADE xt_nat nf_nat nft_limit ipt_REJECT nf_reject_ipv4 xt_recent xt_limit xt_pkttype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype xt_tcpudp nft_compat nf_tables x_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul cdc_ether crc32_pclmul ghash_clmulni_intel usbnet missmikafs sha512_ssse3 sha256_ssse3 sha1_ssse3 snd_hda_codec_realtek aesni_intel crypto_simd snd_hda_codec_generic cryptd snd_hda_scodec_component snd_hda_codec_hdmi rapl uvcvideo snd_usb_audio intel_cstate videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_hda_intel snd_usbmidi_lib snd_intel_dspcfg videodev snd_intel_sdw_acpi snd_rawmidi psmouse snd_seq_device snd_hda_codec r8152 mc pcspkr serio_raw mii snd_hda_core iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_hwdep i2c_i801 snd_pcm i2c_mux i2c_smbus snd_timer lpc_ich snd ioatdma dca soundcore mac_hid
[217141.096358] [  T971] CR2: ffffc9006d892000
[217141.096358] [  T971] ---[ end trace 0000000000000000 ]---
[217141.096358] [  T971] pstore: backend (erst) writing error (-28)
[217141.096358] [  T971] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[217141.096358] [  T971] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[217141.096358] [  T971] RSP: 0018:ffffc9005accf2d0 EFLAGS: 00010296
[217141.096358] [  T971] RAX: ffffc9006d892001 RBX: ffffc9006d882000 RCX: 366d7292583855a2
[217141.096358] [  T971] RDX: ffffc9006d891ffa RSI: 000000000000fef9 RDI: ffffc9006d880efd
[217141.096358] [  T971] RBP: ffffc9005accf3e0 R08: 0000000000000004 R09: 0000000000000001
[217141.096358] [  T971] R10: 0000000000000002 R11: ffffc9006d880f00 R12: ffff888d49220000
[217141.096358] [  T971] R13: ffffc9006d882000 R14: 0000000000000000 R15: 0000000000010000
[217141.096358] [  T971] FS:  0000000000000000(0000) GS:ffff888ffd200000(0000) knlGS:0000000000000000
[217141.096358] [  T971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[217141.096358] [  T971] CR2: ffffc9006d892000 CR3: 0000000007522005 CR4: 00000000001726f0
[217141.096358] [  T971] note: bch-rebalance/n[971] exited with irqs disabled
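
A side note on the register dump, offered as a reading aid and not as a diagnosis. The bytes at the fault marker (`<48> 89 48 f8`) appear to decode to an 8-byte store to RAX-8, and the fault address in CR2 happens to equal RBX + R15, where R15 is 0x10000 (64 KiB, the same size as the `encoded_extent_max` in the configuration below). Treating RBX as a buffer base and R15 as its length is purely an assumption; the helper names in this sketch are hypothetical and exist only to make the arithmetic checkable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values copied from the oops above. */
#define OOPS_CR2 UINT64_C(0xffffc9006d892000) /* faulting address */
#define OOPS_RAX UINT64_C(0xffffc9006d892001) /* store pointer, already advanced by 8 */
#define OOPS_RBX UINT64_C(0xffffc9006d882000) /* ASSUMPTION: base of some buffer */
#define OOPS_R15 UINT64_C(0x10000)            /* ASSUMPTION: length of that buffer */

/* True if an 8-byte store starting at addr writes the byte at target. */
bool store8_touches(uint64_t addr, uint64_t target)
{
	return target >= addr && target - addr < 8;
}

/* Number of bytes of an 8-byte store at addr that land at or beyond limit. */
unsigned store8_overrun(uint64_t addr, uint64_t limit)
{
	assert(addr <= UINT64_MAX - 8); /* keep addr + 8 from wrapping */
	uint64_t end = addr + 8;        /* one past the last byte written */

	if (end <= limit)
		return 0;
	return (end - limit > 8) ? 8u : (unsigned)(end - limit);
}
```

With these values, `OOPS_RBX + OOPS_R15 == OOPS_CR2` holds, and the store at `OOPS_RAX - 8` covers the byte at CR2 with exactly one byte past that address. Whether that boundary really is the end of the compression output buffer cannot be told from the log alone.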

This is my configuration.

Device:                                     (unknown device)
Device index:                              0
Version:                                   1.13: inode_has_child_snapshots
Version upgrade complete:                  1.13: inode_has_child_snapshots
Oldest version on disk:                    1.7: mi_btree_bitmap
Sequence number:                           514
Superblock size:                           4.54 KiB/1.00 MiB
Clean:                                     0
Devices:                                   1
Sections:                                  members_v1,replicas_v0,quota,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features:                                  lz4,zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                           alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                              4.00 KiB
  btree_node_size:                         256 KiB
  errors:                                  continue [fix_safe] panic ro 
  metadata_replicas:                       1
  data_replicas:                           1
  metadata_replicas_required:              1
  data_replicas_required:                  1
  encoded_extent_max:                      64.0 KiB
  metadata_checksum:                       none [crc32c] crc64 xxhash 
  data_checksum:                           none [crc32c] crc64 xxhash 
  compression:                             none
  background_compression:                  none
  str_hash:                                [crc32c] crc64 siphash 
  metadata_target:                         none
  foreground_target:                       none
  background_target:                       none
  promote_target:                          none
  erasure_code:                            0
  inodes_32bit:                            1
  shard_inode_numbers:                     1
  inodes_use_key_cache:                    1
  gc_reserve_percent:                      8
  gc_reserve_bytes:                        0 B
  root_reserve_percent:                    0
  wide_macs:                               0
  promote_whole_extents:                   1
  acl:                                     1
  usrquota:                                1
  grpquota:                                1
  prjquota:                                1
  journal_flush_delay:                     1000
  journal_flush_disabled:                  0
  journal_reclaim_delay:                   100
  journal_transaction_names:               1
  allocator_stuck_timeout:                 30
  version_upgrade:                         [compatible] incompatible none 
  nocow:                                   0

members_v2 (size 160):
Device:                                    0
  Label:                                   (none)
  Size:                                    358 GiB
  read errors:                             5340
  write errors:                            0
  checksum errors:                         0
  seqread iops:                            0
  seqwrite iops:                           0
  randread iops:                           0
  randwrite iops:                          0
  Bucket size:                             256 KiB
  First bucket:                            0
  Buckets:                                 1465159
  Last mount:                              Sat Nov  1 17:09:39 2024
  Last superblock write:                   514
  State:                                   rw
  Data allowed:                            journal,btree,user
  Has data:                                journal,btree,user
  Btree allocated bitmap blocksize:        16.0 MiB
  Btree allocated bitmap:                  0000000000000000000000000001000000000001100000100000000100110011
  Durability:                              1
  Discard:                                 1
  Freespace initialized:                   1

errors (size 72):
lru_entry_bad                               4               Sun Oct 13 21:54:21 2024
inode_unreachable                           20              Sat Oct 12 10:30:23 2024
deleted_inode_but_clean                     12              Fri Nov  1 16:52:10 2024
dirent_to_missing_inode                     15              Sat Oct 12 10:29:17 2024

Thanks. UPD: I'm now using the latest commit of bcachefs-for-upstream, and nothing has changed. This crash ALWAYS happens, immediately after "done starting filesystem".

thememika commented 2 weeks ago

Sorry, that was a typo in the title. It's not about encryption.

thememika commented 2 weeks ago

In fs/bcachefs/compress.c:407:

workspace = mempool_alloc(&c->compress_workspace[compression_type], GFP_NOFS);

You never check for allocation failure afterwards. attempt_compress doesn't check either when workspace is passed to it, and neither do LZ4_compress_HC() or LZ4_compress_destSize() further down. Could this be the source of the problem? That would explain why the problem is more likely to reproduce when the host is under memory pressure. Oh, I hope it is that simple. I'll try adding a check and rebuilding, but I'm still unsure how to properly return an error there after the allocation.
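
The kind of check described above could look roughly like the following user-space sketch. It is not bcachefs code: `malloc()` stands in for `mempool_alloc()`, and `struct workspace`, `workspace_get()`, `workspace_put()` and `failing_alloc()` are hypothetical names used only for illustration. The idea is simply to turn a NULL allocation into `-ENOMEM` before anything dereferences the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compression workspace. */
struct workspace {
	size_t size;
	unsigned char *buf;
};

/* Allocator hook so that a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Simulated allocator that always fails. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Returns 0 and fills *ws on success, -ENOMEM if the allocation failed. */
int workspace_get(struct workspace *ws, size_t size, alloc_fn alloc)
{
	assert(ws != NULL);

	void *p = alloc(size);
	if (!p)
		return -ENOMEM; /* bail out before the compressor sees NULL */

	ws->buf = p;
	ws->size = size;
	return 0;
}

/* Releases the buffer and resets the workspace. */
void workspace_put(struct workspace *ws)
{
	free(ws->buf);
	ws->buf = NULL;
	ws->size = 0;
}
```

A caller would propagate the negative return value upward instead of calling the compressor. Note that the kernel documents `mempool_alloc()` as never failing when called from process context, so a check like this may never fire in the path discussed here.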

thememika commented 2 weeks ago

Updated to your latest commit and also added a printk() on mempool_alloc failure. The printk wasn't hit, so my theory was wrong. About 3 minutes after the FS started, the same crash occurred ...

thememika commented 2 weeks ago

@koverstreet I have to use my FS with the rebalance thread dead... Is this a known problem? Is it being worked on?

thememika commented 2 weeks ago

If anyone else experiences this problem, you can try the mount options noquota,nogrpquota,noprjquota,fsck,fix_errors. After fsck, the rebalance work completes smoothly; wait until the task bch-rebalance/<blkdev> drops to 0% CPU, then unmount. After that, mount as usual, with quotas. This was the magic temporary fix for me.