koverstreet / bcachefs

Other
643 stars 71 forks source link

btree trans held srcu lock (delaying memory reclaim) for X seconds #636

Open rubiksdot opened 6 months ago

rubiksdot commented 6 months ago

I have an 11 day copy (using cp) going on from a compressive btrfs filesystem to its replacement (background) compressing bcachefs filesystem. Both source and destination filesystems are on a spinning_rust.mdraid6.lvm stack. Kernel is 6.7.0.

Recently noticed the following in dmesg:

1072165.615689][T338756] ------------[ cut here ]------------
[1072165.616689][T338756] btree trans held srcu lock (delaying memory reclaim) for 35 seconds
[1072165.616706][T338756] WARNING: CPU: 3 PID: 338756 at fs/bcachefs/btree_iter.c:2851 bch2_trans_srcu_unlock+0x13c/0x150
[1072165.618203][T338756] Modules linked in: amdgpu drm_exec amdxcp mfd_core drm_buddy gpu_sched drm_ttm_helper ttm drm_suballoc_helper drm_display_helper cec i2c_algo_bit
[1072165.619665][T338756] CPU: 3 PID: 338756 Comm: bch-rebalance/d Tainted: G        W          6.7.0-local.20240108-141511 #3 bc70bf1190793ec24076c7b434222cb87bd2b68e
[1072165.620444][T338756] Hardware name: System manufacturer System Product Name/PRIME A320M-A, BIOS 5603 07/28/2020
[1072165.621429][T338756] RIP: 0010:bch2_trans_srcu_unlock+0x13c/0x150
[1072165.622236][T338756] Code: 8a bc bc 01 48 b8 cf f7 53 e3 a5 9b c4 20 48 c7 c7 c8 d4 23 83 48 29 ca 48 d1 ea 48 f7 e2 48 89 d6 48 c1 ee 04 e8 a4 a6 9d ff <0f> 0b e9 62 ff ff ff 0f 0b e9 71 ff ff ff 66 0f 1f 44 00 00 90 90
[1072165.623925][T338756] RSP: 0018:ffffc900097dfab8 EFLAGS: 00010246
[1072165.624761][T338756] RAX: 0000000000000000 RBX: ffff8881a1020000 RCX: 0000000000000000
[1072165.625594][T338756] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[1072165.626433][T338756] RBP: ffff88800f100000 R08: 0000000000000000 R09: 0000000000000000
[1072165.627272][T338756] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881a1020268
[1072165.628165][T338756] R13: ffff888114304b00 R14: ffff88800f100000 R15: 0000000000000000
[1072165.629090][T338756] FS:  0000000000000000(0000) GS:ffff888218f80000(0000) knlGS:0000000000000000
[1072165.629985][T338756] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1072165.630860][T338756] CR2: 00005589fbbebfc0 CR3: 0000000113982000 CR4: 00000000003506f0
[1072165.631735][T338756] Call Trace:
[1072165.632645][T338756]  <TASK>  
[1072165.633513][T338756]  ? bch2_trans_srcu_unlock+0x13c/0x150
[1072165.634406][T338756]  ? __warn+0x97/0x150
[1072165.635284][T338756]  ? bch2_trans_srcu_unlock+0x13c/0x150
[1072165.636213][T338756]  ? report_bug+0x196/0x1c0
[1072165.637116][T338756]  ? handle_bug+0x3c/0x70
[1072165.637795][T338756]  ? exc_invalid_op+0x23/0xa0
[1072165.638412][T338756]  ? asm_exc_invalid_op+0x1a/0x20
[1072165.639036][T338756]  ? bch2_trans_srcu_unlock+0x13c/0x150
[1072165.640024][T338756]  ? bch2_trans_srcu_unlock+0x13c/0x150
[1072165.641144][T338756]  ? bch2_trans_begin+0x136/0x610
[1072165.641769][T338756]  bch2_trans_begin+0x5a1/0x610
[1072165.642194][T338756]  ? srso_return_thunk+0x5/0x5f
[1072165.642603][T338756]  ? bch2_move_ratelimit+0x1d3/0x4a0
[1072165.643078][T338756]  ? do_rebalance+0x9c/0x8a0
[1072165.643523][T338756]  do_rebalance+0x1a6/0x8a0
[1072165.644094][T338756]  ? srso_return_thunk+0x5/0x5f
[1072165.644755][T338756]  ? local_clock_noinstr+0x5f/0xe0
[1072165.645203][T338756]  ? srso_return_thunk+0x5/0x5f
[1072165.645620][T338756]  ? srso_return_thunk+0x5/0x5f
[1072165.646031][T338756]  ? __bch2_trans_get+0x1c7/0x240
[1072165.646447][T338756]  ? srso_return_thunk+0x5/0x5f
[1072165.646863][T338756]  ? bch2_moving_ctxt_init+0x146/0x1d0
[1072165.647276][T338756]  ? __pfx_bch2_rebalance_thread+0x10/0x10
[1072165.647687][T338756]  bch2_rebalance_thread+0x61/0xb0
[1072165.648143][T338756]  ? bch2_rebalance_thread+0x57/0xb0
[1072165.648781][T338756]  kthread+0xcb/0xf0
[1072165.649223][T338756]  ? __pfx_kthread+0x10/0x10
[1072165.649637][T338756]  ret_from_fork+0x4b/0x60
[1072165.650040][T338756]  ? __pfx_kthread+0x10/0x10
[1072165.650438][T338756]  ret_from_fork_asm+0x1b/0x30
[1072165.650825][T338756]  </TASK>
[1072165.651190][T338756] ---[ end trace 0000000000000000 ]---

They happen repeatedly over the time period with various lenths for the amount of seconds the lock is held:

[1072165.616689][T338756] btree trans held srcu lock (delaying memory reclaim) for 35 seconds
[1072203.446919][T338756] btree trans held srcu lock (delaying memory reclaim) for 35 seconds
[987322.716475][T338756] btree trans held srcu lock (delaying memory reclaim) for 32 seconds
[988747.415842][T338756] btree trans held srcu lock (delaying memory reclaim) for 33 seconds
[988784.972137][T338756] btree trans held srcu lock (delaying memory reclaim) for 34 seconds
[1022480.192454][T338756] btree trans held srcu lock (delaying memory reclaim) for 30 seconds
[1022827.619529][T338756] btree trans held srcu lock (delaying memory reclaim) for 28 seconds
[1022860.251859][T338756] btree trans held srcu lock (delaying memory reclaim) for 30 seconds
[1045697.263945][T338756] btree trans held srcu lock (delaying memory reclaim) for 41 seconds
[1050440.571205][T338756] btree trans held srcu lock (delaying memory reclaim) for 45 seconds
[1050506.233990][T338756] btree trans held srcu lock (delaying memory reclaim) for 47 seconds
[1050557.943661][T338756] btree trans held srcu lock (delaying memory reclaim) for 48 seconds
[1051065.683168][T338756] btree trans held srcu lock (delaying memory reclaim) for 45 seconds
[1059120.342700][T338756] btree trans held srcu lock (delaying memory reclaim) for 35 seconds
[1059157.764569][T338756] btree trans held srcu lock (delaying memory reclaim) for 34 seconds
[1059912.780666][T493005] btree trans held srcu lock (delaying memory reclaim) for 11 seconds
[1070534.688086][T338756] btree trans held srcu lock (delaying memory reclaim) for 37 seconds
[942996.970787][T338756] btree trans held srcu lock (delaying memory reclaim) for 25 seconds
[943457.006985][T338756] btree trans held srcu lock (delaying memory reclaim) for 24 seconds
[943485.306989][T338756] btree trans held srcu lock (delaying memory reclaim) for 25 seconds
[862010.676101][T338756] btree trans held srcu lock (delaying memory reclaim) for 25 seconds
[879291.542771][T338756] btree trans held srcu lock (delaying memory reclaim) for 24 seconds
[879332.769206][T338756] btree trans held srcu lock (delaying memory reclaim) for 36 seconds
[879359.169416][T338756] btree trans held srcu lock (delaying memory reclaim) for 24 seconds
[888243.932052][T445667] btree trans held srcu lock (delaying memory reclaim) for 12 seconds
[897208.648821][T338756] btree trans held srcu lock (delaying memory reclaim) for 23 seconds
[897567.649348][T338756] btree trans held srcu lock (delaying memory reclaim) for 25 seconds
[897596.182694][T338756] btree trans held srcu lock (delaying memory reclaim) for 25 seconds
[807126.829154][T338756] btree trans held srcu lock (delaying memory reclaim) for 18 seconds
[807408.440627][T338756] btree trans held srcu lock (delaying memory reclaim) for 17 seconds
[807430.281827][T338756] btree trans held srcu lock (delaying memory reclaim) for 20 seconds
[716019.375240][T338756] btree trans held srcu lock (delaying memory reclaim) for 16 seconds
[717292.148021][T338756] btree trans held srcu lock (delaying memory reclaim) for 17 seconds
[717312.356714][T338756] btree trans held srcu lock (delaying memory reclaim) for 17 seconds
[583148.372608][T338756] btree trans held srcu lock (delaying memory reclaim) for 20 seconds
[583707.918748][T338756] btree trans held srcu lock (delaying memory reclaim) for 23 seconds
[583732.861901][T338756] btree trans held srcu lock (delaying memory reclaim) for 21 seconds
[506554.833025][T338756] btree trans held srcu lock (delaying memory reclaim) for 21 seconds
[507287.997304][T338756] btree trans held srcu lock (delaying memory reclaim) for 23 seconds
[507309.740678][T338756] btree trans held srcu lock (delaying memory reclaim) for 19 seconds
[421029.221155][T338756] btree trans held srcu lock (delaying memory reclaim) for 18 seconds
[421789.036106][T338756] btree trans held srcu lock (delaying memory reclaim) for 24 seconds
[421811.712858][T338756] btree trans held srcu lock (delaying memory reclaim) for 22 seconds
[459474.122133][T338756] btree trans held srcu lock (delaying memory reclaim) for 14 seconds
[460966.056123][T338756] btree trans held srcu lock (delaying memory reclaim) for 15 seconds
[460982.601206][T338756] btree trans held srcu lock (delaying memory reclaim) for 14 seconds
[352311.650515][T338756] btree trans held srcu lock (delaying memory reclaim) for 12 seconds
[352620.679891][T338756] btree trans held srcu lock (delaying memory reclaim) for 13 seconds
[352635.720085][T338756] btree trans held srcu lock (delaying memory reclaim) for 13 seconds
$ grep BCACHE /boot/config
CONFIG_BCACHE=y
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_ASYNC_REGISTRATION is not set
CONFIG_FS_MBCACHE=y
CONFIG_BCACHEFS_FS=y
CONFIG_BCACHEFS_QUOTA=y
CONFIG_BCACHEFS_ERASURE_CODING=y
CONFIG_BCACHEFS_POSIX_ACL=y
# CONFIG_BCACHEFS_DEBUG_TRANSACTIONS is not set
# CONFIG_BCACHEFS_DEBUG is not set
# CONFIG_BCACHEFS_TESTS is not set
# CONFIG_BCACHEFS_LOCK_TIME_STATS is not set
# CONFIG_BCACHEFS_NO_LATENCY_ACCT is not set
$ bcachefs show-super <bcachefs device in question>
External UUID:                              4340e628-3daa-492b-aaf7-9844988b4159
Internal UUID:                              b8f32aa5-b0a1-459b-b1ef-22c424911ff0
Device index:                               0
Label:                                      tmptmp
Version:                                    1.3: rebalance_work
Version upgrade complete:                   1.3: rebalance_work
Oldest version on disk:                     1.3: rebalance_work
Created:                                    Wed Jan 10 14:46:57 2024
Sequence number:                            11
Superblock size:                            4512
Clean:                                      0
Devices:                                    1
Sections:                                   members_v1,replicas_v0,clean,journal_v2,counters,members_v2,errors
Features:                                   lz4,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        1
  data_replicas:                            1
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none crc32c [crc64] xxhash 
  data_checksum:                            none crc32c [crc64] xxhash 
  compression:                              none
  background_compression:                   lz4
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          none
  foreground_target:                        none
  background_target:                        none
  promote_target:                           none
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none 
  nocow:                                    0

members_v2 (size 136):
  Device:                                   0
    Label:                                  (none)
    UUID:                                   801e22b2-c3a0-4e78-8d8b-442324459bbe
    Size:                                   2.00 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                4194304
    Last mount:                             Wed Jan 10 14:47:09 2024
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                0
    Freespace initialized:                  1

replicas_v0 (size 24):
  btree: 1 [0] journal: 1 [0] user: 1 [0]

Biggest (related) question I have is: will this result in data loss. I probably have another 10 days to the copy left, so... :)

rubiksdot commented 6 months ago

Gah. This is what I get for doing this before I finish my coffee, on a Sunday.

Duplicate of #605

poelzi commented 1 month ago

Are you sure. This is from reblance and the other was from openat syscalls - I know nothing about the intrenals ;)

I got this:

[31523.251432] ------------[ cut here ]------------
[31523.251448] btree trans held srcu lock (delaying memory reclaim) for 309 seconds
[31523.251530] WARNING: CPU: 2 PID: 25059 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x140/0x168 [bcachefs]
[31523.251727] Modules linked in: bcachefs lz4hc_compress lz4_compress xor xor_neon raid6_pq nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype overlay ledtrig_netdev r8169 cfg80211 rfkill raspberrypi_cpufreq 8021q garp mrp clk_raspberrypi reset_raspberrypi raspberrypi_hwmon bcm2711_thermal broadcom bcm_phy_lib pcie_brcmstb crct10dif_ce genet nvmem_rmem nls_iso8859_1 nls_cp437 uio_pdrv_genirq uio xt_conntrack ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables nfnetlink sch_fq_codel xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter veth tap macvlan bridge stp llc drm fuse backlight ip_tables x_tables dm_mod dax
[31523.251872] CPU: 2 PID: 25059 Comm: bch-rebalance/7 Not tainted 6.9.1 #1-NixOS
[31523.251879] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[31523.251883] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[31523.251890] pc : bch2_trans_srcu_unlock+0x140/0x168 [bcachefs]
[31523.252005] lr : bch2_trans_srcu_unlock+0x140/0x168 [bcachefs]
[31523.252107] sp : ffff800080b6b5f0
[31523.252110] x29: ffff800080b6b5f0 x28: 0000000000000000 x27: ffff09345d9281f8
[31523.252120] x26: ffff800080b6bb40 x25: ffff800080b6be30 x24: 0000000000000001
[31523.252129] x23: dead000000000100 x22: dead000000000122 x21: ffff800080b6be20
[31523.252139] x20: ffff09362acc0000 x19: ffff09356d360000 x18: ffffffffffffffff
[31523.252147] x17: 0000000000000020 x16: ffffb7337f0a9860 x15: ffff800080b6b220
[31523.252156] x14: ffffb733830455ad x13: ffffb733830455a1 x12: ffffb73382a53340
[31523.252165] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffb7337f145758
[31523.252174] x8 : c0000000ffffefff x7 : ffffb733829fb170 x6 : 0000000000000001
[31523.252182] x5 : ffff09363ee90d88 x4 : 0000000000000000 x3 : 0000000000000027
[31523.252191] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff09354117b540
[31523.252200] Call trace:
[31523.252204]  bch2_trans_srcu_unlock+0x140/0x168 [bcachefs]
[31523.252308]  bch2_trans_unlock_long+0x28/0x40 [bcachefs]
[31523.252409]  bch2_moving_ctxt_do_pending_writes+0x6c/0x248 [bcachefs]
[31523.252515]  bch2_data_update_init+0x458/0xea0 [bcachefs]
[31523.252617]  bch2_move_extent+0x3f4/0x9c0 [bcachefs]
[31523.252721]  do_rebalance_extent+0x230/0x608 [bcachefs]
[31523.252826]  do_rebalance+0x268/0x730 [bcachefs]
[31523.252929]  bch2_rebalance_thread+0x70/0xc0 [bcachefs]
[31523.253032]  kthread+0xec/0xf8
[31523.253046]  ret_from_fork+0x10/0x20
[31523.253054] ---[ end trace 0000000000000000 ]---