koverstreet / bcachefs

Other
699 stars 73 forks source link

btree_path_relock_fail [cd72194e001b] #450

Open colttt opened 2 years ago

colttt commented 2 years ago

Please search for duplicates

Version

Linux bcachefs 5.19.0-cd72194e001b-bcachefs #1 SMP PREEMPT_DYNAMIC Sat Aug 27 22:34:31 CEST 2022 x86_64 GNU/Linux

Generic info Provide the output of:

bcachefs fs usage
bcachefs fs usage
Filesystem: f80927af-08c7-4e41-b7ab-17a31d30a03a
Size:                  4969049767424
Used:                   822672320512
Online reserved:           181255168

Data type       Required/total  Devices
reserved:       1/1                [] 257253916672
btree:          1/2             [sdf sdi]                  490209280
btree:          1/2             [sdc sdg]                  419430400
btree:          1/2             [sdb sdf]                  387448832
btree:          1/2             [sdd sdj]                  571998208
btree:          1/2             [sdj sdk]                  359137280
btree:          1/2             [sdb sdk]                  188219392
btree:          1/2             [sdd sde]                  878706688
btree:          1/2             [sde sdi]                  273678336
btree:          1/2             [sdg sdj]                  634912768
btree:          1/2             [sdb sdd]                  743964672
btree:          1/2             [sdb sdi]                  547880960
btree:          1/2             [sdc sde]                  571998208
btree:          1/2             [sdc sdj]                  565182464
btree:          1/2             [sdd sdg]                  908591104
btree:          1/2             [sde sdf]                  974651392
btree:          1/2             [sde sdk]                  762314752
btree:          1/2             [sdf sdk]                  421527552
btree:          1/2             [sdi sdj]                  736100352
btree:          1/2             [sdb sdc]                  184025088
btree:          1/2             [sdb sde]                  508559360
btree:          1/2             [sdb sdg]                  856162304
btree:          1/2             [sdb sdj]                 1096810496
btree:          1/2             [sdc sdd]                   98041856
btree:          1/2             [sdc sdf]                 1412431872
btree:          1/2             [sdc sdi]                   82837504
btree:          1/2             [sdc sdk]                 1159200768
btree:          1/2             [sdd sdf]                  396361728
btree:          1/2             [sdd sdi]                  839909376
btree:          1/2             [sdd sdk]                  120061952
btree:          1/2             [sde sdg]                  362283008
btree:          1/2             [sde sdj]                  173539328
btree:          1/2             [sdf sdg]                   31981568
btree:          1/2             [sdf sdj]                  401080320
btree:          1/2             [sdg sdi]                  720896000
btree:          1/2             [sdg sdk]                  599785472
btree:          1/2             [sdi sdk]                  903872512
user:           1/2             [sdd sdk]                10317415424
user:           1/2             [sdb sdk]                26419536896
user:           1/2             [sdf sdj]                 6963166208
user:           1/2             [sdd sde]                 8595555328
user:           1/2             [sde sdj]                21213729792
user:           1/2             [sdg sdk]                34328058880
user:           1/2             [sdb sdf]                 4565458944
user:           1/2             [sdc sdg]                 7822145536
user:           1/2             [sdd sdi]                15753495552
user:           1/2             [sde sdg]                10602381312
user:           1/2             [sdf sdg]                21286385664
user:           1/2             [sdg sdi]                 4608747520
user:           1/2             [sdi sdk]                 5505317888
user:           1/2             [sdb sdd]                 8256819200
user:           1/2             [sdb sdi]                 2756539392
user:           1/2             [sdc sde]                 3014968320
user:           1/2             [sdc sdj]                16235658240
user:           1/2             [sdd sdg]                10062692352
user:           1/2             [sdd sdj]                21016419328
user:           1/2             [sde sdf]                 7547357184
user:           1/2             [sde sdi]                14610874368
user:           1/2             [sde sdk]                 6406190080
user:           1/2             [sdf sdi]                29908935680
user:           1/2             [sdf sdk]                12842311680
user:           1/2             [sdg sdj]                 8690582528
user:           1/2             [sdi sdj]                20889462784
user:           1/2             [sdj sdk]                 3875561472
user:           1/2             [sdb sdc]                 3073236992
user:           1/2             [sdb sde]                39565374464
user:           1/2             [sdb sdg]                14157358080
user:           1/2             [sdb sdj]                12748164096
user:           1/2             [sdc sdd]                30575054848
user:           1/2             [sdc sdf]                21375458304
user:           1/2             [sdc sdi]                17610496000
user:           1/2             [sdc sdk]                11889465344
user:           1/2             [sdd sdf]                 7052819456

(no label) (device 0):           sdb              rw
                                data         buckets    fragmented
  free:                            0         1015334
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2256535552            6946    1385168896
  user:                  55771244032          114172    4087765504
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 1):           sdc              rw
                                data         buckets    fragmented
  free:                            0         1015470
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2246574080            6946    1395130368
  user:                  55798241792          114036    3989464576
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 2):           sdd              rw
                                data         buckets    fragmented
  free:                            0         1015387
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2278817792            6956    1368129536
  user:                  55815135744          114109    4010843648
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 3):           sde              rw
                                data         buckets    fragmented
  free:                            0         1015403
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2252865536            6902    1365770240
  user:                  55778215424          114147    4067686912
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 4):           sdf              rw
                                data         buckets    fragmented
  free:                            0         1015431
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2257846272            6953    1387528192
  user:                  55770946560          114068    4033537024
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 5):           sdg              rw
                                data         buckets    fragmented
  free:                            0         1015457
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2267021312            6985    1395130368
  user:                  55779175936          114010    3994898944
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 6):           sdi              rw
                                data         buckets    fragmented
  free:                            0         1015332
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2297692160            7058    1402732544
  user:                  55821934592          114062    3979403264
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 7):           sdj              rw
                                data         buckets    fragmented
  free:                            0         1015393
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2269380608            6962    1380712448
  user:                  55816372224          114097    4003315712
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

(no label) (device 8):           sdk              rw
                                data         buckets    fragmented
  free:                            0         1015395
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                  2257059840            6918    1369964544
  user:                  55791928832          114139    4049779200
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:             600126783488         1144651

bcachefs show-super /dev/sdb
External UUID:                  f80927af-08c7-4e41-b7ab-17a31d30a03a
Internal UUID:                  6b635eb9-1a3d-4df1-acaf-13654ef10558
Device index:                   0
Label:                          
Version:                        backpointers
Oldest version on disk:         backpointers
Created:                        Sat Aug 27 23:35:46 2022
Sequence number:                115
Superblock size:                7488
Clean:                          0
Devices:                        9
Sections:                       members,replicas_v0,clean,journal_v2,counters
Features:                       new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                   512 B
  btree_node_size:              256 KiB
  errors:                       continue [ro] panic
  metadata_replicas:            2
  data_replicas:                2
  metadata_replicas_required:   1
  data_replicas_required:       1
  encoded_extent_max:           64.0 KiB
  metadata_checksum:            none [crc32c] crc64 xxhash
  data_checksum:                none [crc32c] crc64 xxhash
  compression:                  [none] lz4 gzip zstd
  background_compression:       [none] lz4 gzip zstd
  str_hash:                     crc32c crc64 [siphash]
  metadata_target:              none
  foreground_target:            none
  background_target:            none
  promote_target:               none
  erasure_code:                 0
  inodes_32bit:                 1
  shard_inode_numbers:          1
  inodes_use_key_cache:         1
  gc_reserve_percent:           8
  gc_reserve_bytes:             0 B
  root_reserve_percent:         0
  wide_macs:                    0
  acl:                          1
  usrquota:                     0
  grpquota:                     0
  prjquota:                     0
  journal_flush_delay:          1000
  journal_flush_disabled:       0
  journal_reclaim_delay:        100

members (size 512):
  Device:                       0
    UUID:                       7b8618ab-0f19-4fde-b8fb-bf4e464526c6
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       1
    UUID:                       6721ee33-7235-4e5c-8c31-7c2801e7177d
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       2
    UUID:                       3b6baa7a-0d72-4984-9d8a-b16090a33d25
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       3
    UUID:                       32a1a8ff-e8af-43cc-8e0e-c5a95ea3663d
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       4
    UUID:                       e65cc3ee-5023-43ee-a496-8c1c5cb36692
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       5
    UUID:                       bc0820cf-1e1c-4c2f-8461-fd0525260c88
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       6
    UUID:                       2d3717fd-95b7-4fde-9f8f-a8d4c634170f
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       7
    UUID:                       ec2a02dd-5ef2-4e00-a5ea-332d65e652d8
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1
  Device:                       8
    UUID:                       ab4bbd01-1b8c-4a95-9847-c7e28279337b
    Size:                       559 GiB
    Bucket size:                512 KiB
    First bucket:               0
    Buckets:                    1144651
    Last mount:                 Sat Aug 27 23:40:35 2022
    State:                      rw
    Label:                      (none)
    Data allowed:               journal,btree,user
    Has data:                   journal,btree,user
    Discard:                    0
    Freespace initialized:      1

Kernel bugs Compile the kernel with these flags:

CONFIG_PREEMPT=y
CONFIG_BCACHEFS_DEBUG=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_DEBUG_FS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE=y

Provide the output of dmesg either in a paste-bin or as attachment, if less than 30~ lines just provide inline here.

[Aug27 23:40] bcachefs (f80927af-08c7-4e41-b7ab-17a31d30a03a): recovering from clean shutdown, journal seq 4614
[  +0.163882] bcachefs (f80927af-08c7-4e41-b7ab-17a31d30a03a): going read-write
[  +0.032921] bcachefs (f80927af-08c7-4e41-b7ab-17a31d30a03a): mounted version=backpointers opts=metadata_replicas=2,data_replicas=2,noinodes_use_key_cache
[Aug27 23:48] BUG: MAX_LOCK_DEPTH too low!
[  +0.000039] turning off the locking correctness validator.
[  +0.000023] Please attach the output of /proc/lock_stat to the bug report
[  +0.000026] depth: 48  max: 48!
[  +0.000017] 48 locks held by kworker/20:0/1039:
[  +0.000021]  #0: ffff91d806da4548 ((wq_completion)bcachefs){....}-{0:0}, at: process_one_work+0x206/0x520
[  +0.000058]  #1: ffffad9c85aa7e70 ((work_completion)(&cl->work)#4){....}-{0:0}, at: process_one_work+0x206/0x520
[  +0.000051]  #2: ffff91d8059441a8 (&c->btree_trans_barrier){....}-{0:0}, at: __bch2_trans_init+0x17f/0x2b0
[  +0.000047]  #3: ffff91d821b05c80 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_lock+0x49b/0x4c0
[  +0.000044]  #4: ffff91d821b05c80 (b->c.lock){....}-{0:0}, at: bch2_btree_node_get+0x567/0x920
[  +0.000041]  #5: ffff91d821b05c80 (b->c.lock){....}-{0:0}, at: bch2_btree_node_get+0x567/0x920
[  +0.000040]  #6: ffff91d821b05c80 (b->c.lock){....}-{0:0}, at: bch2_btree_node_get+0x567/0x920
[  +0.000039]  #7: ffff91d802aa4880 (b->c.lock){....}-{0:0}, at: btree_path_traverse_one+0x6c0/0xd40
[  +0.000043]  #8: ffff91d836ccec70 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #9: ffff91dcb8b7bb78 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #10: ffff91d828251c40 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000044]  #11: ffff91d836ccd7a0 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000042]  #12: ffff91d82f1b1b18 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000044]  #13: ffff91d8282526a8 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #14: ffff91d8d43eb5b0 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #15: ffff91dcb8b7ab48 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #16: ffff91d8d43eb238 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #17: ffff91d802aa1480 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #18: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: btree_path_traverse_one+0x6c0/0xd40
[  +0.000043]  #19: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0x23a/0x250
[  +0.000043]  #20: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: btree_path_traverse_one+0xaa1/0xd40
[  +0.000043]  #21: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0x23a/0x250
[  +0.000043]  #22: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0x23a/0x250
[  +0.000044]  #23: ffff91dcc0b01c80 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #24: ffff91db8227a880 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #25: ffff91db8242ac80 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #26: ffff91db8227a880 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0x23a/0x250
[  +0.000044]  #27: ffff91db820dec80 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #28: ffff91db8227a880 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0x23a/0x250
[  +0.000043]  #29: ffff91dc3f90f880 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0xf0/0x250
[  +0.000043]  #30: ffff91db8227a880 (b->c.lock){....}-{0:0}, at: __bch2_btree_node_relock+0x23a/0x250
[  +0.000044]  #31: ffff91d821b05c80 (b->c.lock){....}-{0:0}, at: btree_path_copy+0x91/0xa0
[  +0.000040]  #32: ffff91d805967888 (&c->gc_lock){....}-{3:3}, at: bch2_btree_update_start+0x4a6/0x890
[  +0.000041]  #33: ffff91d82f1b1678 (b->c.lock){....}-{0:0}, at: bch2_btree_path_traverse_cached+0x109d/0x1cd0
[  +0.000047]  #34: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: btree_path_traverse_one+0x596/0xd40
[  +0.000042]  #35: ffff91d82f1b3ca0 (b->c.lock){....}-{0:0}, at: bch2_btree_path_traverse_cached+0x109d/0x1cd0
[  +0.000046]  #36: ffff91db8c506c80 (b->c.lock){....}-{0:0}, at: bch2_btree_node_get+0x3b8/0x920
[  +0.000040]  #37: ffff91d82f1b2208 (b->c.lock){....}-{0:0}, at: bch2_btree_path_traverse_cached+0x109d/0x1cd0
[  +0.000046]  #38: ffff91dcea390880 (b->c.lock){....}-{0:0}, at: bch2_btree_node_get+0x39c/0x920
[  +0.000575]  #39: ffff91d802aa4480 (b->c.lock){....}-{0:0}, at: btree_path_copy+0x91/0xa0
[  +0.000574]  #40: ffff91d82f1b0ae8 (b->c.lock){....}-{0:0}, at: bch2_btree_path_traverse_cached+0x109d/0x1cd0
[  +0.000583]  #41: ffff91d82f1b26a8 (b->c.lock){....}-{0:0}, at: bch2_btree_path_traverse_cached+0x109d/0x1cd0
[  +0.000565]  #42: ffff91dcc0b4fc80 (b->c.lock){....}-{0:0}, at: bch2_btree_node_get+0x3b8/0x920
[  +0.000550]  #43: ffff91d875d06c80 (b->c.lock){....}-{0:0}, at: bch2_btree_node_alloc+0x5a/0x290
[  +0.000544]  #44: ffff91d875d00880 (b->c.lock){....}-{0:0}, at: bch2_btree_node_alloc+0x5a/0x290
[  +0.000533]  #45: ffff91d805941860 (&bc->lock){....}-{3:3}, at: bch2_btree_node_hash_insert+0x36/0xb0
[  +0.000531]  #46: ffffffffae6ddac0 (rcu_read_lock){....}-{1:2}, at: __bch2_btree_node_hash_insert+0x6e/0x480
[  +0.000546]  #47: ffff91d8987e0038 (rhashtable_bucket){....}-{0:0}, at: __bch2_btree_node_hash_insert+0x13b/0x480
[  +0.000547] INFO: lockdep is turned off.
[  +0.000522] CPU: 20 PID: 1039 Comm: kworker/20:0 Tainted: G            E     5.19.0-cd72194e001b-bcachefs #1
[  +0.000542] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.13.0 05/14/2021
[  +0.000534] Workqueue: bcachefs bch2_write_index
[  +0.000538] Call Trace:
[  +0.000537]  <TASK>
[  +0.000539]  dump_stack_lvl+0x45/0x5e
[  +0.000543]  __lock_acquire.constprop.0.cold+0x63/0x11a
[  +0.000546]  lock_acquire+0xb2/0x1a0
[  +0.000545]  ? __bch2_btree_node_hash_insert+0x13b/0x480
[  +0.000542]  __bch2_btree_node_hash_insert+0x163/0x480
[  +0.000546]  ? __bch2_btree_node_hash_insert+0x13b/0x480
[  +0.000555]  ? jhash+0x190/0x190
[  +0.000555]  bch2_btree_node_hash_insert+0x41/0xb0
[  +0.000566]  bch2_btree_node_alloc+0x199/0x290
[  +0.000565]  __btree_split_node+0x50/0x7e0
[  +0.000570]  ? _raw_spin_unlock_irqrestore+0x38/0x60
[  +0.000585]  ? bch2_btree_sort_into+0x104/0x1c0
[  +0.000581]  ? __bch2_btree_node_alloc_replacement+0x222/0x290
[  +0.000586]  ? bch2_btree_node_alloc_replacement+0xb0/0x140
[  +0.000586]  ? btree_split.isra.0+0x17d/0xad0
[  +0.000580]  btree_split.isra.0+0x17d/0xad0
[  +0.000589]  ? bch2_trans_mark_alloc+0x50/0x3a0
[  +0.000597]  bch2_btree_split_leaf+0x64/0x1c0
[  +0.000598]  ? bch2_extent_update+0x20e/0x2c0
[  +0.000601]  bch2_trans_commit_error+0x189/0x6c0
[  +0.000606]  ? bch2_btree_path_peek_slot+0xb0/0x310
[  +0.000615]  ? six_trylock_write+0x25/0x110
[  +0.000617]  __bch2_trans_commit+0xa26/0x2530
[  +0.000622]  bch2_extent_update+0x20e/0x2c0
[  +0.000628]  ? bch2_inode_peek+0xb8/0x120
[  +0.000620]  bch2_write_index_default+0x245/0x450
[  +0.000612]  ? __lock_acquire.constprop.0+0x125/0x550
[  +0.000602]  ? cpufreq_this_cpu_can_update+0x12/0x60
[  +0.000601]  ? bch2_write_index_default+0x1c0/0x450
[  +0.000595]  ? bch2_write_index_default+0xe6/0x450
[  +0.000590]  __bch2_write_index+0x313/0x4a0
[  +0.000577]  ? lock_acquire+0xbe/0x1a0
[  +0.000575]  bch2_write_index+0x13/0xe0
[  +0.000569]  process_one_work+0x29d/0x520
[  +0.000570]  worker_thread+0x52/0x3a0
[  +0.000569]  ? rescuer_thread+0x370/0x370
[  +0.000567]  kthread+0xe7/0x110
[  +0.000564]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000578]  ret_from_fork+0x22/0x30
[  +0.000575]  </TASK>
[Aug28 01:00] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[Aug28 01:04] ------------[ cut here ]------------
[  +0.000776] WARNING: CPU: 24 PID: 4129 at fs/bcachefs/fs-io.c:1287 bch2_writepage_io_done+0x2cc/0x360
[  +0.000761] Modules linked in: intel_rapl_msr(E) evdev(E) intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) aesni_intel(E) snd_pcm(E) libaes(E) crypto_simd(E) snd_timer(E) cryptd(E) rapl(E) snd(E) intel_cstate(E) soundcore(E) intel_uncore(E) nls_ascii(E) pcspkr(E) nls_cp437(E) efi_pstore(E) mgag200(E) drm_shmem_helper(E) iTCO_wdt(E) vfat(E) drm_kms_helper(E) mxm_wmi(E) mei_me(E) fat(E) i2c_algo_bit(E) intel_pmc_bxt(E) iTCO_vendor_support(E) mei(E) watchdog(E) ipmi_ssif(E) sg(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) button(E) drm(E) fuse(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) sd_mod(E) t10_pi(E) ses(E) sr_mod(E) enclosure(E) crc64_rocksoft(E) cdrom(E) crc_t10dif(E) scsi_transport_sas(E) crct10dif_generic(E) uas(E) usb_storage(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E) crct10dif_pclmul(E)
[  +0.000183]  crct10dif_common(E) crc32_pclmul(E) crc32c_intel(E) ehci_pci(E) ehci_hcd(E) tg3(E) megaraid_sas(E) usbcore(E) lpc_ich(E) libphy(E) usb_common(E) ptp(E) scsi_mod(E) pps_core(E) scsi_common(E) wmi(E)
[  +0.007688] CPU: 24 PID: 4129 Comm: kworker/24:2 Tainted: G            E     5.19.0-cd72194e001b-bcachefs #1
[  +0.000761] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.13.0 05/14/2021
[  +0.000907] Workqueue: bcachefs bch2_write_index
[  +0.000770] RIP: 0010:bch2_writepage_io_done+0x2cc/0x360
[  +0.001055] Code: 80 65 6c f0 e8 65 4e 66 00 41 0f b7 86 08 04 00 00 41 39 c4 0f 8c 74 ff ff ff 49 8b 96 a8 01 00 00 48 85 d2 0f 8e 9b fd ff ff <0f> 0b 49 8b 76 60 48 8b 3c 24 e8 65 ec ff ff e9 8d fd ff ff f7 c3
[  +0.002150] RSP: 0018:ffffad9c8e7d7e20 EFLAGS: 00010202
[  +0.000832] RAX: 0000000080000000 RBX: ffff91d80b2a7500 RCX: 0000000000000000
[  +0.000978] RDX: 0000000000000008 RSI: ffffffffad4945ed RDI: 00000000ffffffff
[  +0.001538] RBP: ffff91d884a67568 R08: 0000000000000000 R09: ffff91db6e7eec30
[  +0.000847] R10: 0000000000000000 R11: 0000000000000000 R12: ffff91db6f7ee340
[  +0.000842] R13: ffffcd987f643300 R14: ffff91d884a67500 R15: ffffcd987f643305
[  +0.000875] FS:  0000000000000000(0000) GS:ffff91db6f600000(0000) knlGS:0000000000000000
[  +0.000862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000860] CR2: 0000555dcc6ee000 CR3: 0000000498b7a003 CR4: 00000000001706e0
[  +0.001110] Call Trace:
[  +0.000911]  <TASK>
[  +0.000863]  process_one_work+0x29d/0x520
[  +0.000876]  ? rescuer_thread+0x370/0x370
[  +0.000866]  worker_thread+0x52/0x3a0
[  +0.000950]  ? rescuer_thread+0x370/0x370
[  +0.000865]  kthread+0xe7/0x110
[  +0.000857]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000862]  ret_from_fork+0x22/0x30
[  +0.000869]  </TASK>
[  +0.000903] ---[ end trace 0000000000000000 ]---
[Aug28 01:29] perf: interrupt took too long (3139 > 3138), lowering kernel.perf_event_max_sample_rate to 63500
[Aug28 02:33] perf: interrupt took too long (3926 > 3923), lowering kernel.perf_event_max_sample_rate to 50750
[Aug28 05:58] INFO: task bcachefs:12634 blocked for more than 120 seconds.
[  +0.000961]       Tainted: G        W   E     5.19.0-cd72194e001b-bcachefs #1
[  +0.000919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000927] task:bcachefs        state:D stack:    0 pid:12634 ppid:  1358 flags:0x00000000
[  +0.001054] Call Trace:
[  +0.000953]  <TASK>
[  +0.000930]  __schedule+0x370/0xb50
[  +0.001098]  schedule+0x4e/0xb0
[  +0.000898]  wb_wait_for_completion+0x56/0x80
[  +0.000905]  ? cpuacct_percpu_seq_show+0x10/0x10
[  +0.000874]  sync_inodes_sb+0xd3/0x320
[  +0.000851]  bch2_fs_file_ioctl+0x8ed/0xa30
[  +0.000834]  ? do_sys_openat2+0x86/0x160
[  +0.000819]  ? kmem_cache_free+0x170/0x3b0
[  +0.000797]  ? __x64_sys_ioctl+0x8b/0xc0
[  +0.000771]  __x64_sys_ioctl+0x8b/0xc0
[  +0.000764]  do_syscall_64+0x3b/0x90
[  +0.000730]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  +0.000719] RIP: 0033:0x7f966b57fcc7
[  +0.000773] RSP: 002b:00007ffdfbac2038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.000706] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f966b57fcc7
[  +0.000689] RDX: 00007ffdfbac2060 RSI: 000000004020bc10 RDI: 0000000000000003
[  +0.000683] RBP: 0000000000000001 R08: 00005580c75bc110 R09: 0000000000000035
[  +0.000681] R10: 00007f966b8871c0 R11: 0000000000000246 R12: 00007ffdfbac3ea2
[  +0.000673] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[  +0.000906]  </TASK>
[  +0.000626] INFO: lockdep is turned off.

Optional Advanced

perf trace -e bcachefs:*
     0.000 kworker/0:3+bc/16001 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)
     0.861 kworker/2:0+bc/15696 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387929447, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
     0.881 kworker/2:0+bc/15696 bcachefs:trans_restart_relock_path(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055)
     0.887 kworker/2:0+bc/15696 bcachefs:trans_restart_relock(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055)
     0.893 kworker/2:0+bc/15696 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387990999, pos_inode: 1006638361, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    40.573 kworker/0:3+bc/16001 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387929447, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    40.606 kworker/0:3+bc/16001 bcachefs:trans_restart_relock_path(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055)
    40.626 kworker/0:3+bc/16001 bcachefs:trans_restart_relock(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055)
    40.646 kworker/0:3+bc/16001 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387990999, pos_inode: 1006638361, pos_offset: 8, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    40.562 kworker/2:0+bc/15696 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)
    78.873 kworker/0:3+bc/16001 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)
    78.876 kworker/2:0+bc/15696 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387929447, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    78.913 kworker/2:0+bc/15696 bcachefs:trans_restart_relock_path(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055)
    78.932 kworker/2:0+bc/15696 bcachefs:trans_restart_relock(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055)
    78.953 kworker/2:0+bc/15696 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387990999, pos_inode: 1006638361, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    97.256 kworker/0:3+bc/16001 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387929447, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    97.288 kworker/0:3+bc/16001 bcachefs:trans_restart_relock_path(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055)
    97.309 kworker/0:3+bc/16001 bcachefs:trans_restart_relock(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055)
    97.326 kworker/0:3+bc/16001 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387990999, pos_inode: 1006638361, pos_offset: 8, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
    97.247 kworker/2:0+bc/15696 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)
  1125.664 kworker/0:3+bc/16001 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)
  1125.663 kworker/2:0+bc/15696 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387929447, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
  1125.704 kworker/2:0+bc/15696 bcachefs:trans_restart_relock_path(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055)
  1125.726 kworker/2:0+bc/15696 bcachefs:trans_restart_relock(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 1, pos_snapshot: 4294965055)
  1125.748 kworker/2:0+bc/15696 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387990999, pos_inode: 1006638361, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
  1164.995 kworker/0:3+bc/16001 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387929447, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
  1165.028 kworker/0:3+bc/16001 bcachefs:trans_restart_relock_path(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055)
  1165.048 kworker/0:3+bc/16001 bcachefs:trans_restart_relock(trans_fn: "bch2_write_index_default", caller_ip: -1387985036, pos_inode: 1006638361, pos_offset: 9, pos_snapshot: 4294965055)
  1165.067 kworker/0:3+bc/16001 bcachefs:btree_path_relock_fail(trans_fn: "bch2_write_index_default", caller_ip: -1387990999, pos_inode: 1006638361, pos_offset: 8, pos_snapshot: 4294965055, node: "ffff91db097ed000", iter_lock_seq: 8948, node_lock_seq: 8948)
  1164.986 kworker/2:0+bc/15696 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)
  1862.683 kworker/0:3+bc/16001 bcachefs:trans_traverse_all(trans_fn: "bch2_write_index_default", caller_ip: -1387617370)

and perf top shows:

Samples: 147K of event 'cycles', 4000 Hz, Event count (approx.): 44467964481 lost: 0/0 drop: 0/0
Overhead  Shared Object          Symbol
  45.20%  [kernel]               [k] __genradix_ptr
  31.12%  [kernel]               [k] bch2_btree_iter_peek_upto
  20.01%  [kernel]               [k] __six_lock_type_slowpath
   0.49%  [kernel]               [k] bch2_btree_path_verify_level
   0.48%  [kernel]               [k] bch2_btree_path_verify
   0.43%  [kernel]               [k] bch2_btree_path_verify_locks

How does it happen:

time bcachefs format --replicas=2 /dev/sda /dev/sdc /dev/sdb /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
#mount it
fio --rw=randrw --name=test --numjobs=8 --runtime=86400 --iodepth=64 --size=120G --filename=/mnt/blubbssi --group_reporting --time_based
dbench -t=86400 -D /mnt 12
#and do snapshots:
i=0; while [ $i -le 1000001 ]; do bcachefs subvolume snapshot snap_$(date -u +%F-%H%M%S%N); echo $i > snap_number; x=$(( i++ )); sleep 1; done

side note: it doesn't ends after 86400seconds, fio said it need more than 5days

colttt commented 2 years ago

still happens with

Linux bcachefs 5.19.0-5498a3d87b2a
koverstreet commented 2 years ago

Damn, I thought I'd finally fixed all the livelocks.

Can you re-gather the tracepoint information?

alexminder commented 3 weeks ago

I have same issue: kernel version: 6.10.13

# perf trace -e 'bcachefs:*'
...
212157.378 kworker/1:3-ev/319 bcachefs:btree_path_upgrade_fail(trans_fn: "bch2_do_discards_work", caller_ip: -1067175661, btree_id: 12, pos_inode: 3, pos_offset: 31016, locked: 1, self_read_count: 1, read_count: 1, intent_count: 1, iter_lock_seq: 123909, node_lock_seq: 123909)
212157.469 kthreadd/3558 bcachefs:btree_path_relock_fail(trans_fn: "bch2_do_discards_work", caller_ip: -1067175446, btree_id: 12, pos_inode: 1, pos_offset: 990608, node: "ffff992057450400", intent_count: 1, iter_lock_seq: 123909, node_lock_seq: 123909)
...
  PerfTop:    5026 irqs/sec  kernel:93.0%  exact: 95.2% lost: 0/0 drop: 0/0 [4000Hz cycles:P],  (all, 4 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

     6.53%  [kernel]          [k] memchr_inv
     4.30%  [kernel]          [k] ZSTD_decompressSequences_default.constprop.0
     2.82%  [kernel]          [k] __kernel_poison_pages
     2.34%  [kernel]          [k] xhci_irq
     2.34%  [kernel]          [k] io_idle
     1.86%  [kernel]          [k] retbleed_return_thunk
     1.67%  [kernel]          [k] queued_spin_lock_slowpath
     1.49%  [kernel]          [k] acpi_safe_halt
     1.44%  [kernel]          [k] preempt_count_sub
     1.24%  [kernel]          [k] crc_pcl
     1.02%  [kernel]          [k] check_preemption_disabled
     0.88%  [kernel]          [k] xhci_queue_bulk_tx
     0.87%  [bcachefs]        [k] bch2_btree_node_iter_init
     0.83%  [kernel]          [k] sched_clock_noinstr
     0.82%  [bcachefs]        [k] __bch2_trans_commit
     0.80%  [kernel]          [k] preempt_count_add
     0.72%  [bcachefs]        [k] __bch2_bkey_cmp_packed_format_checked
     0.69%  [lz4_decompress]  [k] LZ4_decompress_safe_partial
     0.68%  [bcachefs]        [k] bch2_fs_btree_interior_update_init
     0.61%  [kernel]          [k] kfree
     0.59%  [bcachefs]        [k] __bch2_btree_path_set_pos
     0.58%  [kernel]          [k] __kmalloc_noprof
     0.56%  [kernel]          [k] stackleak_track_stack
     0.53%  [kernel]          [k] __irqentry_text_start
     0.53%  [kernel]          [k] HUF_decompress4X1_usingDTable_internal_default
     0.51%  perf              [.] hist_entry__sort
     0.51%  perf              [.] hists__findnew_entry
     0.51%  [bcachefs]        [k] bch2_btree_node_flush1
     0.50%  [bcachefs]        [k] bch2_bucket_ref_update
     0.50%  [bcachefs]        [k] bch2_bkey_ptrs_invalid
     0.49%  [bcachefs]        [k] bch2_opt_target_to_text
     0.49%  [bcachefs]        [k] __bch2_bkey_cmp_left_packed
     0.47%  [kernel]          [k] _raw_spin_lock_irqsave
     0.47%  [bcachefs]        [k] __bch2_bkey_unpack_key
     0.46%  perf              [.] rb_next
     0.45%  [bcachefs]        [k] bch2_sort_keys
     0.43%  [bcachefs]        [k] __bch2_time_stats_update
     0.43%  [kernel]          [k] iommu_v1_map_pages
     0.43%  [bcachefs]        [k] bch2_btree_iter_peek_upto
     0.42%  [kernel]          [k] menu_select
     0.40%  [bcachefs]        [k] bch2_journal_pins_to_text
     0.39%  perf              [.] hpp__sort_overhead
     0.38%  [bcachefs]        [k] bch2_btree_path_traverse_one
     0.37%  [kernel]          [k] get_page_from_freelist