Closed ojab closed 3 months ago
Not sure when it started: I hadn't updated the kernel for a long time, and unfortunately I cannot bisect properly.
Right now on kernel b37c1b07570d (and some previous revisions of bcachefs/master) I'm getting:
BUG: kernel NULL pointer dereference, address: 00000000000008b4
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 1667 Comm: kworker/u33:2 Tainted: G W 6.10.0-rc3-ojab-00094-gb37c1b07570d #4 eb87ea158e8fc8a1f83425c32d7a09d228cd52ea
Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
RIP: 0010:_raw_spin_lock_irqsave+0x31/0x70
and processes doing I/O to the bcachefs filesystem hang after that.
Full dmesg: https://gist.github.com/ojab/e4ffe88743c3bcc0185a96823f77fba6
Stacktrace w/ faddr2line: https://gist.github.com/ojab/2f161a85040ad46ea7d601d9de4bfc24
$ bcachefs fs usage -h /mnt/bcachefs/
Filesystem: 360fc60c-8c44-4f3e-9cc4-fbaeee9e7c3b
Size:             51.5 TiB
Used:             20.9 TiB
Online reserved:  443 MiB

Data type   Required/total  Durability  Devices
reserved:   1/1                         []            13.0 MiB
btree:      1/2             2           [sda1 sdc1]   71.4 GiB
btree:      1/2             2           [sda1 sdb1]   111 GiB
user:       1/1             1           [sda1]        752 GiB
user:       1/1             1           [sdc1]        11.7 TiB
user:       1/1             1           [sdb1]        8.33 TiB
cached:     1/1             1           [sda1]        10.5 TiB
cached:     1/1             1           [sdc1]        3.46 TiB
cached:     1/1             1           [sdb1]        5.32 TiB

Compression:
type            compressed  uncompressed  average extent size
zstd            1.25 TiB    1.61 TiB      59.0 KiB
incompressible  27.9 TiB    27.9 TiB      92.2 KiB

Btree usage:
extents:             74.9 GiB
inodes:              6.04 GiB
dirents:             1.57 GiB
xattrs:              512 KiB
alloc:               15.0 GiB
quotas:              512 KiB
stripes:             512 KiB
reflink:             2.50 MiB
subvolumes:          512 KiB
snapshots:           512 KiB
lru:                 3.27 GiB
freespace:           54.0 MiB
need_discard:        1.00 MiB
backpointers:        80.4 GiB
bucket_gens:         224 MiB
snapshot_trees:      512 KiB
deleted_inodes:      512 KiB
logged_ops:          1.00 MiB
rebalance_work:      799 MiB
subvolume_children:  512 KiB
accounting:          512 KiB

hdd.hdd1 (device 0): sda1 rw
               data      buckets   fragmented
free:          723 GiB   740398
sb:            3.00 MiB  4         1020 KiB
journal:       8.00 GiB  8192
btree:         91.1 GiB  232242    136 GiB
user:          752 GiB   859522    87.0 GiB
cached:        10.2 TiB  17100564  6.06 TiB
parity:        0 B       0
stripe:        0 B       0
need_gc_gens:  0 B       0
need_discard:  4.00 MiB  4
unstriped:     0 B       0
capacity:      18.1 TiB  18940926

hdd.hdd2 (device 1): sdc1 rw
               data      buckets   fragmented
free:          723 GiB   740073
sb:            3.00 MiB  4         1020 KiB
journal:       8.00 GiB  8192
btree:         35.7 GiB  91011     53.2 GiB
user:          11.7 TiB  12286947  68.9 GiB
cached:        3.37 TiB  5814698   2.17 TiB
parity:        0 B       0
stripe:        0 B       0
need_gc_gens:  0 B       0
need_discard:  1.00 MiB  1
unstriped:     0 B       0
capacity:      18.1 TiB  18940926

hdd.hdd3 (device 4): sdb1 rw
               data      buckets   fragmented
free:          723 GiB   740169
sb:            3.00 MiB  4         1020 KiB
journal:       8.00 GiB  8192
btree:         55.4 GiB  141231    82.5 GiB
user:          8.33 TiB  8832232   96.9 GiB
cached:        5.17 TiB  9219095   3.62 TiB
parity:        0 B       0
stripe:        0 B       0
need_gc_gens:  0 B       0
need_discard:  3.00 MiB  3
unstriped:     0 B       0
capacity:      18.1 TiB  18940926
$ bcachefs show-super /dev/sda1
Device: (unknown device)
External UUID: 360fc60c-8c44-4f3e-9cc4-fbaeee9e7c3b
Internal UUID: bc05affd-9fd1-4eb5-b497-3f7956ac57d2
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 0
Label:
Version: 1.9: disk_accounting_v2
Version upgrade complete: 1.9: disk_accounting_v2
Oldest version on disk: 1.6: btree_subvolume_children
Created: Fri Jun 16 22:38:16 2023
Sequence number: 834
Time of last write: Sun Jun 16 16:19:00 2024
Superblock size: 6.27 KiB/1.00 MiB
Clean: 0
Devices: 3
Sections: members_v1,replicas_v0,quota,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features: zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size: 4.00 KiB
  btree_node_size: 256 KiB
  errors: continue [ro] panic
  metadata_replicas: 2
  data_replicas: 1
  metadata_replicas_required: 1
  data_replicas_required: 1
  encoded_extent_max: 64.0 KiB
  metadata_checksum: none crc32c [crc64] xxhash
  data_checksum: none [crc32c] crc64 xxhash
  compression: none
  background_compression: none
  str_hash: crc32c crc64 [siphash]
  metadata_target: ssd
  foreground_target: ssd
  background_target: hdd
  promote_target: ssd
  erasure_code: 0
  inodes_32bit: 1
  shard_inode_numbers: 1
  inodes_use_key_cache: 1
  gc_reserve_percent: 5
  gc_reserve_bytes: 0 B
  root_reserve_percent: 0
  wide_macs: 0
  acl: 1
  usrquota: 1
  grpquota: 1
  prjquota: 1
  journal_flush_delay: 1000
  journal_flush_disabled: 0
  journal_reclaim_delay: 100
  journal_transaction_names: 1
  version_upgrade: [compatible] incompatible none
  nocow: 0

members_v2 (size 880):
Device: 0
  Label: hdd1 (1)
  UUID: 56cb5559-5826-45a6-8da4-57110f4b7e04
  Size: 18.1 TiB
  read errors: 24
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 1.00 MiB
  First bucket: 0
  Buckets: 18940926
  Last mount: Sun Jun 16 16:17:23 2024
  Last superblock write: 834
  State: rw
  Data allowed: journal,btree,user
  Has data: journal,btree,user,cached
  Btree allocated bitmap blocksize: 1.00 GiB
  Btree allocated bitmap: 0000000000000000000000000000011111111111111111111111111111111111
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 1
  Label: hdd2 (2)
  UUID: 4c1c7eff-f1e9-44b8-bcac-186fb4aa2367
  Size: 18.1 TiB
  read errors: 35
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 1.00 MiB
  First bucket: 0
  Buckets: 18940926
  Last mount: Sun Jun 16 16:17:23 2024
  Last superblock write: 834
  State: rw
  Data allowed: journal,btree,user
  Has data: journal,btree,user,cached
  Btree allocated bitmap blocksize: 1.00 GiB
  Btree allocated bitmap: 0000000000000000000000000000011111111111111111111111111111111111
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 4
  Label: hdd3 (8)
  UUID: 37a22bef-419c-4077-958c-226b13516152
  Size: 18.1 TiB
  read errors: 410
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 1.00 MiB
  First bucket: 0
  Buckets: 18940926
  Last mount: Sun Jun 16 16:17:23 2024
  Last superblock write: 834
  State: rw
  Data allowed: journal,btree,user
  Has data: journal,btree,user,cached
  Btree allocated bitmap blocksize: 1.00 GiB
  Btree allocated bitmap: 0000000000000000000000000000011111111111111111111111111111111111
  Durability: 1
  Discard: 0
  Freespace initialized: 1

errors (size 312):
journal_entries_missing         33     Thu Jan 11 10:45:52 2024
btree_node_bad_seq              5      Thu Jan 11 06:23:29 2024
fs_usage_hidden_wrong           1      Thu Jan 11 06:25:23 2024
fs_usage_data_wrong             1      Thu Jan 11 06:25:25 2024
fs_usage_cached_wrong           1      Thu Jan 11 06:25:26 2024
fs_usage_replicas_wrong         8      Thu Jan 11 06:25:28 2024
dev_usage_buckets_wrong         20     Thu Jan 11 06:25:23 2024
dev_usage_sectors_wrong         12     Thu Jan 11 06:25:23 2024
dev_usage_fragmented_wrong      8      Thu Jan 11 06:25:23 2024
alloc_key_data_type_wrong       15126  Thu Jan 11 08:59:55 2024
alloc_key_gen_wrong             10     Thu Jan 11 08:59:15 2024
alloc_key_dirty_sectors_wrong   15072  Thu Jan 11 08:59:55 2024
alloc_key_cached_sectors_wrong  1179   Thu Jan 11 08:59:55 2024
ptr_to_missing_alloc_key        54     Thu Jan 11 04:28:09 2024
ptr_gen_newer_than_bucket_gen   1198   Thu Jan 11 07:15:19 2024
stale_dirty_ptr                 14     Thu Jan 11 07:19:11 2024
inode_unreachable               5      Wed Feb 14 20:40:25 2024
deleted_inode_missing           24     Sat Nov 25 11:44:52 2023
inode_points_to_missing_dirent  3      Wed Feb 14 20:24:11 2024
Fixed in my bcachefs-for-upstream branch by the commit "bcachefs: Fix bch2_trans_put()". I'll be sending the fix to Linus this week.