Open Valmar33 opened 2 months ago
Are you using zswap?
Yes. Does it have known bad interactions?
It's known to be buggy - should be fixed in 6.9, unless there's more bugs.
Reopen if you repro it without zswap
Cheers. Will disable. :)
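For anyone following along, here is a minimal sketch of how the zswap state could be checked before retesting. The helper name zswap_state is hypothetical. The default path is the zswap module parameter that the kernel exposes in sysfs, which reads Y or N.

```shell
#!/bin/sh
# zswap_state: hypothetical helper, not an existing tool.
# Prints "enabled", "disabled", or "unknown" based on the module parameter.
zswap_state() {
    # Optional argument lets you point at another file (handy for testing).
    param=${1:-/sys/module/zswap/parameters/enabled}
    if [ ! -r "$param" ]; then
        echo "unknown"
        return 1
    fi
    case $(cat "$param") in
        Y|1) echo "enabled" ;;
        *)   echo "disabled" ;;
    esac
}

# To turn zswap off at runtime (as root):
#   echo N > /sys/module/zswap/parameters/enabled
# To keep it off across reboots, boot with the kernel parameter:
#   zswap.enabled=0
```

Running zswap_state with no arguments after a reboot shows whether the setting actually took effect.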
Rebooted, confirmed zswap was disabled, mounted my backup drive (fsck seems to not hang anymore!), went to do my backup routine with rsync.
Hangs as soon as there is intense activity on the same bcachefs data partition...
(Mind you, I'm running bcachefs master ~ bcachefs: Kill gc_init_recurse() ~ rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71b1543c83d65af8215d7558d70fc2ecbee77dcf but I'm not sure how much difference that makes here...)
24/4/24 9:24 AM kernel task:rsync state:D stack:0 pid:4045 tgid:4045 ppid:4044 flags:0x00000002
24/4/24 9:24 AM kernel Call Trace:
24/4/24 9:24 AM kernel <TASK>
24/4/24 9:24 AM kernel __schedule (kernel/sched/core.c:5409 kernel/sched/core.c:6746)
24/4/24 9:24 AM kernel ? __blk_mq_alloc_requests (block/blk-mq.c:515 (discriminator 1) block/blk-mq.c:333 (discriminator 1) block/blk-mq.c:516 (discriminator 1))
24/4/24 9:24 AM kernel schedule (./arch/x86/include/asm/preempt.h:84 (discriminator 13) kernel/sched/core.c:6824 (discriminator 13) kernel/sched/core.c:6838 (discriminator 13))
24/4/24 9:24 AM kernel io_schedule (kernel/sched/core.c:9019 (discriminator 1) kernel/sched/core.c:9045 (discriminator 1))
24/4/24 9:24 AM kernel folio_wait_bit_common (mm/filemap.c:1275 (discriminator 4))
24/4/24 9:24 AM kernel ? filemap_invalidate_unlock_two (mm/filemap.c:1091)
24/4/24 9:24 AM kernel migrate_pages_batch (./include/linux/pagemap.h:1048 mm/migrate.c:1486 mm/migrate.c:1700)
24/4/24 9:24 AM kernel ? defer_compaction (mm/compaction.c:1907)
24/4/24 9:24 AM kernel ? isolate_freepages_block (mm/compaction.c:1855)
24/4/24 9:24 AM kernel ? isolate_freepages_block (mm/compaction.c:1855)
24/4/24 9:24 AM kernel migrate_pages (mm/migrate.c:1948)
24/4/24 9:24 AM kernel ? defer_compaction (mm/compaction.c:1907)
24/4/24 9:24 AM kernel ? isolate_freepages_block (mm/compaction.c:1855)
24/4/24 9:24 AM kernel ? isolate_freepages_block (mm/compaction.c:1855)
24/4/24 9:24 AM kernel compact_zone (mm/compaction.c:2663)
24/4/24 9:24 AM kernel compact_zone_order (mm/compaction.c:2801 (discriminator 1))
24/4/24 9:24 AM kernel try_to_compact_pages (mm/compaction.c:2855)
24/4/24 9:24 AM kernel __alloc_pages_direct_compact (./include/linux/sched/mm.h:333 (discriminator 1) ./include/linux/sched/mm.h:434 (discriminator 1) mm/page_alloc.c:3534 (discriminator 1))
24/4/24 9:24 AM kernel __alloc_pages_slowpath.constprop.0 (mm/page_alloc.c:4130)
24/4/24 9:24 AM kernel ? try_charge_memcg (mm/memcontrol.c:2749 (discriminator 1))
24/4/24 9:24 AM kernel __alloc_pages (mm/page_alloc.c:4580)
24/4/24 9:24 AM kernel alloc_pages_mpol (mm/mempolicy.c:2266 (discriminator 1))
24/4/24 9:24 AM kernel folio_alloc (mm/mempolicy.c:2342)
24/4/24 9:24 AM kernel page_cache_ra_order (mm/readahead.c:468 mm/readahead.c:517)
24/4/24 9:24 AM kernel filemap_get_pages (mm/filemap.c:2522)
24/4/24 9:24 AM kernel filemap_read (mm/filemap.c:2601)
24/4/24 9:24 AM kernel bch2_read_iter (fs/bcachefs/fs-io-direct.c:208) bcachefs
24/4/24 9:24 AM kernel vfs_read (./include/linux/fs.h:2104 fs/read_write.c:395 fs/read_write.c:476)
24/4/24 9:24 AM kernel ksys_read (fs/read_write.c:619)
24/4/24 9:24 AM kernel do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
24/4/24 9:24 AM kernel ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:175 arch/x86/entry/common.c:98)
24/4/24 9:24 AM kernel ? syscall_exit_to_user_mode (kernel/entry/common.c:221)
24/4/24 9:24 AM kernel ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:175 arch/x86/entry/common.c:98)
24/4/24 9:24 AM kernel ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:175 arch/x86/entry/common.c:98)
24/4/24 9:24 AM kernel ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:175 arch/x86/entry/common.c:98)
24/4/24 9:24 AM kernel ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:175 arch/x86/entry/common.c:98)
24/4/24 9:24 AM kernel entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
24/4/24 9:24 AM kernel RIP: 0033:0x7cea9a1196a1
24/4/24 9:24 AM kernel RSP: 002b:00007fff5219ba18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
24/4/24 9:24 AM kernel RAX: ffffffffffffffda RBX: 0000637ea2e153f0 RCX: 00007cea9a1196a1
24/4/24 9:24 AM kernel RDX: 0000000000040000 RSI: 0000637ea2e200d0 RDI: 0000000000000003
24/4/24 9:24 AM kernel RBP: 0000000000040000 R08: 00000000160c1f20 R09: 0000000000040000
24/4/24 9:24 AM kernel R10: 00007fff5219bab0 R11: 0000000000000246 R12: 0000000000000000
24/4/24 9:24 AM kernel R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000040000
24/4/24 9:24 AM kernel </TASK>
If I don't start KDE and run the rsync command in pure vt, then the rsync process never gets stuck and completes perfectly fine.
KDE never freezes, though... nor does anything else on the SSD partition. It seems to be the external drive partition that gets stuck? But I'm at a loss as to why having KDE running from the SSD bcachefs partition can cause rsync to get stuck reading data from the SSD and writing it to the external drive. The external drive is in a USB caddy, though the drive itself isn't slow. It easily reaches 200 MB/s when transferring big files.
But... it also seems to most easily get stuck when transferring massive files at high speeds.
Would it help if I trigger a freeze again? What would I need to do to get the state of the filesystem in that moment? I guess I'd need to get the current transaction or something it's stuck on. I'd appreciate it if you could walk me through the basics.
If you can get it to hang again, run echo w > /proc/sysrq-trigger; dmesg and then check these files:
/sys/kernel/debug/bcachefs//btree_transactions
/sys/fs/bcachefs//dev-0/alloc_debug
/sys/fs/bcachefs/*/internal/journal_debug
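Since a hang can be hard to reproduce, it may help to snapshot everything into one directory the moment it happens. The following is only a sketch: collect_hang_state is a hypothetical helper (not part of bcachefs-tools) that copies whichever of the given files are readable and reports the rest. The paths you pass in are up to you.

```shell
#!/bin/sh
# collect_hang_state: hypothetical helper, not an existing bcachefs tool.
# Usage: collect_hang_state OUTDIR FILE...
# Copies each readable FILE into OUTDIR and notes unreadable ones on stderr.
collect_hang_state() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        if [ -r "$f" ]; then
            # Flatten the path into a file name, e.g. /a/b becomes a_b.
            name=$(printf '%s' "$f" | sed 's|^/*||; s|/|_|g')
            cat "$f" > "$outdir/$name"
        else
            echo "skipped (not readable): $f" >&2
        fi
    done
}

# Example, run as root while rsync is stuck (output directory is arbitrary):
#   echo w > /proc/sysrq-trigger
#   mkdir -p hang-state && dmesg > hang-state/dmesg.txt
#   collect_hang_state hang-state /sys/fs/bcachefs/*/internal/journal_debug
```

The shell expands the glob before the function runs, so every mounted bcachefs filesystem gets its own copy. The other debug files mentioned above can be appended to the same call.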
I haven't been able to reproduce it since. Maybe another commit you pushed fixed something...
But, I'll keep trying.
Using bcachefs master ~ commit 17299843342b6095a7853220aeb4ae1d45ab2ba5
For context, I have a whole KDE Git installation on my bcachefs partition, used as a data partition separated from my root partition.
When I have this instance of KDE running, and an additional rsync task pushing a ton of data to a backup drive that is also bcachefs-based ~ 250 GB including KDE, massive game files, etc ~ the rsync task will occasionally get stuck forever.
bcachefs super:
Decoded backtrace: