namjaejeon / linux-exfat-oot

exFAT for Linux (Backport for low kernel version support)

cpu stall in exfat driver? #21

Closed tmm1 closed 3 years ago

tmm1 commented 3 years ago

I'm seeing this issue with exfat-next on a Raspberry Pi with a USB drive. Have you seen anything like it before?


Apr 27 17:45:06 dvr-server kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Apr 27 17:45:06 dvr-server kernel: rcu:         0-....: (779176 ticks this GP) idle=9fe/1/0x4000000000000002 softirq=2500332/2500470 fqs=388572 
Apr 27 17:45:06 dvr-server kernel: rcu:          (t=777147 jiffies g=27281 q=27)
Apr 27 17:45:06 dvr-server kernel: Task dump for CPU 0:
Apr 27 17:45:06 dvr-server kernel: channels-dvr    R  running task        0   972      1 0x0000020a
Apr 27 17:45:06 dvr-server kernel: Call trace:
Apr 27 17:45:06 dvr-server kernel:  dump_backtrace+0x0/0x170
Apr 27 17:45:06 dvr-server kernel:  show_stack+0x24/0x30
Apr 27 17:45:06 dvr-server kernel:  sched_show_task+0x144/0x170
Apr 27 17:45:06 dvr-server kernel:  dump_cpu_task+0x48/0x60
Apr 27 17:45:06 dvr-server kernel:  rcu_dump_cpu_stacks+0x98/0xd4
Apr 27 17:45:06 dvr-server kernel:  rcu_check_callbacks+0x7e4/0x980
Apr 27 17:45:06 dvr-server kernel:  update_process_times+0x34/0x80
Apr 27 17:45:06 dvr-server kernel:  tick_sched_handle.isra.0+0x44/0x70
Apr 27 17:45:06 dvr-server kernel:  tick_sched_timer+0x5c/0xb0
Apr 27 17:45:06 dvr-server kernel:  __hrtimer_run_queues+0x150/0x3a0
Apr 27 17:45:06 dvr-server kernel:  hrtimer_interrupt+0xfc/0x260
Apr 27 17:45:06 dvr-server kernel:  arch_timer_handler_phys+0x3c/0x50
Apr 27 17:45:06 dvr-server kernel:  handle_percpu_devid_irq+0xa4/0x2b0
Apr 27 17:45:06 dvr-server kernel:  generic_handle_irq+0x34/0x50
Apr 27 17:45:06 dvr-server kernel:  __handle_domain_irq+0x98/0x110
Apr 27 17:45:06 dvr-server kernel:  gic_handle_irq+0x58/0xb0
Apr 27 17:45:06 dvr-server kernel:  el1_irq+0xb4/0x130
Apr 27 17:45:06 dvr-server kernel:  exfat_iget+0x58/0xd0 [exfat]
Apr 27 17:45:06 dvr-server kernel:  exfat_build_inode+0x30/0x2f0 [exfat]
Apr 27 17:45:06 dvr-server kernel:  exfat_lookup+0xf4/0x230 [exfat]
Apr 27 17:45:06 dvr-server kernel:  __lookup_slow+0x98/0x170
Apr 27 17:45:06 dvr-server kernel:  lookup_slow+0x44/0x70
Apr 27 17:45:06 dvr-server kernel:  walk_component+0x21c/0x330
Apr 27 17:45:06 dvr-server kernel:  path_lookupat.isra.0+0xa4/0x210
Apr 27 17:45:06 dvr-server kernel:  filename_lookup+0x9c/0x170
Apr 27 17:45:06 dvr-server kernel:  user_path_at_empty+0x58/0x70
Apr 27 17:45:06 dvr-server kernel:  vfs_statx+0x90/0x110
Apr 27 17:45:06 dvr-server kernel:  __se_sys_newfstatat+0x54/0x90
Apr 27 17:45:06 dvr-server kernel:  __arm64_sys_newfstatat+0x24/0x30
Apr 27 17:45:06 dvr-server kernel:  el0_svc_common+0x88/0x1b0
Apr 27 17:45:06 dvr-server kernel:  el0_svc_handler+0x38/0x80
Apr 27 17:45:06 dvr-server kernel:  el0_svc+0x8/0xc
Apr 27 17:45:06 dvr-server kernel: Task dump for CPU 3:
Apr 27 17:45:06 dvr-server kernel: kswapd0         R  running task        0    46      2 0x0000002a
Apr 27 17:45:06 dvr-server kernel: Call trace:
Apr 27 17:45:06 dvr-server kernel:  __switch_to+0xfc/0x160
Apr 27 17:45:06 dvr-server kernel:  release_pages+0x2e4/0x380
Apr 27 17:45:06 dvr-server kernel:  0x1
Apr 27 17:46:03 dvr-server kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
Apr 27 17:46:03 dvr-server kernel: rcu:         3-....: (792899 ticks this GP) idle=156/1/0x4000000000000002 softirq=1025377/1025377 fqs=396449 
Apr 27 17:46:03 dvr-server kernel: rcu:          (t=792900 jiffies g=4621797 q=16324)
Apr 27 17:46:03 dvr-server kernel: Task dump for CPU 3:
Apr 27 17:46:03 dvr-server kernel: kswapd0         R  running task        0    46      2 0x0000002a
Apr 27 17:46:03 dvr-server kernel: Call trace:
Apr 27 17:46:03 dvr-server kernel:  dump_backtrace+0x0/0x170
Apr 27 17:46:03 dvr-server kernel:  show_stack+0x24/0x30
Apr 27 17:46:03 dvr-server kernel:  sched_show_task+0x144/0x170
Apr 27 17:46:03 dvr-server kernel:  dump_cpu_task+0x48/0x60
Apr 27 17:46:03 dvr-server kernel:  rcu_dump_cpu_stacks+0x98/0xd4
Apr 27 17:46:03 dvr-server kernel:  rcu_check_callbacks+0x7e4/0x980
Apr 27 17:46:03 dvr-server kernel:  update_process_times+0x34/0x80
Apr 27 17:46:03 dvr-server kernel:  tick_sched_handle.isra.0+0x44/0x70
Apr 27 17:46:03 dvr-server kernel:  tick_sched_timer+0x5c/0xb0
Apr 27 17:46:03 dvr-server kernel:  __hrtimer_run_queues+0x150/0x3a0
Apr 27 17:46:03 dvr-server kernel:  hrtimer_interrupt+0xfc/0x260
Apr 27 17:46:03 dvr-server kernel:  arch_timer_handler_phys+0x3c/0x50
Apr 27 17:46:03 dvr-server kernel:  handle_percpu_devid_irq+0xa4/0x2b0
Apr 27 17:46:03 dvr-server kernel:  generic_handle_irq+0x34/0x50
Apr 27 17:46:03 dvr-server kernel:  __handle_domain_irq+0x98/0x110
Apr 27 17:46:03 dvr-server kernel:  gic_handle_irq+0x58/0xb0
Apr 27 17:46:03 dvr-server kernel:  el1_irq+0xb4/0x130
Apr 27 17:46:03 dvr-server kernel:  queued_spin_lock_slowpath+0x9c/0x2e0
Apr 27 17:46:03 dvr-server kernel:  _raw_spin_lock+0x54/0x60
Apr 27 17:46:03 dvr-server kernel:  exfat_inode_tree_erase+0x34/0x70 [exfat]
Apr 27 17:46:03 dvr-server kernel:  exfat_evict_inode+0x4c/0x90 [exfat]
Apr 27 17:46:03 dvr-server kernel:  evict+0xa8/0x170
Apr 27 17:46:03 dvr-server kernel:  dispose_list+0x48/0x60
Apr 27 17:46:03 dvr-server kernel:  prune_icache_sb+0x6c/0xa0
Apr 27 17:46:03 dvr-server kernel:  super_cache_scan+0xf4/0x170
Apr 27 17:46:03 dvr-server kernel:  do_shrink_slab+0x150/0x3e0
Apr 27 17:46:03 dvr-server kernel:  shrink_slab+0xbc/0x2c0
Apr 27 17:46:03 dvr-server kernel:  shrink_node+0xd0/0x470
Apr 27 17:46:03 dvr-server kernel:  kswapd+0x394/0x850
Apr 27 17:46:03 dvr-server kernel:  kthread+0x104/0x130
Apr 27 17:46:03 dvr-server kernel:  ret_from_fork+0x10/0x1c

namjaejeon commented 3 years ago

@tmm1 Please don't use the exfat-next branch of linux-exfat-oot; please use master instead. Does this also happen on master?

tmm1 commented 3 years ago

I wanted to try the performance improvement in 223ac1af2c27061e9b523e5567d96b968a92ce97 because I was noticing some performance problems. I didn't realize the rbtree patch had not yet been merged upstream or reviewed on lkml.

I will switch back to master. I also discovered that my original performance issue was due to a failing drive.

Since in my cpu stall backtrace I see exfat_inode_tree_erase+0x34/0x70 [exfat], it seems to be a problem with the rbtree patch which introduced exfat_inode_tree_erase().
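For anyone triaging a similar log, here is a rough Python sketch of the check described above: it lists only the backtrace frames tagged with a given module name. The helper name `module_frames` and the regular expression are my own assumptions for illustration and are not part of the driver or of any kernel tooling. The sample is copied from the second stall report.

```python
import re

# One backtrace frame looks like "symbol+0xoff/0xsize", optionally followed by "[module]".
# This pattern is an assumption based on the log format shown in this issue.
FRAME_RE = re.compile(
    r"\s(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def module_frames(log_text, module):
    """Return the symbols of all backtrace frames that belong to `module`, in log order."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("module") == module:
            symbols.append(match.group("symbol"))
    return symbols


# Excerpt of the kswapd0 stall report from the log in this issue.
SAMPLE = """\
Apr 27 17:46:03 dvr-server kernel:  queued_spin_lock_slowpath+0x9c/0x2e0
Apr 27 17:46:03 dvr-server kernel:  _raw_spin_lock+0x54/0x60
Apr 27 17:46:03 dvr-server kernel:  exfat_inode_tree_erase+0x34/0x70 [exfat]
Apr 27 17:46:03 dvr-server kernel:  exfat_evict_inode+0x4c/0x90 [exfat]
Apr 27 17:46:03 dvr-server kernel:  evict+0xa8/0x170
"""

if __name__ == "__main__":
    print(module_frames(SAMPLE, "exfat"))
    # prints ['exfat_inode_tree_erase', 'exfat_evict_inode']
```

This only shows which exfat functions were on the stack when the stall was reported; it does not by itself prove where the lock is held.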

tmm1 commented 3 years ago

With master, the problem is gone.

I hope this can be helpful in improving the rbtree performance patch.

If you need help to test new versions, let me know.

namjaejeon commented 3 years ago

Thanks a lot for testing!

tmm1 commented 3 years ago

I was just curious whether there has been a v2 of the rbtree patch.

I ask because I sometimes see performance issues inside exfat_iget:

[image]