lanconnected / EnhanceIO

Problem with eio_clean_thr process #54

Open saba69 opened 4 years ago

saba69 commented 4 years ago

Hi,

I compiled and installed the master version on kernel 4.14.34. When I write sequentially to the disk, the cache device fills up and cannot evict data. eio_clean_thr gets stuck in D state, and the eviction (flushing/cleaning) process does not resume even after I stop the IO (sequential write). This is the stack trace of the process:

eio_clean_thr D 0 28023 2 0x80000080
Dec 7 12:08:46 srv01 kernel: Call Trace:
Dec 7 12:08:46 srv01 kernel: ? __schedule+0x1ad/0x6a0
Dec 7 12:08:46 srv01 kernel: schedule+0x32/0x80
Dec 7 12:08:46 srv01 kernel: rwsem_down_write_failed+0x1fe/0x380
Dec 7 12:08:46 srv01 kernel: call_rwsem_down_write_failed+0x13/0x20
Dec 7 12:08:46 srv01 kernel: down_write+0x29/0x40
Dec 7 12:08:46 srv01 kernel: eio_clean_set+0x14c/0x9f0 [enhanceio]
Dec 7 12:08:46 srv01 kernel: ? del_timer_sync+0x35/0x40
Dec 7 12:08:46 srv01 kernel: ? call_timer_fn+0x130/0x130
Dec 7 12:08:46 srv01 kernel: eio_clean_thread_proc+0x1bc/0x360 [enhanceio]
Dec 7 12:08:46 srv01 kernel: ? schedule+0x1b5/0x6a0
Dec 7 12:08:46 srv01 kernel: kthread+0xfc/0x130
Dec 7 12:08:46 srv01 kernel: ? eio_clean_all+0xd0/0xd0 [enhanceio]
Dec 7 12:08:46 srv01 kernel: ? kthread_parkme+0x70/0x70
Dec 7 12:08:46 srv01 kernel: ret_from_fork+0x35/0x40

Thanks, Saba
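For anyone trying to reproduce this, a small helper can confirm which tasks are stuck in D state without waiting for a kernel log dump. The following is only a sketch, not part of EnhanceIO: it assumes a Linux `/proc` filesystem, and the function names `parse_stat_line` and `blocked_tasks` are made up for illustration. It reads the state field from each `/proc/<pid>/stat`.

```python
import glob


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks():
    """List (pid, comm) for every process currently in state D."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited during the scan, or entry unreadable
        if state == "D":  # uninterruptible sleep
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it repeatedly while the sequential write is in progress should show whether the cleaner thread stays in D state after the IO stops. Kernel thread names in `/proc` are truncated to 15 characters, so the thread may appear as `eio_clean_threa`.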

saba69 commented 4 years ago

Please note that while eio is locked up, IO can still be issued directly to both the source and cache devices.

saba69 commented 4 years ago

I just compiled and installed the master version on kernel 4.14.158, and I still hit the same bug. Here is the stack trace:

enhanceio_lru: Initialized 32639 sets in LRU
Dec 7 14:48:09 srv01 kernel: sysrq: SysRq : Show Blocked State
Dec 7 14:48:09 srv01 kernel: task PC stack pid father
Dec 7 14:48:09 srv01 kernel: eio_clean_threa D 0 3640 2 0x80000080
Dec 7 14:48:09 srv01 kernel: Call Trace:
Dec 7 14:48:09 srv01 kernel: ? schedule+0x1b0/0x6b0
Dec 7 14:48:09 srv01 kernel: ? switch_to_asm+0x41/0x70
Dec 7 14:48:09 srv01 kernel: ? switch_to_asm+0x35/0x70
Dec 7 14:48:09 srv01 kernel: schedule+0x32/0x80
Dec 7 14:48:09 srv01 kernel: rwsem_down_write_failed+0x206/0x380
Dec 7 14:48:09 srv01 kernel: ? switch_to_asm+0x41/0x70
Dec 7 14:48:09 srv01 kernel: ? switch_to_asm+0x35/0x70
Dec 7 14:48:09 srv01 kernel: call_rwsem_down_write_failed+0x13/0x20
Dec 7 14:48:09 srv01 kernel: down_write+0x29/0x40
Dec 7 14:48:09 srv01 kernel: eio_clean_set+0x14c/0x980 [enhanceio]
Dec 7 14:48:09 srv01 kernel: ? del_timer_sync+0x35/0x40
Dec 7 14:48:09 srv01 kernel: ? call_timer_fn+0x140/0x140
Dec 7 14:48:09 srv01 kernel: eio_clean_thread_proc+0x1bc/0x360 [enhanceio]
Dec 7 14:48:09 srv01 kernel: ? schedule+0x1b8/0x6b0
Dec 7 14:48:09 srv01 kernel: kthread+0xff/0x140
Dec 7 14:48:09 srv01 kernel: ? eio_clean_all+0xd0/0xd0 [enhanceio]
Dec 7 14:48:09 srv01 kernel: ? __kthread_parkme+0x90/0x90
Dec 7 14:48:09 srv01 kernel: ret_from_fork+0x35/0x40
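To compare the two traces above, it can help to pull out just the `symbol+offset/size [module]` frames and ignore the syslog prefixes. The parser below is a rough sketch for that purpose; the names `Frame`, `parse_frames`, and `first_module_frame` are hypothetical and not part of any kernel tooling. Frames prefixed with `?` are ones the kernel's unwinder could not verify, so they are marked as unreliable.

```python
import re
from collections import namedtuple

# One call-trace entry; reliable is False for frames printed with a leading '?'.
Frame = namedtuple("Frame", "symbol offset size module reliable")

FRAME_RE = re.compile(
    r"(?P<guess>\?\s+)?"                      # optional '?' marker
    r"(?P<symbol>[A-Za-z_][\w.]*)"            # function name
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"            # optional [module] suffix
)


def parse_frames(log_text):
    """Extract every frame from raw kernel log text, in order of appearance."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        frames.append(Frame(
            symbol=m.group("symbol"),
            offset=int(m.group("offset"), 16),
            size=int(m.group("size"), 16),
            module=m.group("module"),
            reliable=m.group("guess") is None,
        ))
    return frames


def first_module_frame(frames, module):
    """Return the innermost reliable frame belonging to the given module."""
    for frame in frames:
        if frame.module == module and frame.reliable:
            return frame
    return None
```

Applied to either trace, `first_module_frame(parse_frames(text), "enhanceio")` picks out `eio_clean_set` at offset `0x14c`, the frame that calls `down_write` in both kernels. The function sizes differ between the two builds (`0x9f0` and `0x980`), so the same offset does not by itself prove that both traces block at the same source line.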

saba69 commented 4 years ago

On kernel 3.10.0, the master version flushes the cache without any problem. However, filling the cache takes much longer than on kernel 4.14 with the same sequential write workload.

saba69 commented 4 years ago

I tested the master version on kernel 5.4.2 and the flush process works correctly.