Closed: clipcarl closed this issue 6 years ago
I forgot to mention that the more read activity there is, the faster the deadlock seems to occur.
I also forgot to mention that when the deadlock occurs the cached device is unusable. Processes that read from or write to it hang. However, the rest of the system is just fine and usable. There is nothing in the kernel log when the deadlock occurs (except for the eventual notification that kernel processes have hung).
[root@tantor ~]# dmesg | grep -i enhanceio
[ 22.242118] enhanceio: Slow (clean) shutdown detected
[ 22.242120] enhanceio: Only clean blocks exist in cache
[ 22.242127] enhanceio_lru: eio_lru_instance_init: created new instance of LRU
[ 22.242129] enhanceio: Setting replacement policy to lru (2)
[ 22.242132] enhanceio: Allocate 346799KB (4B per) mem for 88780544-entry cache (capacity:348153MB, associativity:256, block size:4096 bytes)
[ 29.550118] enhanceio: Cache metadata loaded from disk with 6 valid 0 dirty blocks
[ 29.550121] enhanceio: Setting mode to write back
[ 29.558057] enhanceio_lru: Initialized 346799 sets in LRU
[ 31.751099] enhanceio: eio_handle_ssd_message: SSD_ADD event called for ACTIVE cache "data", ignoring!!!
[root@tantor data]# lsmod
Module Size Used by
dm_snapshot 45056 0
nfsd 327680 35
auth_rpcgss 61440 1 nfsd
oid_registry 16384 1 auth_rpcgss
lockd 90112 1 nfsd
grace 16384 2 nfsd,lockd
sunrpc 278528 65 auth_rpcgss,nfsd,lockd
ip6t_REJECT 16384 1
nf_reject_ipv6 16384 1 ip6t_REJECT
nf_log_ipv6 16384 5
xt_hl 16384 22
ip6t_rt 16384 3
nf_conntrack_ipv6 16384 8
nf_defrag_ipv6 32768 1 nf_conntrack_ipv6
ipt_REJECT 16384 1
nf_reject_ipv4 16384 1 ipt_REJECT
nf_log_ipv4 16384 5
nf_log_common 16384 2 nf_log_ipv6,nf_log_ipv4
xt_LOG 16384 10
xt_limit 16384 13
xt_tcpudp 16384 35
xt_addrtype 16384 4
nf_conntrack_ipv4 16384 8
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
xt_conntrack 16384 16
ip6table_filter 16384 1
ip6_tables 24576 1 ip6table_filter
ipv6 483328 87 nf_conntrack_ipv6,nf_reject_ipv6,nf_defrag_ipv6
nf_conntrack_netbios_ns 16384 0
nf_conntrack_broadcast 16384 1 nf_conntrack_netbios_ns
nf_nat_ftp 16384 0
nf_nat 32768 1 nf_nat_ftp
nf_conntrack_ftp 16384 1 nf_nat_ftp
nf_conntrack 139264 8 nf_conntrack_ipv6,nf_conntrack_ftp,nf_conntrack_ipv4,nf_conntrack_broadcast,nf_nat_ftp,nf_conntrack_netbios_ns,xt_conntrack,nf_nat
iptable_filter 16384 1
ip_tables 24576 1 iptable_filter
x_tables 40960 13 xt_LOG,ipt_REJECT,ip_tables,iptable_filter,xt_tcpudp,xt_limit,ip6t_REJECT,ip6table_filter,xt_addrtype,ip6t_rt,xt_conntrack,ip6_tables,xt_hl
xfs 856064 1
dm_thin_pool 77824 4
dm_bio_prison 20480 1 dm_thin_pool
dm_persistent_data 81920 1 dm_thin_pool
dm_bufio 32768 2 dm_persistent_data,dm_snapshot
raid10 57344 1
kvdo 561152 0
uds 290816 1 kvdo
iscsi_scst 229376 3
scst_vdisk 180224 0
scst_user 98304 0
scst_tape 16384 0
scst_raid 16384 0
scst_processor 16384 0
scst_modisk 16384 0
scst_disk 20480 0
scst_changer 16384 0
scst_cdrom 16384 0
scst 1032192 10 scst_changer,scst_processor,scst_modisk,scst_cdrom,scst_vdisk,scst_disk,iscsi_scst,scst_raid,scst_tape,scst_user
dlm 188416 1 scst
enhanceio_rand 16384 0
enhanceio_fifo 16384 0
enhanceio_lru 16384 1
enhanceio 184320 3 enhanceio_lru,enhanceio_rand,enhanceio_fifo
dm_writeboost 49152 0
dm_mod 135168 21 kvdo,dm_bufio,dm_thin_pool,dm_writeboost,dm_snapshot
dax 20480 1 dm_mod
libcrc32c 16384 7 scst_vdisk,nf_conntrack,iscsi_scst,xfs,dm_persistent_data,dm_writeboost,nf_nat
zfs 3637248 0
zunicode 331776 1 zfs
zavl 16384 1 zfs
icp 278528 1 zfs
zcommon 73728 1 zfs
znvpair 90112 2 zcommon,zfs
spl 114688 4 znvpair,zcommon,zfs,icp
sr_mod 28672 0
cdrom 45056 1 sr_mod
pata_acpi 16384 0
crc32_pclmul 16384 0
pcbc 16384 0
aesni_intel 188416 0
aes_x86_64 20480 1 aesni_intel
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
ghash_clmulni_intel 16384 0
ata_generic 16384 0
cryptd 24576 3 crypto_simd,ghash_clmulni_intel,aesni_intel
kvm_intel 217088 0
kvm 458752 1 kvm_intel
iTCO_wdt 16384 0
iTCO_vendor_support 16384 1 iTCO_wdt
irqbypass 16384 1 kvm
intel_cstate 16384 0
ata_piix 36864 0
coretemp 16384 0
crct10dif_pclmul 16384 0
crc32c_intel 24576 1
serio_raw 16384 0
libata 237568 3 ata_piix,ata_generic,pata_acpi
ixgbe 294912 0
input_leds 16384 0
e1000e 217088 0
i2c_i801 24576 0
igb 196608 0
mpt3sas 253952 16
lpc_ich 28672 0
mdio 16384 1 ixgbe
i2c_algo_bit 16384 1 igb
ptp 20480 3 ixgbe,igb,e1000e
ioatdma 49152 0
uas 28672 0
pps_core 16384 1 ptp
i5500_temp 16384 0
dca 16384 3 ioatdma,ixgbe,igb
raid_class 16384 1 mpt3sas
hwmon 20480 4 ixgbe,igb,i5500_temp,coretemp
button 16384 0
acpi_cpufreq 16384 1
Hi! Thanks for reporting. Does your system remain in locked state forever? Can you try if it is possible to access SSD/HDD directly while eio is locked (i.e., with dd)?
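For reference, a direct-access check along the lines suggested above might look like the following. The device path in the comment is a placeholder for the actual source (HDD) or cache (SSD) device; the runnable part below operates on a scratch file instead, so it is safe to execute anywhere.

```shell
# On a real system you would read the underlying devices directly,
# bypassing the hung eio pseudo-device, e.g.:
#   dd if=/dev/sdb of=/dev/null bs=4096 count=256 iflag=direct
# (/dev/sdb is a placeholder; iflag=direct bypasses the page cache so
# the read really hits the disk). Demonstrated here on a scratch file:
TMP=$(mktemp)
dd if=/dev/zero of="$TMP" bs=4096 count=256 status=none
dd if="$TMP" of=/dev/null bs=4096 count=256 status=none && echo "read OK"
rm -f "$TMP"
```

If the direct reads succeed while the eio device hangs, the problem is in the caching layer rather than the hardware, which is what the replies below report.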
Hi @lanconnected. I will try to set up a test server to retest and duplicate this week and hopefully have an answer to your question soon.
Are there people using EnhanceIO on Linux 4.14 systems? If so, I wonder what's different between my test setup and theirs...
Quick question: when setting up a new cache, what tells the driver that a new cache needs to be initialized? The create and setup eio_cli subcommands essentially seem to call the same ioctls, so I assume the driver decides whether to initialize a new cache by checking whether the superblock is valid. If that's the case, I assume the proper way to tell the driver to create a new cache is to zero the first block of the cache device, and that's the sort of thing that should be in the documentation (but doesn't seem to be). To be sure, I tried zeroing the entire cache device, but that didn't make a difference (it still deadlocked).
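To illustrate the hypothesis in the question, a sketch of wiping only the superblock region rather than the whole device; the device name is a placeholder, and the runnable part below operates on a scratch file standing in for the SSD. (The following reply notes that wiping is not actually required.)

```shell
# Hypothesis from the question: the driver may distinguish "new cache"
# from "existing cache" by a valid superblock at the start of the SSD.
# If so, zeroing just the first block would force re-initialization:
#   dd if=/dev/zero of=/dev/nvme0n1 bs=4096 count=1 oflag=direct
# (device name is a placeholder). Demonstrated here on a scratch file:
SSD=$(mktemp)
printf 'stale-superblock-metadata' > "$SSD"    # stand-in for old metadata
dd if=/dev/zero of="$SSD" bs=4096 count=1 conv=notrunc status=none
od -An -tx1 -N 8 "$SSD"                        # first bytes are now zero
rm -f "$SSD"
```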
That's what the "persistent" parameter is for. It is set in eio_cli and eventually passed to eio_cache_create() in eio_conf.c. It is not necessary to wipe the SSD before cache creation.
Closing this issue for the lack of feedback.
Hi, I'm facing the same problem using the latest version on kernel 4.14.34. The details are reported in issue #54.
Thanks
Hi! Thanks for reporting. Does your system remain in locked state forever? Can you try if it is possible to access SSD/HDD directly while eio is locked (i.e., with dd)?
When eio is locked, IO can be issued directly to both the source and cache devices.
Hi,
I'm testing out EnhanceIO in my lab, but I'm experiencing what appears to be a deadlock in the writeback locking code: I see EIO processes stuck in down_write() and down_read(). The issue appears to occur somewhat randomly, but I believe only after writeback has started and while data is being read from the cached device. In my testing the deadlock always eventually occurs, within a few hours to a day or two. Details follow. Please let me know what else I can do to help troubleshoot this.
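A generic way (not specific to EnhanceIO) to capture evidence of this kind of hang is sketched below: list the D-state (uninterruptible sleep) tasks and dump each one's kernel stack, which on a deadlocked eio device would typically show down_read()/down_write() frames. Reading /proc/&lt;pid&gt;/stack requires root.

```shell
# Dump kernel stacks of all D-state (uninterruptible) tasks.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== PID $pid ($(cat /proc/$pid/comm 2>/dev/null)) ==="
    cat /proc/$pid/stack 2>/dev/null
done

# Alternatively, have the kernel log all blocked tasks to dmesg:
#   echo w > /proc/sysrq-trigger
```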
Thanks, Carl
Here is the system state at one deadlock: