Closed jackyding2679 closed 7 years ago
If affected drive (sdb) was used as cache HDD of cache SSD , i/o errors on it would definitely cause cpu lock-ups while editing the cache.
@lanconnected If HDD has I/O errors during editing cache, i can cause CPU lockup? or If SSD has I/O errors.
Both situations (I/O errors on HDD or I/O errors on SSD) would cause cpu lockup while editing the cache. The reason is that eio_clean_all is called under a spinlock. CPU will be released eventually when I/O error is propagated to upper layers. AFAIK, only editing wb mode is affected. This is not a bug, but "overlocking" in EIO is a bit annoying sometimes.
I've upload the whole messages.
I read the code ,I don't think that eio_clean_all or eio_clean_set is called under any spinlock,but eio_clean_set is called under a semaphore. In function eio_clean_all, when all cachesets are flushed,it get a spinlock to change sysctl_active.do_clean:
spin_lock_irqsave(&dmc->cache_spin_lock, flags);
dmc->sysctl_active.do_clean &= ~EIO_CLEAN_START;
spin_unlock_irqrestore(&dmc->cache_spin_lock, flags);
and this goes very fast.
You are right about locks in eio_clean, sorry, didn't look carefully enough - shame on me :) But looking at your messages i'am surprised that cache works at all - your sdb seems to be in really bad condition. I doubt that EIO could really be blamed for cpu stucks when used on disk with virtually no healthy sector at all - you have > 4000 reported I/O errors on it not counting dropped ones.
At first,i think it is the bad condition of sdb that cause the CPU stucks. Then i pull out and plug in the SATA disk,and check the health condition of the disk,it shows OK, alse i cann't reproduce this issue. Maybe there were some physical connection problems with this disk at the time when the issue occured.
When i edit cache mode in our production environment like this: eio_cli edit -c enhanceio1 -m ro
CPU stucks like this: Jun 26 15:56:42 storage27 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [eio_cli:29929] Jun 26 15:56:42 storage27 kernel: Modules linked in: iptable_filter xt_multiport ip_tables enhanceio_rand(OF) enhanceio_lru(OF) enhanceio_fifo(OF) enhanceio(OF) 8021q garp stp mrp llc bonding nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul iTCO_wdt crc32_pclmul crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel wmi lrw i2c_i801 gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ioatdma lpc_ich mfd_core ses enclosure mei_me mei shpchp ipmi_si ipmi_msghandler xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt isci drm_kms_helper ttm libsas ixgbe drm ahci scsi_transport_sas libahci igb libata mdio ptp pps_core megaraid_sas i2c_algo_bit i2c_core dca Jun 26 15:56:42 storage27 kernel: CPU: 3 PID: 29929 Comm: eio_cli Tainted: GF O-------------- 3.10.0-229.el7.x86_64 #1 Jun 26 15:56:42 storage27 kernel: Hardware name: Inspur SA5212H2/SA5212H2, BIOS 2.0.4 03/25/2014 Jun 26 15:56:42 storage27 kernel: task: ffff880ee8d5b8e0 ti: ffff880917994000 task.ti: ffff880917994000 Jun 26 15:56:42 storage27 kernel: RIP: 0010:[] [] eio_finish_nrdirty+0xd8/0x1a0 [enhanceio]
Jun 26 15:56:42 storage27 kernel: RSP: 0018:ffff880917997e18 EFLAGS: 00000246
Jun 26 15:56:42 storage27 kernel: RAX: 0000000000000200 RBX: 000000000000ac0e RCX: 0000000000009eda
Jun 26 15:56:42 storage27 kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
Jun 26 15:56:42 storage27 kernel: RBP: ffff880917997e28 R08: 0000000000000246 R09: 00000000eb1c33c8
Jun 26 15:56:42 storage27 kernel: R10: 000000000150e700 R11: 0000000000000007 R12: 0000000000000246
Jun 26 15:56:42 storage27 kernel: R13: 00000000eb1c33c8 R14: 000000000150e700 R15: 0000000000000007
Jun 26 15:56:42 storage27 kernel: FS: 00007f52aa343740(0000) GS:ffff88085fc60000(0000) knlGS:0000000000000000
Jun 26 15:56:42 storage27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 15:56:42 storage27 kernel: CR2: 0000000015cb3000 CR3: 0000000922584000 CR4: 00000000001407e0
Jun 26 15:56:42 storage27 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 26 15:56:42 storage27 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 26 15:56:42 storage27 kernel: Stack:
Jun 26 15:56:42 storage27 kernel: 0000000000000002 ffff880854080124 ffff880917997e78 ffffffffa03df81b
Jun 26 15:56:42 storage27 kernel: ffff880079b48da0 0000000000d396d0 0000000500000000 0000000000d396d0
Jun 26 15:56:42 storage27 kernel: ffffc90016af6000 0000000000d396d0 0000000000d396d0 0000000000000000
Jun 26 15:56:42 storage27 kernel: Call Trace:
Jun 26 15:56:42 storage27 kernel: [] eio_cache_edit+0x3ab/0x650 [enhanceio]
Jun 26 15:56:42 storage27 kernel: [] eio_ioctl+0x265/0x310 [enhanceio]
Jun 26 15:56:42 storage27 kernel: [] do_vfs_ioctl+0x2e5/0x4c0
Jun 26 15:56:42 storage27 kernel: [] ? SYSC_newfstat+0x34/0x60
Jun 26 15:56:42 storage27 kernel: [] SyS_ioctl+0xa1/0xc0
Jun 26 15:56:42 storage27 kernel: [] system_call_fastpath+0x16/0x1b
Jun 26 16:03:02 storage27 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [eio_cli:29929] Jun 26 16:03:02 storage27 kernel: Modules linked in: iptable_filter xt_multiport ip_tables enhanceio_rand(OF) enhanceio_lru(OF) enhanceio_fifo(OF) enhanceio(OF) 8021q garp stp mrp llc bonding nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul iTCO_wdt crc32_pclmul crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel wmi lrw i2c_i801 gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ioatdma lpc_ich mfd_core ses enclosure mei_me mei shpchp ipmi_si ipmi_msghandler xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt isci drm_kms_helper ttm libsas ixgbe drm ahci scsi_transport_sas libahci igb libata mdio ptp pps_core megaraid_sas i2c_algo_bit i2c_core dca Jun 26 16:03:02 storage27 kernel: CPU: 3 PID: 29929 Comm: eio_cli Tainted: GF O-------------- 3.10.0-229.el7.x86_64 #1 Jun 26 16:03:02 storage27 kernel: Hardware name: Inspur SA5212H2/SA5212H2, BIOS 2.0.4 03/25/2014 Jun 26 16:03:02 storage27 kernel: task: ffff880ee8d5b8e0 ti: ffff880917994000 task.ti: ffff880917994000 Jun 26 16:03:02 storage27 kernel: RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0x1b/0x40
Jun 26 16:03:02 storage27 kernel: RSP: 0018:ffff880917997de0 EFLAGS: 00000246
Jun 26 16:03:02 storage27 kernel: RAX: 0000000000000246 RBX: 00000000000151ac RCX: 0000000000005c5a
Jun 26 16:03:02 storage27 kernel: RDX: 000000005c5a5c5a RSI: 0000000000000246 RDI: 0000000000000246
Jun 26 16:03:02 storage27 kernel: RBP: ffff880917997de8 R08: 0000000000000246 R09: 00000000eb1c33c8
Jun 26 16:03:02 storage27 kernel: R10: 000000000150e700 R11: 0000000000000007 R12: 00000000005ef860
Jun 26 16:03:02 storage27 kernel: R13: 00000000005ef860 R14: 000000000150e600 R15: ffff8808541f1bf0
Jun 26 16:03:02 storage27 kernel: FS: 00007f52aa343740(0000) GS:ffff88085fc60000(0000) knlGS:0000000000000000
Jun 26 16:03:02 storage27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 16:03:02 storage27 kernel: CR2: 0000000015cb3000 CR3: 0000000922584000 CR4: 00000000001407e0
Jun 26 16:03:02 storage27 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 26 16:03:02 storage27 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 26 16:03:02 storage27 kernel: Stack:
Jun 26 16:03:02 storage27 kernel: ffff880854080000 ffff880917997e08 ffffffffa03d9ef9 ffff880854080000
Jun 26 16:03:02 storage27 kernel: ffffffffa03f30a0 ffff880917997e28 ffffffffa03de1f0 0000000000000002
Jun 26 16:03:02 storage27 kernel: ffff880854080124 ffff880917997e78 ffffffffa03df81b ffff880079b48da0
Jun 26 16:03:02 storage27 kernel: Call Trace:
Jun 26 16:03:02 storage27 kernel: [] eio_clean_all+0xb9/0xe0 [enhanceio]
Jun 26 16:03:02 storage27 kernel: [] eio_finish_nrdirty+0xe0/0x1a0 [enhanceio]
Jun 26 16:03:02 storage27 kernel: [] eio_cache_edit+0x3ab/0x650 [enhanceio]
Jun 26 16:03:02 storage27 kernel: [] eio_ioctl+0x265/0x310 [enhanceio]
Jun 26 16:03:02 storage27 kernel: [] do_vfs_ioctl+0x2e5/0x4c0
Jun 26 16:03:02 storage27 kernel: [] ? SYSC_newfstat+0x34/0x60
Jun 26 16:03:02 storage27 kernel: [] SyS_ioctl+0xa1/0xc0
Jun 26 16:03:02 storage27 kernel: [] system_call_fastpath+0x16/0x1b
Jun 26 16:01:38 storage27 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [eio_cli:29929] Jun 26 16:01:38 storage27 kernel: Modules linked in: iptable_filter xt_multiport ip_tables enhanceio_rand(OF) enhanceio_lru(OF) enhanceio_fifo(OF) enhanceio(OF) 8021q garp stp mrp llc bonding nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul iTCO_wdt crc32_pclmul crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel wmi lrw i2c_i801 gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ioatdma lpc_ich mfd_core ses enclosure mei_me mei shpchp ipmi_si ipmi_msghandler xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt isci drm_kms_helper ttm libsas ixgbe drm ahci scsi_transport_sas libahci igb libata mdio ptp pps_core megaraid_sas i2c_algo_bit i2c_core dca Jun 26 16:01:38 storage27 kernel: CPU: 3 PID: 29929 Comm: eio_cli Tainted: GF O-------------- 3.10.0-229.el7.x86_64 #1 Jun 26 16:01:38 storage27 kernel: Hardware name: Inspur SA5212H2/SA5212H2, BIOS 2.0.4 03/25/2014 Jun 26 16:01:38 storage27 kernel: task: ffff880ee8d5b8e0 ti: ffff880917994000 task.ti: ffff880917994000 Jun 26 16:01:38 storage27 kernel: RIP: 0010:[] [] _raw_spin_lock_irqsave+0x9/0x60
Jun 26 16:01:38 storage27 kernel: RSP: 0018:ffff880917997de8 EFLAGS: 00000246
Jun 26 16:01:38 storage27 kernel: RAX: 00000000000151ad RBX: 00000000000151ac RCX: 0000000000000008
Jun 26 16:01:38 storage27 kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff880854080124
Jun 26 16:01:38 storage27 kernel: RBP: ffff880917997de8 R08: 0000000000000246 R09: 00000000eb1c33c8
Jun 26 16:01:38 storage27 kernel: R10: 000000000150e700 R11: 0000000000000007 R12: 00000000005ef860
Jun 26 16:01:38 storage27 kernel: R13: 00000000005ef860 R14: 000000000150e600 R15: ffff8808541f1bf0
Jun 26 16:01:38 storage27 kernel: FS: 00007f52aa343740(0000) GS:ffff88085fc60000(0000) knlGS:0000000000000000
Jun 26 16:01:38 storage27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 16:01:38 storage27 kernel: CR2: 0000000015cb3000 CR3: 0000000922584000 CR4: 00000000001407e0
Jun 26 16:01:38 storage27 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 26 16:01:38 storage27 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 26 16:01:38 storage27 kernel: Stack:
Jun 26 16:01:38 storage27 kernel: ffff880917997e08 ffffffffa03d9ee7 ffff880854080000 ffffffffa03f30a0
Jun 26 16:01:38 storage27 kernel: ffff880917997e28 ffffffffa03de1f0 0000000000000002 ffff880854080124
Jun 26 16:01:38 storage27 kernel: ffff880917997e78 ffffffffa03df81b ffff880079b48da0 0000000000d396d0
Jun 26 16:01:38 storage27 kernel: Call Trace:
Jun 26 16:01:38 storage27 kernel: [] eio_clean_all+0xa7/0xe0 [enhanceio]
Jun 26 16:01:38 storage27 kernel: [] eio_finish_nrdirty+0xe0/0x1a0 [enhanceio]
Jun 26 16:01:38 storage27 kernel: [] eio_cache_edit+0x3ab/0x650 [enhanceio]
Jun 26 16:01:38 storage27 kernel: [] eio_ioctl+0x265/0x310 [enhanceio]
Jun 26 16:01:38 storage27 kernel: [] do_vfs_ioctl+0x2e5/0x4c0
Jun 26 16:01:38 storage27 kernel: [] ? SYSC_newfstat+0x34/0x60
Jun 26 16:01:38 storage27 kernel: [] SyS_ioctl+0xa1/0xc0
Jun 26 16:01:38 storage27 kernel: [] system_call_fastpath+0x16/0x1b
Before and during the editing,there were some error for the SATA disk:
Jun 26 13:17:36 storage27 kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Jun 26 13:17:36 storage27 kernel: sd 0:2:1:0: [sdb] CDB: Jun 26 13:17:36 storage27 kernel: Write(16): 8a 00 00 00 00 00 02 73 72 a8 00 00 00 08 00 00 Jun 26 13:17:36 storage27 kernel: end_request: I/O error, dev sdb, sector 41120424 Jun 26 13:17:36 storage27 kernel: sd 0:2:1:0: [sdb]
Jun 26 13:17:36 storage27 kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Jun 26 13:17:36 storage27 kernel: sd 0:2:1:0: [sdb] CDB: Jun 26 13:17:36 storage27 kernel: Write(16): 8a 00 00 00 00 00 02 73 72 90 00 00 00 08 00 00 Jun 26 13:17:36 storage27 kernel: end_request: I/O error, dev sdb, sector 41120400
Jun 26 15:56:16 storage27 kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Jun 26 15:56:16 storage27 kernel: sd 0:2:1:0: [sdb] CDB: Jun 26 15:56:16 storage27 kernel: Write(16): 8a 00 00 00 00 00 00 95 02 c8 00 00 00 08 00 00 Jun 26 15:56:16 storage27 kernel: end_request: I/O error, dev sdb, sector 9765576 Jun 26 15:56:16 storage27 kernel: sd 0:2:1:0: [sdb]
Jun 26 15:56:16 storage27 kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Jun 26 15:56:16 storage27 kernel: sd 0:2:1:0: [sdb] CDB: Jun 26 15:56:16 storage27 kernel: Write(16): 8a 00 00 00 00 00 00 95 00 68 00 00 00 08 00 00 Jun 26 15:56:16 storage27 kernel: end_request: I/O error, dev sdb, sector 9764968 Jun 26 15:56:42 storage27 kernel: BUG: soft lockup - CPU#3 stuck for 22s! [eio_cli:29929]
Maybe the I/O error on the SATA disk cause the CPU stuck,any ideas?