openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Blocked for more than 120 seconds #4876

Closed: akkyoh closed this issue 8 years ago

akkyoh commented 8 years ago

Hello.

CentOS 7 (3.10.0-327.22.2.el7.x86_64), qemu-kvm 1.5.3, zfs 0.6.5.7-1.

I have 4 disks (2 HDD, 2 SSD), from which I created two software RAID-1 arrays. Every weekend, when the array check starts and cPanel runs its nightly backups, I see errors in dmesg (on the host and in the VM) and the virtual machine's file system goes read-only.
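
For context, on CentOS 7 the weekly array check is normally driven by mdadm's raid-check cron job; the commands below are an illustrative sketch (stock package paths, md124 taken from the mdstat output below), not taken from this system:

# Where the weekly check typically comes from (stock mdadm packaging on CentOS 7):
cat /etc/cron.d/raid-check
cat /etc/sysconfig/raid-check         # options for skipping or re-nicing checks per array

# Manually starting and observing a check on one array:
echo check > /sys/block/md124/md/sync_action
cat /sys/block/md124/md/sync_action   # prints "check" while the check is running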

cat /proc/mdstat
Personalities : [raid1]
md124 : active raid1 sdc3[0] sdd3[1]
      934128640 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md125 : active raid1 sdb1[1] sda1[0]
      486153216 blocks super 1.2 [2/2] [UU]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md126 : active raid1 sdd2[1] sdc2[0]
      525248 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdd1[1] sdc1[0]
      41943040 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

zfs get all output:

hdd                                type                  filesystem             -
hdd                                creation              Sat May  7 22:25 2016  -
hdd                                used                  37.5G                  -
hdd                                available             823G                   -
hdd                                referenced            5.28G                  -
hdd                                compressratio         1.01x                  -
hdd                                mounted               yes                    -
hdd                                quota                 none                   default
hdd                                reservation           none                   default
hdd                                recordsize            128K                   default
hdd                                mountpoint            /storage/hdd           local
hdd                                sharenfs              off                    default
hdd                                checksum              on                     default
hdd                                compression           off                    default
hdd                                atime                 on                     default
hdd                                devices               on                     default
hdd                                exec                  on                     default
hdd                                setuid                on                     default
hdd                                readonly              off                    default
hdd                                zoned                 off                    default
hdd                                snapdir               hidden                 default
hdd                                aclinherit            restricted             default
hdd                                canmount              on                     default
hdd                                xattr                 on                     default
hdd                                copies                1                      default
hdd                                version               5                      -
hdd                                utf8only              off                    -
hdd                                normalization         none                   -
hdd                                casesensitivity       sensitive              -
hdd                                vscan                 off                    default
hdd                                nbmand                off                    default
hdd                                sharesmb              off                    default
hdd                                refquota              none                   default
hdd                                refreservation        none                   default
hdd                                primarycache          metadata               local
hdd                                secondarycache        all                    default
hdd                                usedbysnapshots       0                      -
hdd                                usedbydataset         5.28G                  -
hdd                                usedbychildren        32.3G                  -
hdd                                usedbyrefreservation  0                      -
hdd                                logbias               latency                default
hdd                                dedup                 off                    default
hdd                                mlslabel              none                   default
hdd                                sync                  standard               default
hdd                                refcompressratio      1.00x                  -
hdd                                written               5.28G                  -
hdd                                logicalused           37.7G                  -
hdd                                logicalreferenced     5.28G                  -
hdd                                filesystem_limit      none                   default
hdd                                snapshot_limit        none                   default
hdd                                filesystem_count      none                   default
hdd                                snapshot_count        none                   default
hdd                                snapdev               hidden                 default
hdd                                acltype               off                    default
hdd                                context               none                   default
hdd                                fscontext             none                   default
hdd                                defcontext            none                   default
hdd                                rootcontext           none                   default
hdd                                relatime              on                     temporary
hdd                                redundant_metadata    all                    default
hdd                                overlay               off                    default
ssd                                type                  filesystem             -
ssd                                creation              Sat May  7 22:24 2016  -
ssd                                used                  41.2G                  -
ssd                                available             404G                   -
ssd                                referenced            19K                    -
ssd                                compressratio         1.72x                  -
ssd                                mounted               yes                    -
ssd                                quota                 none                   default
ssd                                reservation           none                   default
ssd                                recordsize            128K                   default
ssd                                mountpoint            /storage/ssd           local
ssd                                sharenfs              off                    default
ssd                                checksum              on                     default
ssd                                compression           off                    default
ssd                                atime                 on                     default
ssd                                devices               on                     default
ssd                                exec                  on                     default
ssd                                setuid                on                     default
ssd                                readonly              off                    default
ssd                                zoned                 off                    default
ssd                                snapdir               hidden                 default
ssd                                aclinherit            restricted             default
ssd                                canmount              on                     default
ssd                                xattr                 on                     default
ssd                                copies                1                      default
ssd                                version               5                      -
ssd                                utf8only              off                    -
ssd                                normalization         none                   -
ssd                                casesensitivity       sensitive              -
ssd                                vscan                 off                    default
ssd                                nbmand                off                    default
ssd                                sharesmb              off                    default
ssd                                refquota              none                   default
ssd                                refreservation        none                   default
ssd                                primarycache          metadata               local
ssd                                secondarycache        all                    default
ssd                                usedbysnapshots       0                      -
ssd                                usedbydataset         19K                    -
ssd                                usedbychildren        41.2G                  -
ssd                                usedbyrefreservation  0                      -
ssd                                logbias               latency                default
ssd                                dedup                 off                    default
ssd                                mlslabel              none                   default
ssd                                sync                  standard               default
ssd                                refcompressratio      1.00x                  -
ssd                                written               19K                    -
ssd                                logicalused           70.2G                  -
ssd                                logicalreferenced     9.50K                  -
ssd                                filesystem_limit      none                   default
ssd                                snapshot_limit        none                   default
ssd                                filesystem_count      none                   default
ssd                                snapshot_count        none                   default
ssd                                snapdev               hidden                 default
ssd                                acltype               off                    default
ssd                                context               none                   default
ssd                                fscontext             none                   default
ssd                                defcontext            none                   default
ssd                                rootcontext           none                   default
ssd                                relatime              on                     temporary
ssd                                redundant_metadata    all                    default
ssd                                overlay               off                    default

dmesg on host

[ 2642.690241] INFO: task qemu-kvm:11604 blocked for more than 120 seconds.
[ 2642.691365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.692525] qemu-kvm        D 0000000000000000     0 11604      1 0x00000080
[ 2642.692529]  ffff88059982bc80 0000000000000086 ffff880f0f06c500 ffff88059982bfd8
[ 2642.692534]  ffff88059982bfd8 ffff88059982bfd8 ffff880f0f06c500 ffff880f6252e980
[ 2642.692540]  ffff880f6252e800 ffff880f6252e988 ffff880f6252e828 0000000000000000
[ 2642.692545] Call Trace:
[ 2642.692550]  [<ffffffff8163b809>] schedule+0x29/0x70
[ 2642.692564]  [<ffffffffa0011755>] cv_wait_common+0x125/0x150 [spl]
[ 2642.692567]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 2642.692578]  [<ffffffffa0011795>] __cv_wait+0x15/0x20 [spl]
[ 2642.692624]  [<ffffffffa016749b>] zil_commit.part.11+0x8b/0x7d0 [zfs]
[ 2642.692630]  [<ffffffff81173e80>] ? global_dirtyable_memory+0x70/0x70
[ 2642.692674]  [<ffffffffa0167bf7>] zil_commit+0x17/0x20 [zfs]
[ 2642.692717]  [<ffffffffa01792f5>] zvol_request+0x305/0x610 [zfs]
[ 2642.692724]  [<ffffffff812c77bf>] ? generic_make_request_checks+0x24f/0x380
[ 2642.692728]  [<ffffffff8116c225>] ? mempool_alloc_slab+0x15/0x20
[ 2642.692731]  [<ffffffff8116c369>] ? mempool_alloc+0x69/0x170
[ 2642.692735]  [<ffffffff812c79d2>] generic_make_request+0xe2/0x130
[ 2642.692739]  [<ffffffff812c7a91>] submit_bio+0x71/0x150
[ 2642.692743]  [<ffffffff81217e1d>] ? bio_alloc_bioset+0x1fd/0x350
[ 2642.692747]  [<ffffffff812ce05b>] blkdev_issue_flush+0xab/0x110
[ 2642.692751]  [<ffffffff812195d5>] blkdev_fsync+0x35/0x50
[ 2642.692762]  [<ffffffff81210025>] do_fsync+0x65/0xa0
[ 2642.692766]  [<ffffffff81210313>] SyS_fdatasync+0x13/0x20
[ 2642.692770]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b

dmesg on vm

[ 1605.689485] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1605.689789] ata4.00: failed command: FLUSH CACHE EXT
[ 1605.690026] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 1605.690028]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1605.690603] ata4.00: status: { DRDY }
[ 1605.690865] ata4: hard resetting link
[ 1605.995700] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1610.995537] ata4.00: qc timeout (cmd 0xec)
[ 1610.995762] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 1610.996026] ata4.00: revalidation failed (errno=-5)
[ 1610.996232] ata4: hard resetting link
[ 1611.301711] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1621.301792] ata4.00: qc timeout (cmd 0xec)
[ 1621.303212] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 1621.303464] ata4.00: revalidation failed (errno=-5)
[ 1621.303665] ata4: limiting SATA link speed to 1.5 Gbps
[ 1621.303934] ata4: hard resetting link
[ 1621.608698] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1631.587263] ata4.00: configured for UDMA/100
[ 1631.588404] ata4.00: device reported invalid CHS sector 0
[ 1631.589537] ata4: EH complete
[ 1986.945202] EXT4-fs warning (device sdb1): ext4_end_bio: I/O error writing to inode 4980894 (size 0 starting block 1345575)
[ 1986.945233] sd 3:0:0:0: [sdb]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1986.945237] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 00 a4 41 3f 00 01 00 00
[ 1986.945243] end_request: I/O error, dev sdb, sector 10764607

XML disk configuration for the VM:

   <disk type="block" device="disk">
          <driver name="qemu" cache="none" discard="unmap" io="native"/>
          <source dev="/dev/zvol/ssd/vsv1001-0nic-fzk8xviz3ewwlg71"/>
          <target dev="sda" bus="sata"/>
    </disk>
    <disk type="block" device="disk">
          <driver name="qemu" cache="none" discard="unmap" io="native"/>
          <source dev="/dev/zvol/hdd/vsv1001-1fbi-fzk8xviz3ewwlg71"/>
          <target dev="sdb" bus="sata"/>
    </disk>

I tried using virtio-scsi, but then I get the "blocked for more than 120 seconds" error inside the virtual machine (which therefore hangs), while the host shows no errors at all. It looked like issue #4065.

memory information on host:

              total        used        free      shared  buff/cache   available
Mem:          64257       36014       24444           9        3798       25809
Swap:          4103           0        4103

arcstats

cat arcstats
6 1 0x01 91 4368 929199517 4024452703661
name                            type data
hits                            4    1092756
misses                          4    3113220
demand_data_hits                4    171
demand_data_misses              4    3050047
demand_metadata_hits            4    1092514
demand_metadata_misses          4    61960
prefetch_data_hits              4    0
prefetch_data_misses            4    43
prefetch_metadata_hits          4    71
prefetch_metadata_misses        4    1170
mru_hits                        4    318724
mru_ghost_hits                  4    0
mfu_hits                        4    773961
mfu_ghost_hits                  4    0
deleted                         4    41
mutex_miss                      4    0
evict_skip                      4    20
evict_not_enough                4    0
evict_l2_cached                 4    0
evict_l2_eligible               4    356352
evict_l2_ineligible             4    0
evict_l2_skip                   4    0
hash_elements                   4    211736
hash_elements_max               4    211741
hash_collisions                 4    72971
hash_chains                     4    3641
hash_chain_max                  4    3
p                               4    16844611584
c                               4    33689223168
c_min                           4    33554432
c_max                           4    33689223168
size                            4    2180396016
hdr_size                        4    89797264
data_size                       4    1387667456
metadata_size                   4    690424832
other_size                      4    12506464
anon_size                       4    425984
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    1822821376
mru_evictable_data              4    1387274240
mru_evictable_metadata          4    412967936
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    254844928
mfu_evictable_data              4    0
mfu_evictable_metadata          4    254400512
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    792728560
arc_meta_limit                  4    25266917376
arc_meta_max                    4    2792856608
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    1052786688

Why do these errors occur, and what should be done to make the system run stably?

behlendorf commented 8 years ago

Both ext4 and ZFS are getting IO errors when accessing the virtual block devices. You'll need to resolve that problem first before either filesystem will perform reliably.

akkyoh commented 8 years ago

Are you saying the problem is in the server's disks? SMART shows no problems with the disks, and the issue occurs both on the SSDs (RAID-1) and on the HDDs (RAID-1). I previously attached the disks via virtio-scsi and already saw the error "kernel: [808.698312] sd 2:0:0:0: [sda] abort" (in the VM; the host had no errors) together with an unresponsive virtual machine (the host kept working fine, aside from an increased load average). I have a second server in another data center with ZFS/SPL 0.6.5.7-1; it also uses virtio-scsi, and under load the same "sd 2:0:0:0: [sda] abort" error appears there.

I do not think the problem is in the hardware. Tell me what I should provide so the problem can be analyzed.

behlendorf commented 8 years ago

SMART only reads and reports internal counters from the drives. It's certainly a useful utility for monitoring but it's only part of the story. The errors you posted were logged by the kernel's scsi mid-layer and they imply a transient connectivity problem with the devices. This kind of thing isn't uncommon for systems which rely on SAS expanders and it usually occurs when you crank up the workload on the devices. For example, when you start checking the array.
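
For reference, smartctl can go deeper than the pass/fail summary if you want to cross-check the drives; the device names below are examples, not taken from this system:

# Full attributes, error log, and device statistics:
smartctl -x /dev/sdc
# SATA PHY event counters often reveal link resets and CRC errors:
smartctl -l sataphy /dev/sdc
# Start an extended offline self-test; read the result later with "smartctl -l selftest":
smartctl -t long /dev/sdc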

[ 1605.689485] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1605.689789] ata4.00: failed command: FLUSH CACHE EXT
[ 1605.690026] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 1605.690028]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1605.690603] ata4.00: status: { DRDY }
>>> [ 1605.690865] ata4: hard resetting link
[ 1605.995700] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1610.995537] ata4.00: qc timeout (cmd 0xec)
[ 1610.995762] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 1610.996026] ata4.00: revalidation failed (errno=-5)
[ 1610.996232] ata4: hard resetting link
[ 1611.301711] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1621.301792] ata4.00: qc timeout (cmd 0xec)
[ 1621.303212] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 1621.303464] ata4.00: revalidation failed (errno=-5)
[ 1621.303665] ata4: limiting SATA link speed to 1.5 Gbps
[ 1621.303934] ata4: hard resetting link
[ 1621.608698] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1631.587263] ata4.00: configured for UDMA/100
[ 1631.588404] ata4.00: device reported invalid CHS sector 0
[ 1631.589537] ata4: EH complete

There's nothing which can be done about this at the filesystem layer. My suggestion would be to take this to the mailing list where you may be able to get some support.

http://list.zfsonlinux.org/mailman/listinfo/zfs-discuss

akkyoh commented 8 years ago

Thank you, I understand your position. Unfortunately, the problem also occurs with the SSD drives in RAID-1, and on two servers. I'm afraid I'll have to go back to using LVM, where there are no such problems.

Sorry to trouble you and take up your time.

Below is the information from a virtual machine hang on the SSD disk while using virtio-scsi. At the same time, no error appears on the host node. Perhaps it will be useful for solving problems in the future, if someone else runs into this.

Jul 24 05:12:28 pluton kernel: [ 2887.219825] sd 2:0:0:0: [sda] abort
Jul 24 05:12:28 pluton kernel: [ 2887.220639] sd 2:0:0:0: [sda] abort
Jul 24 05:13:28 pluton kernel: [ 2947.705167] sd 2:0:0:0: [sda] abort
Jul 24 05:14:21 pluton kernel: [ 3000.709259] INFO: task jbd2/sda1-8:434 blocked for more than 120 seconds.
Jul 24 05:14:21 pluton kernel: [ 3000.710578]       Not tainted 2.6.32-673.26.1.lve1.4.15.el6.x86_64 #1
Jul 24 05:14:21 pluton kernel: [ 3000.711749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 24 05:14:21 pluton kernel: [ 3000.713808] jbd2/sda1-8   D ffff88063739e4c0     0   434      2    0 0x00000000
Jul 24 05:14:21 pluton kernel: [ 3000.716134]  ffff880636923c00 0000000000000046 0000000000000000 000000001c90ad56
Jul 24 05:14:21 pluton kernel: [ 3000.718283]  ffff880636923b60 ffff88063806cb40 0000028897db5e7f ffffffffa86be73b
Jul 24 05:14:21 pluton kernel: [ 3000.720459]  00000000098e9f96 0000000000000000 000000010025edb8 ffff88063739ea88
Jul 24 05:14:21 pluton kernel: [ 3000.722629] Call Trace:
Jul 24 05:14:21 pluton kernel: [ 3000.723844]  [<ffffffff811f4850>] ? sync_buffer+0x0/0x50
Jul 24 05:14:21 pluton kernel: [ 3000.724968]  [<ffffffff81547a03>] io_schedule+0x73/0xc0
Jul 24 05:14:21 pluton kernel: [ 3000.726050]  [<ffffffff811f4890>] sync_buffer+0x40/0x50
Jul 24 05:14:21 pluton kernel: [ 3000.727164]  [<ffffffff81548b6f>] __wait_on_bit+0x5f/0x90
Jul 24 05:14:21 pluton kernel: [ 3000.728267]  [<ffffffff811f4850>] ? sync_buffer+0x0/0x50
Jul 24 05:14:21 pluton kernel: [ 3000.729429]  [<ffffffff81548c18>] out_of_line_wait_on_bit+0x78/0x90
Jul 24 05:14:21 pluton kernel: [ 3000.730833]  [<ffffffff810abd20>] ? wake_bit_function+0x0/0x50
Jul 24 05:14:21 pluton kernel: [ 3000.731983]  [<ffffffff811f5ba6>] __wait_on_buffer+0x26/0x30
Jul 24 05:14:21 pluton kernel: [ 3000.733120]  [<ffffffffa00417ea>] jbd2_journal_commit_transaction+0x117a/0x14f0 [jbd2]
Jul 24 05:14:21 pluton kernel: [ 3000.735198]  [<ffffffff810abca0>] ? autoremove_wake_function+0x0/0x40
Jul 24 05:14:21 pluton kernel: [ 3000.736220]  [<ffffffffa0046bd0>] kjournald2+0xd0/0x230 [jbd2]
Jul 24 05:14:21 pluton kernel: [ 3000.737105]  [<ffffffff810abca0>] ? autoremove_wake_function+0x0/0x40
Jul 24 05:14:21 pluton kernel: [ 3000.737989]  [<ffffffffa0046b00>] ? kjournald2+0x0/0x230 [jbd2]
Jul 24 05:14:21 pluton kernel: [ 3000.739006]  [<ffffffff810ab8ae>] kthread+0x9e/0xc0
Jul 24 05:14:21 pluton kernel: [ 3000.739894]  [<ffffffff8100c3ca>] child_rip+0xa/0x20
Jul 24 05:14:21 pluton kernel: [ 3000.740878]  [<ffffffff810ab810>] ? kthread+0x0/0xc0
Jul 24 05:14:21 pluton kernel: [ 3000.741772]  [<ffffffff8100c3c0>] ? child_rip+0x0/0x20
Jul 24 05:14:21 pluton kernel: [ 3000.803913] INFO: task sadc:33464 blocked for more than 120 seconds.
Jul 24 05:14:21 pluton kernel: [ 3000.804859]       Not tainted 2.6.32-673.26.1.lve1.4.15.el6.x86_64 #1
Jul 24 05:14:21 pluton kernel: [ 3000.805827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 24 05:14:21 pluton kernel: [ 3000.807520] sadc          D ffff8804d00f4780     0 33464  33443    0 0x00000004
Jul 24 05:14:21 pluton kernel: [ 3000.809137]  ffff8801b4827db8 0000000000000086 ffff8801b4827d80 0000000000000001
Jul 24 05:14:21 pluton kernel: [ 3000.810787]  000000000000000e 000000000005cad0 0000028765ec89d3 ffff8800282cfd70
Jul 24 05:14:21 pluton kernel: [ 3000.812434]  ffff880600000001 ffff8800282cfd60 000000010025d900 ffff8804d00f4d48
Jul 24 05:14:21 pluton kernel: [ 3000.814042] Call Trace:
Jul 24 05:14:21 pluton kernel: [ 3000.814812]  [<ffffffffa00466f5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Jul 24 05:14:21 pluton kernel: [ 3000.815704]  [<ffffffff810abca0>] ? autoremove_wake_function+0x0/0x40
Jul 24 05:14:21 pluton kernel: [ 3000.816615]  [<ffffffffa0046aa8>] jbd2_complete_transaction+0x68/0xb0 [jbd2]
Jul 24 05:14:21 pluton kernel: [ 3000.817550]  [<ffffffffa0062ab7>] ext4_sync_file+0x157/0x210 [ext4]
Jul 24 05:14:21 pluton kernel: [ 3000.818453]  [<ffffffff811f0a6d>] vfs_fsync_range+0x10d/0x260
Jul 24 05:14:21 pluton kernel: [ 3000.819328]  [<ffffffff811f0c2d>] vfs_fsync+0x1d/0x20
Jul 24 05:14:21 pluton kernel: [ 3000.820165]  [<ffffffff811f0cb3>] do_fsync+0x83/0xe0
Jul 24 05:14:21 pluton kernel: [ 3000.821005]  [<ffffffff811f0d23>] sys_fdatasync+0x13/0x20
Jul 24 05:14:21 pluton kernel: [ 3000.821861]  [<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b
Jul 24 05:14:21 pluton kernel: [ 3000.822791] INFO: task repquota:33644 blocked for more than 120 seconds.
Jul 24 05:14:21 pluton kernel: [ 3000.823711]       Not tainted 2.6.32-673.26.1.lve1.4.15.el6.x86_64 #1
Jul 24 05:14:21 pluton kernel: [ 3000.824853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 24 05:14:21 pluton kernel: [ 3000.846836] repquota      D ffff8804d6c442c0     0 33644  33469    0 0x00000000
Jul 24 05:14:21 pluton kernel: [ 3000.850302]  ffff8804e686fcd8 0000000000000086 0000000000000000 ffff88063c6070e0
Jul 24 05:14:21 pluton kernel: [ 3000.857651]  0000000000000282 0000000000000282 0000027e7ed43cd6 0000000000000282
Jul 24 05:14:21 pluton kernel: [ 3000.861820]  ffff88063c607024 0000000000000000 000000010025444c ffff8804d6c44888
Jul 24 05:14:21 pluton kernel: [ 3000.865867] Call Trace:
Jul 24 05:14:21 pluton kernel: [ 3000.867798]  [<ffffffff810abfce>] ? prepare_to_wait+0x4e/0x80
Jul 24 05:14:21 pluton kernel: [ 3000.869950]  [<ffffffffa00466f5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Jul 24 05:14:21 pluton kernel: [ 3000.872234]  [<ffffffff810abca0>] ? autoremove_wake_function+0x0/0x40
Jul 24 05:14:21 pluton kernel: [ 3000.874646]  [<ffffffffa007fb5f>] ext4_sync_fs+0x7f/0xb0 [ext4]
Jul 24 05:14:21 pluton kernel: [ 3000.876836]  [<ffffffff81228517>] sync_quota_sb+0x67/0x140
Jul 24 05:14:21 pluton kernel: [ 3000.878294]  [<ffffffff8122882c>] do_quotactl+0x23c/0x4c0
Jul 24 05:14:21 pluton kernel: [ 3000.880150]  [<ffffffff811d8367>] ? iput+0x87/0x100
Jul 24 05:14:21 pluton kernel: [ 3000.882584]  [<ffffffff81228b3b>] sys_quotactl+0x8b/0xf0
Jul 24 05:14:21 pluton kernel: [ 3000.885115]  [<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b

akkyoh commented 8 years ago

I ran the test on a second server. The hardware is different, but the software is the same: CentOS 7 and the latest ZFS release.

I started an array check (on the host: echo 'check' > /sys/block/md3/md/sync_action) and began writing to disk inside a virtual machine (in the VM: dd if=/dev/sda of=test bs=4M count=5000).

The system does not hold up under this load, and this is the second server. With the same software but LVM in place of ZFS, there is no problem.
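
For reference, the stall can also be observed from the host while the test runs; this is an illustrative sketch, not output from this system:

# Watch check progress and per-device latency while the dd is running:
watch -n1 cat /proc/mdstat
iostat -x 1     # a jump in await/%util on the md members during the stall points at md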

The error on the host node:

[14661523.111294] md: data-check of RAID array md3
[14661523.111699] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[14661523.112143] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[14661523.112596] md: using 128k window, over a total of 92049920k.
[14661773.277957] INFO: task txg_sync:2552 blocked for more than 120 seconds.
[14661773.278404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14661773.278854] txg_sync        D ffff8807f35fd080     0  2552      2 0x00000080
[14661773.279309]  ffff8800d5a8fab0 0000000000000046 ffff8807d7131700 ffff8800d5a8ffd8
[14661773.279767]  ffff8800d5a8ffd8 ffff8800d5a8ffd8 ffff8807d7131700 ffff88081f314780
[14661773.280246]  0000000000000000 7fffffffffffffff ffff8801ab2d44f0 0000000000000001
[14661773.280719] Call Trace:
[14661773.281182]  [<ffffffff8163a879>] schedule+0x29/0x70
[14661773.281681]  [<ffffffff81638569>] schedule_timeout+0x209/0x2d0
[14661773.282171]  [<ffffffff810b8c22>] ? default_wake_function+0x12/0x20
[14661773.282674]  [<ffffffff810af028>] ? __wake_up_common+0x58/0x90
[14661773.283170]  [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[14661773.283719]  [<ffffffff810d814c>] ? ktime_get_ts64+0x4c/0xf0
[14661773.284209]  [<ffffffff81639eae>] io_schedule_timeout+0xae/0x130
[14661773.284618]  [<ffffffff810a6896>] ? prepare_to_wait_exclusive+0x56/0x90
[14661773.285032]  [<ffffffff81639f48>] io_schedule+0x18/0x20
[14661773.285463]  [<ffffffffa00116fe>] cv_wait_common+0xae/0x150 [spl]
[14661773.285887]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[14661773.286324]  [<ffffffffa00117f8>] __cv_wait_io+0x18/0x20 [spl]
[14661773.286785]  [<ffffffffa016d613>] zio_wait+0x123/0x210 [zfs]
[14661773.287230]  [<ffffffffa00f4c4f>] dsl_pool_sync+0xbf/0x430 [zfs]
[14661773.287707]  [<ffffffffa01100b8>] spa_sync+0x388/0xb70 [zfs]
[14661773.288175]  [<ffffffff810a6b0b>] ? autoremove_wake_function+0x2b/0x40
[14661773.288665]  [<ffffffffa01227ac>] txg_sync_thread+0x3cc/0x640 [zfs]
[14661773.289118]  [<ffffffffa01223e0>] ? txg_fini+0x2a0/0x2a0 [zfs]
[14661773.289603]  [<ffffffffa000c881>] thread_generic_wrapper+0x71/0x80 [spl]
[14661773.290051]  [<ffffffffa000c810>] ? __thread_exit+0x20/0x20 [spl]
[14661773.290517]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[14661773.290979]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[14661773.291476]  [<ffffffff81645818>] ret_from_fork+0x58/0x90
[14661773.291958]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140

Error on the VM:

[337296.031017] BUG: soft lockup - CPU#3 stuck for 67s! [mysqld:2708]
[337308.027015] BUG: soft lockup - CPU#2 stuck for 67s! [lvestats-server:1403]
[337308.027027] Modules linked in: lve(P)(U) iolimits(U) nf_conntrack_ftp xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner xt_multiport iptable_filter ip_tables ip6table_filter ip6_tables ipv6 sg virtio_balloon snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core 8139too 8139cp mii ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[337308.027253] CPU 2
[337308.027253] Modules linked in: lve(P)(U) iolimits(U) nf_conntrack_ftp xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner xt_multiport iptable_filter ip_tables ip6table_filter ip6_tables ipv6 sg virtio_balloon snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core 8139too 8139cp mii ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[337308.027253]
[337308.027253] Pid: 1403, comm: lvestats-server veid: 0 Tainted: P           ---------------    2.6.32-604.30.3.lve1.3.63.el6.x86_64 #1 042stab111_11 Red Hat KVM
[337308.027253] RIP: 0010:[<ffffffff810cffd8>]  [<ffffffff810cffd8>] smp_call_function_many+0x208/0x280
[337308.027253] RSP: 0018:ffff880232da3da8  EFLAGS: 00000202
[337308.027253] RAX: 0000000000000009 RBX: ffff880232da3de8 RCX: 0000000000000830
[337308.027253] RDX: 0000000000000008 RSI: 0000000000000004 RDI: 0000000000000286
[337308.027253] RBP: ffffffff8100bcce R08: ffffffff81c12580 R09: 0000000000000000
[337308.027253] R10: 0000000000000000 R11: 000000000000000d R12: 0000000000000286
[337308.027253] R13: ffff880232da3d58 R14: 0000000000000200 R15: ffff88009efe9238
[337308.027253] FS:  00007f497ce48700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
[337308.027253] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[337308.027253] CR2: 0000000001090670 CR3: 0000000232d92000 CR4: 00000000001407e0
[337308.027253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[337308.027253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[337308.027253] Process lvestats-server (pid: 1403, veid: 0, threadinfo ffff880232da0000, task ffff880238bbace0)
[337308.027253] Stack:
[337308.027253]  01ff88002830bba0 0000000000000000 ffffffff81016bb3 ffff88006e396a40
[337308.027253] <d> ffff880232da3e60 0000000000000000 0000000000000400 ffff880232da3e60
[337308.027253] <d> ffff880232da3df8 ffffffff810d0072 ffff880232da3e18 ffffffff8101d0d6
[337308.027253] Call Trace:
[337308.027253]  [<ffffffff81016bb3>] ? get_cpu_cap_masked+0x393/0x3d0
[337308.027253]  [<ffffffff810d0072>] ? smp_call_function+0x22/0x30
[337308.027253]  [<ffffffff8101d0d6>] ? c_start+0x36/0x50
[337308.027253]  [<ffffffff811defb6>] ? seq_read+0x96/0x410
[337308.027253]  [<ffffffff812285ce>] ? proc_reg_read+0x7e/0xc0
[337308.027253]  [<ffffffff811b6d65>] ? vfs_read+0xb5/0x1a0
[337308.027253]  [<ffffffff811b6ea1>] ? sys_read+0x51/0x90
[337308.027253]  [<ffffffff810ff41e>] ? __audit_syscall_exit+0x25e/0x290
[337308.027253]  [<ffffffff8100b1a2>] ? system_call_fastpath+0x16/0x1b
[337308.027253] Code: e8 fe c2 46 00 0f ae f0 48 8b 7b 38 ff 15 c9 b9 9c 00 80 7d c7 00 0f 84 7f fe ff ff f6 43 20 01 0f 84 75 fe ff ff 0f 1f 44 00 00 <f3> 90 f6 43 20 01 75 f8 e9 63 fe ff ff 0f 1f 00 4c 89 ea 4c 89
[337308.027253] Call Trace:
[337308.027253]  [<ffffffff810cffbf>] ? smp_call_function_many+0x1ef/0x280
[337308.027253]  [<ffffffff81016bb3>] ? get_cpu_cap_masked+0x393/0x3d0
[337308.027253]  [<ffffffff810d0072>] ? smp_call_function+0x22/0x30
[337308.027253]  [<ffffffff8101d0d6>] ? c_start+0x36/0x50
[337308.027253]  [<ffffffff811defb6>] ? seq_read+0x96/0x410
[337308.027253]  [<ffffffff812285ce>] ? proc_reg_read+0x7e/0xc0
[337308.027253]  [<ffffffff811b6d65>] ? vfs_read+0xb5/0x1a0
[337308.027253]  [<ffffffff811b6ea1>] ? sys_read+0x51/0x90
[337308.027253]  [<ffffffff810ff41e>] ? __audit_syscall_exit+0x25e/0x290
[337308.027253]  [<ffffffff8100b1a2>] ? system_call_fastpath+0x16/0x1b
[337296.031379] Modules linked in: lve(P)(U) iolimits(U) nf_conntrack_ftp xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner xt_multiport iptable_filter ip_tables ip6table_filter ip6_tables ipv6 sg virtio_balloon snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core 8139too 8139cp mii ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[337296.031379] CPU 3
[337296.031379] Modules linked in: lve(P)(U) iolimits(U) nf_conntrack_ftp xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner xt_multiport iptable_filter ip_tables ip6table_filter ip6_tables ipv6 sg virtio_balloon snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core 8139too 8139cp mii ext4 jbd2 mbcache sd_mod crc_t10dif virtio_scsi virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[337296.031379]
[337296.031379] Pid: 2708, comm: mysqld veid: 0 Tainted: P           ---------------    2.6.32-604.30.3.lve1.3.63.el6.x86_64 #1 042stab111_11 Red Hat KVM
[337296.031379] RIP: 0010:[<ffffffff810542c0>]  [<ffffffff810542c0>] flush_tlb_others_ipi+0x120/0x130
[337296.031379] RSP: 0018:ffff88023b3cfde8  EFLAGS: 00000246
[337296.031379] RAX: 0000000000000000 RBX: ffff88023b3cfe28 RCX: 0000000000000004
[337296.031379] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff81e45858
[337296.031379] RBP: ffffffff8100bcce R08: 0000000000000000 R09: 0000000000000000
[337296.031379] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023b3cfdd8
[337296.031379] R13: ffffffff8100bcce R14: ffff880216e1f100 R15: ffff880216e1f0c0
[337296.031379] FS:  00007f8b4262b700(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
[337296.031379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[337296.031379] CR2: 00007f8b4267b000 CR3: 0000000232f48000 CR4: 00000000001407e0
[337296.031379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[337296.031379] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[337296.031379] Process mysqld (pid: 2708, veid: 0, threadinfo ffff88023b3cc000, task ffff880239260d60)
[337296.031379] Stack:
[337296.031379]  ffffffffffffffff ffffffff81e45840 ffff88023b3cfe28 ffff880239066bc0
[337296.031379] <d> ffffffffffffffff ffff880239066e98 ffff880239066bc0 00007f8b4267b000
[337296.031379] <d> ffff88023b3cfe58 ffffffff81054346 ffff88023b3cfe58 ffff880239066bc0
[337296.031379] Call Trace:
[337296.031379]  [<ffffffff81054346>] ? native_flush_tlb_others+0x76/0x90
[337296.031379]  [<ffffffff8105451c>] ? flush_tlb_mm+0x5c/0xa0
[337296.031379]  [<ffffffff8117a41f>] ? unmap_region+0xff/0x130
[337296.031379]  [<ffffffff8117becc>] ? do_munmap+0x26c/0x370
[337296.031379]  [<ffffffff8117c026>] ? sys_munmap+0x56/0x80
[337296.031379]  [<ffffffff8100b1a2>] ? system_call_fastpath+0x16/0x1b
[337296.031379] Code: 44 00 00 48 8b 05 c9 ca bb 00 41 8d b5 e7 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 35 00 32 bb 00 <4c> 89 e7 e8 a8 99 25 00 85 c0 74 ec eb 90 66 90 55 48 89 e5 48
[337296.031379] Call Trace:
[337296.031379]  [<ffffffff810542c8>] ? flush_tlb_others_ipi+0x128/0x130
[337296.031379]  [<ffffffff81054346>] ? native_flush_tlb_others+0x76/0x90
[337296.031379]  [<ffffffff8105451c>] ? flush_tlb_mm+0x5c/0xa0
[337296.031379]  [<ffffffff8117a41f>] ? unmap_region+0xff/0x130
[337296.031379]  [<ffffffff8117becc>] ? do_munmap+0x26c/0x370
[337296.031379]  [<ffffffff8117c026>] ? sys_munmap+0x56/0x80
[337296.031379]  [<ffffffff8100b1a2>] ? system_call_fastpath+0x16/0x1b
[337696.515795] sd 2:0:0:0: [sda] abort
[337696.516898] sd 2:0:0:0: [sda] abort
[337696.516968] sd 2:0:0:0: [sda] abort
[337829.903054] hrtimer: interrupt took 2429229 ns
[337869.247448] sd 2:0:0:0: [sda] abort
[337869.249385] sd 2:0:0:0: [sda] abort
[337869.249727] sd 2:0:0:0: [sda] abort
[337869.249777] sd 2:0:0:0: [sda] abort
[337869.249821] sd 2:0:0:0: [sda] abort
[337869.249871] sd 2:0:0:0: [sda] abort
[337869.249913] sd 2:0:0:0: [sda] abort
[337869.249956] sd 2:0:0:0: [sda] abort
[337869.249987] sd 2:0:0:0: [sda] abort
[337869.250042] sd 2:0:0:0: [sda] abort
[337869.250112] sd 2:0:0:0: [sda] abort
[337869.252267] sd 2:0:0:0: [sda] abort
[337869.252317] sd 2:0:0:0: [sda] abort
[337869.252376] sd 2:0:0:0: [sda] abort
[337869.252470] sd 2:0:0:0: [sda] abort
[337869.252536] sd 2:0:0:0: [sda] abort
[337869.252580] sd 2:0:0:0: [sda] abort
[337869.252653] sd 2:0:0:0: [sda] abort
[337869.252698] sd 2:0:0:0: [sda] abort
[337869.252751] sd 2:0:0:0: [sda] abort
[337869.252798] sd 2:0:0:0: [sda] abort
[337869.252843] sd 2:0:0:0: [sda] abort
[337869.252886] sd 2:0:0:0: [sda] abort
[337869.252919] sd 2:0:0:0: [sda] abort
[337869.252963] sd 2:0:0:0: [sda] abort
[337869.253008] sd 2:0:0:0: [sda] abort
[337869.253057] sd 2:0:0:0: [sda] abort
[337869.253093] sd 2:0:0:0: [sda] abort
[337869.253120] sd 2:0:0:0: [sda] abort
[337869.253160] sd 2:0:0:0: [sda] abort
[337869.253198] sd 2:0:0:0: [sda] abort
[337869.253278] sd 2:0:0:0: [sda] abort
[337869.253313] sd 2:0:0:0: [sda] abort
[337869.253351] sd 2:0:0:0: [sda] abort
[337869.253413] sd 2:0:0:0: [sda] abort
[337869.253448] sd 2:0:0:0: [sda] abort
[337869.253483] sd 2:0:0:0: [sda] abort
[337869.253513] sd 2:0:0:0: [sda] abort
[337869.253583] sd 2:0:0:0: [sda] abort
[337869.253630] sd 2:0:0:0: [sda] abort
[337869.253667] sd 2:0:0:0: [sda] abort
[337869.253705] sd 2:0:0:0: [sda] abort
[337869.253738] sd 2:0:0:0: [sda] abort
[337869.253786] sd 2:0:0:0: [sda] abort

Can you tell me whether these errors are related to ZFS? If you think they are not, I'll simply move to LVM and close this issue; in my case that solves the problem.

behlendorf commented 8 years ago

They're not from ZFS, but now I'm not quite sure I understand your configuration. Could you describe it? It looks like you're using mdraid and ZFS together.

akkyoh commented 8 years ago

The first server has 2 HDDs (RAID-1, mdraid /dev/md124) and 2 SSDs (RAID-1, mdraid /dev/md125). /dev/md124 backs the ZFS hdd pool and /dev/md125 backs the ZFS ssd pool. I create the VMs on the ssd pool and attach a second disk from the hdd pool to each VM (for backups).

cat /proc/mdstat:

md124 : active raid1 sdc3[0] sdd3[1]
      934128640 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md125 : active raid1 sdb1[1] sda1[0]
      486153216 blocks super 1.2 [2/2] [UU]
      bitmap: 4/4 pages [16KB], 65536KB chunk

zfs list

NAME                                USED  AVAIL  REFER  MOUNTPOINT
hdd                                45.2G   815G  12.9G  /storage/hdd
hdd/vsv1001-1fbi-fzk8xviz3ewwlg71  32.0G   815G  32.0G  -
hdd/vsv1002-1ibz-puanslbnyhsfohke   202M   815G   202M  -
hdd/vsv1003-1jqf-iggzhwcmhzbenpbi  40.0M   815G  40.0M  -
ssd                                54.1G   392G    19K  /storage/ssd
ssd/vsv1001-0nic-fzk8xviz3ewwlg71  38.9G   392G  38.9G  -
ssd/vsv1002-08yr-puanslbnyhsfohke  12.6G   392G  12.6G  -
ssd/vsv1003-05wg-iggzhwcmhzbenpbi  2.46G   392G  2.46G  -
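
For reference, a minimal sketch of how a layout like this could be created (pool and zvol names are taken from the listing above; the volume sizes and the sparse/lz4 options are assumptions based on the notes below):

# Illustrative only: one pool per md array, mounted under /storage
zpool create -m /storage/ssd ssd /dev/md125
zpool create -m /storage/hdd hdd /dev/md124
# Thin-provisioned (sparse) zvols with lz4 compression, one per VM disk
zfs create -s -V 40G -o compression=lz4 ssd/vsv1001-0nic-fzk8xviz3ewwlg71
zfs create -s -V 40G -o compression=lz4 hdd/vsv1001-1fbi-fzk8xviz3ewwlg71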

To run the VMs I use QEMU 1.5.3 from the CentOS 7 repository. Example disk configuration for vsv1001:

    <disk type="block" device="disk">
          <driver name="qemu" cache="none" discard="unmap" io="native"/>
          <source dev="/dev/zvol/ssd/vsv1001-0nic-fzk8xviz3ewwlg71"/> (this is a thin-provisioned zvol with LZ4 compression on the SSD mdraid /dev/md125)
          <target dev="sda" bus="sata"/>
    </disk>
    <disk type="block" device="disk">
          <driver name="qemu" cache="none" discard="unmap" io="native"/>
          <source dev="/dev/zvol/hdd/vsv1001-1fbi-fzk8xviz3ewwlg71"/> (this is a thin-provisioned zvol with LZ4 compression on the HDD mdraid /dev/md124)
          <target dev="sdb" bus="sata"/>
    </disk>

If I use bus="sata", the error "ata4: hard resetting link" appears inside the VMs. If I use virtio-scsi, the error "sd 2:0:0:0: [sda / sdb] abort" appears inside the VMs.

Inside VMs:

/dev/sda1 on / type ext4 (rw,errors=remount-ro,discard,usrquota)
/dev/sdb1 on /backup type ext4 (rw,errors=remount-ro,discard,noquota)

In my opinion this is a fairly simple configuration, and it should just work.

I'll check what effect KSM and TRIM (DISCARD/UNMAP) have on the system; if I find anything out, I'll let you know.
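
For reference, the KSM part of that check might look like this (standard sysfs paths; a sketch, not the exact commands used):

# Stop KSM merging on the host, then unmerge pages that were already shared:
echo 0 > /sys/kernel/mm/ksm/run
echo 2 > /sys/kernel/mm/ksm/run
# TRIM can be ruled out by removing "discard" from the guest fstab and dropping
# discard="unmap" from the libvirt <driver> element, then remounting.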

akkyoh commented 8 years ago

I found some interesting system behavior that does not involve virtualization at all.

/dev/md124 is the mdraid RAID-1 array; /storage/hdd is the mountpoint of the HDD ZFS pool.

I start an mdraid check (echo 'check' > /sys/block/md3/md/sync_action) and run dd if=/dev/sda of=/storage/hdd/test bs=4M count=5000. While dd is running, the writes stall for some time (from several seconds to several minutes), yet no errors appear anywhere. I traced the process, and the trace shows where the writes freeze.
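
For reference, a trace like the one below could be captured with something along these lines (the exact strace invocation used is an assumption):

# Attach to the running dd and log its read/write calls (add -tt for timestamps to size the stall):
strace -f -e trace=read,write -p $(pidof dd)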

read(0, "849:849\tevent_buttons:94:94\teven"..., 4194304) = 4194304
write(1, "849:849\tevent_buttons:94:94\teven"..., 4194304) = 4194304
read(0, "L]:\342\230\203\342\230\203\342\230\203\342\230\203\342\230\203 [asis,\342\230\203\342\230\203\342"..., 4194304) = 4194304
write(1, "L]:\342\230\203\342\230\203\342\230\203\342\230\203\342\230\203 [asis,\342\230\203\342\230\203\342"..., 4194304

At the same time, the errors "sd 2:0:0:0: [sda] abort" and "sd 2:0:0:1: [sdb] abort" appear inside the virtual machines. If I stop the mdraid check, the errors disappear.

Disabling KSM and TRIM does not help.

behlendorf commented 8 years ago

@akkyoh it sounds like you've found your answer: checking your mdraid devices can result in them not servicing user IO for many seconds. The virtual machine disks backed by those devices will naturally treat that as a non-responsive device and perform normal error handling/recovery. You'll have to take this issue up with the mdraid developers.
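
For reference, md's check rate can be throttled, or the check paused while the VMs are busy; these are standard md tunables, and the values and the md3 name below are only illustrative:

# Cap the check/resync rate (KB/s):
echo 50000 > /proc/sys/dev/raid/speed_limit_max   # system-wide ceiling
echo 50000 > /sys/block/md3/md/sync_speed_max     # per-array ceiling
# Pause a running check and resume it later:
echo idle  > /sys/block/md3/md/sync_action
echo check > /sys/block/md3/md/sync_action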

akkyoh commented 8 years ago

Thank you. I found a solution for myself: after changing the I/O scheduler from CFQ to noop (for the SSDs) and deadline (for the HDDs), the problem disappeared. Perhaps I ran into the problem described here: https://bugzilla.kernel.org/show_bug.cgi?id=12309
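
For reference, the scheduler change described above can be made like this (device names are examples; noop and deadline are the pre-blk-mq scheduler names used by this 3.10-era kernel):

cat /sys/block/sda/queue/scheduler               # e.g. "noop [cfq] deadline"
echo noop > /sys/block/sda/queue/scheduler       # SSD array members
echo deadline > /sys/block/sdc/queue/scheduler   # HDD array members
# To persist across reboots, set elevator=deadline on the kernel command line or add a udev rule.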

I'm closing the issue. Thank you, and sorry for taking up your time.