openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

0.6.5.6 - I/O timeout during disk spin up #4638

Open flo82 opened 8 years ago

flo82 commented 8 years ago

See #3856 for further details. The bug is still present:

I'm still experiencing this bug with 0.6.5.6 on Ubuntu 16.04, with the same chipset (SAS2008).

This is what I did: I created a zpool on the device and sent it to sleep (hdparm -y), then started writing a file to it. The result was:

[59526.359997] sd 0:0:1:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sda] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sda, sector 824769380

I then created an ext3 filesystem on the device and sent it to sleep, then started the file copy again. Result: no messages in dmesg.

I also compared the original file with the copied one - they are identical. So this bug has to do with ZFS and is not resolved. Any ideas? Do you need further information, @behlendorf?
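For reference, a minimal shell sketch of the reproduction described above; the device, pool name, and file path are examples:

zpool create -f testpool /dev/sda      # single-disk pool on the SAS2008-attached drive
hdparm -y /dev/sda                     # put the drive into standby immediately
cp /path/to/largefile /testpool/       # trigger I/O while the drive spins up
dmesg | tail -n 20                     # look for blk_update_request I/O errors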

hoppel118 commented 7 years ago

Hey guys,

I also see this error for my pool, and I only see it in the syslog in combination with my ZFS HDDs. It always happens when my HDDs have to wake up after a spin-down (127). There are no errors in the HDDs' SMART information.

Oct  6 07:42:11 omv kernel: [21313.092935] sd 1:0:1:0: [sdc] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:11 omv kernel: [21313.092939] sd 1:0:1:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:11 omv kernel: [21313.092941] blk_update_request: I/O error, dev sdc, sector 3645713904
Oct  6 07:42:11 omv kernel: [21313.092989] sd 1:0:2:0: [sdd] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:11 omv kernel: [21313.092990] sd 1:0:2:0: [sdd] tag#1 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:11 omv kernel: [21313.092991] blk_update_request: I/O error, dev sdd, sector 3645713904
Oct  6 07:42:11 omv kernel: [21313.093036] sd 1:0:7:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:11 omv kernel: [21313.093037] sd 1:0:7:0: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d c0 00 00 00 08 00 00
Oct  6 07:42:11 omv kernel: [21313.093038] blk_update_request: I/O error, dev sdi, sector 3645713856
Oct  6 07:42:11 omv zed: eid=11 class=io pool=mediatank
Oct  6 07:42:11 omv zed: eid=12 class=io pool=mediatank
Oct  6 07:42:29 omv kernel: [21330.548739] sd 1:0:5:0: [sdg] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:29 omv kernel: [21330.548743] sd 1:0:5:0: [sdg] tag#3 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:29 omv kernel: [21330.548745] blk_update_request: I/O error, dev sdg, sector 3645713904
Oct  6 07:42:29 omv kernel: [21330.548788] sd 1:0:6:0: [sdh] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:29 omv kernel: [21330.548790] sd 1:0:6:0: [sdh] tag#1 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:29 omv kernel: [21330.548791] blk_update_request: I/O error, dev sdh, sector 3645713904
Oct  6 07:42:29 omv zed: eid=13 class=io pool=mediatank
Oct  6 07:42:38 omv kernel: [21339.463139] sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:38 omv kernel: [21339.463143] sd 1:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d e8 00 00 00 08 00 00
Oct  6 07:42:38 omv kernel: [21339.463145] blk_update_request: I/O error, dev sdb, sector 3645713896
Oct  6 07:42:55 omv kernel: [21356.397858] sd 1:0:3:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:55 omv kernel: [21356.397861] sd 1:0:3:0: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d c0 00 00 00 08 00 00
Oct  6 07:42:55 omv kernel: [21356.397863] blk_update_request: I/O error, dev sde, sector 3645713856
Oct  6 07:42:55 omv kernel: [21356.397905] sd 1:0:4:0: [sdf] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:55 omv kernel: [21356.397907] sd 1:0:4:0: [sdf] tag#1 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d c0 00 00 00 08 00 00
Oct  6 07:42:55 omv kernel: [21356.397908] blk_update_request: I/O error, dev sdf, sector 3645713856
Oct  6 07:42:55 omv zed: eid=14 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=15 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=16 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=17 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=18 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=19 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=20 class=io pool=mediatank

My hardware specs:

Mainboard: Supermicro X11SSH-CTF
CPU: Intel Xeon E3-1240Lv5 4x 2.10GHz So.1151 TRAY
HBA: LSI SAS3008 onboard
HDDs: 8x 4TB WD Red in RAID-Z2

My HBA is PCI passed through to the KVM guest. The mpt3sas modules are blacklisted on the host system.
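For reference, a minimal sketch of how such a blacklist is typically configured on a Debian-based host; the file name is an example:

# /etc/modprobe.d/blacklist-mpt3sas.conf   (example file name)
blacklist mpt3sas
# rebuild the initramfs afterwards so the blacklist also applies at early boot:
#   update-initramfs -u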

Host-OS - "Proxmox":

root@proxmox:~# uname -a
Linux proxmox 4.4.19-1-pve #1 SMP Wed Sep 14 14:33:50 CEST 2016 x86_64 GNU/Linux

root@proxmox:~# cat /etc/debian_version
8.6

root@proxmox:~# pveversion -v
proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
pve-manager: 4.3-1 (running version: 4.3-1/e7cdc165)
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-88
pve-firmware: 1.1-9
libpve-common-perl: 4.0-73
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-61
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-6
pve-container: 1.0-75
pve-firewall: 2.0-29
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8

Guest-OS (KVM) - openmediavault 3.0.41:

root@omv:~# uname -a
Linux omv 4.4.19-1-pve #1 SMP Wed Sep 14 14:33:50 CEST 2016 x86_64 GNU/Linux

root@omv:~# cat /etc/debian_version
8.6

As you can see, I also use the Proxmox kernel in the KVM guest.

I use the following ZFS packages, which the openmediavault plugin "openmediavault-zfs" depends on:

root@omv:~# apt-cache depends --important openmediavault-zfs
openmediavault-zfs
  Depends: openmediavault
  Depends: debian-zfs
  Depends: build-essential
root@omv:~# apt-cache depends --important debian-zfs
debian-zfs
  Depends: spl
  Depends: spl-dkms
    spl-modules-3.16.0-4-amd64
  Depends: zfs-dkms
  Depends: zfsutils
    zfsutils-linux
root@omv:~# apt-cache policy debian-zfs
debian-zfs:
  Installed: 7~jessie
  Candidate: 7~jessie
  Version table:
 *** 7~jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy zfs-dkms
zfs-dkms:
  Installed: 0.6.5.7-8-jessie
  Candidate: 0.6.5.7-8-jessie
  Version table:
     0.6.5.8-1~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
 *** 0.6.5.7-8-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy zfsutils
zfsutils:
  Installed: 0.6.5.7-8-jessie
  Candidate: 0.6.5.7-8-jessie
  Version table:
 *** 0.6.5.7-8-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy spl
spl:
  Installed: 0.6.5.7-5-jessie
  Candidate: 0.6.5.7-5-jessie
  Version table:
     0.6.5.8-2~bpo8+2 0
        100 http://httpredir.debian.org/debian/ jessie-backports/main amd64 Packages
 *** 0.6.5.7-5-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy spl-dkms
spl-dkms:
  Installed: 0.6.5.7-5-jessie
  Candidate: 0.6.5.7-5-jessie
  Version table:
     0.6.5.8-2~bpo8+2 0
        100 http://httpredir.debian.org/debian/ jessie-backports/main amd64 Packages
 *** 0.6.5.7-5-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~#

If you need any other information, please tell me what you need.

Thanks and greetings Hoppel

hoppel118 commented 7 years ago

After disabling spin-down and rebooting the KVM guest I don't see these messages anymore. But I want to spin down my HDDs.

hoppel118 commented 7 years ago

OK, I tried another thing. I use 8x 4TB WD Red HDDs behind my LSI SAS3008 controller. I read that there is a tool to deactivate the automatic spin-down in the HDDs' firmware.

So I downloaded "idle3-tools" to my openmediavault (Debian Jessie) KVM guest.

The default value for my disks was:

root@omv:~# idle3ctl -g /dev/sd[b-i]
Idle3 timer set to 138 (0x8a)

So I decided to deactivate the default spindown with the following command for all 8 disks:

root@omv:~# idle3ctl -d /dev/sd[b-i]
Idle3 timer is disabled
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!

I power-cycled the server completely, started it again, and checked the result with the following command:

root@omv:~# idle3ctl -g105 /dev/sd[b-i]
Idle3 timer is disabled

At this stage of the configuration I don't get any issues/errors in the syslog while opening a Samba share backed by a ZFS file system.

After that I configured my "/etc/hdparm.conf" via the openmediavault web UI as follows:

/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6LXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2XXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6LXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7HXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5EXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E3NXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXX {
    spindown_time = 240
    write_cache = off
}
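
For reference, spindown_time in /etc/hdparm.conf maps to hdparm's -S setting, where values 1-240 mean multiples of 5 seconds, so 240 = 1200 s = 20 minutes. A hedged manual equivalent for a single drive (device name is an example):

hdparm -S 240 /dev/sdb    # spin down after 20 minutes of inactivity (240 x 5 s)
hdparm -W 0 /dev/sdb      # disable the drive's write cache (write_cache = off)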

This way openmediavault spins the disks down after 20 minutes of inactivity.

Now I see the following on the command line:

root@omv:~# hdparm -C /dev/sd[bcdefghi]

/dev/sdb:
 drive state is:  active/idle

/dev/sdc:
 drive state is:  active/idle

/dev/sdd:
 drive state is:  active/idle

/dev/sde:
 drive state is:  active/idle

/dev/sdf:
 drive state is:  active/idle

/dev/sdg:
 drive state is:  active/idle

/dev/sdh:
 drive state is:  active/idle

/dev/sdi:
 drive state is:  active/idle

20 minutes later I see the following on the command line:

root@omv:~# hdparm -C /dev/sd[bcdefghi]

/dev/sdb:
 drive state is:  standby

/dev/sdc:
 drive state is:  standby

/dev/sdd:
 drive state is:  standby

/dev/sde:
 drive state is:  standby

/dev/sdf:
 drive state is:  standby

/dev/sdg:
 drive state is:  standby

/dev/sdh:
 drive state is:  standby

/dev/sdi:
 drive state is:  standby

So the spin-down controlled by openmediavault works fine. Now I opened a file from one of my Samba shares backed by ZFS. I can see the disks spinning up with hdparm, and I see the following messages in the log file again:

complete syslog: http://pastebin.com/9A300u3R

Oct 10 17:27:01 omv kernel: [ 8733.047909] sd 7:0:5:0: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:01 omv kernel: [ 8733.047912] sd 7:0:5:0: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 00 99 2b 60 80 00 00 00 08 00 00
Oct 10 17:27:01 omv kernel: [ 8733.047914] blk_update_request: I/O error, dev sdg, sector 2569756800
Oct 10 17:27:01 omv zed: eid=11 class=io pool=mediatank
Oct 10 17:27:18 omv kernel: [ 8750.314209] sd 7:0:2:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:18 omv kernel: [ 8750.314212] sd 7:0:2:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 99 2b 60 80 00 00 00 08 00 00
Oct 10 17:27:18 omv kernel: [ 8750.314214] blk_update_request: I/O error, dev sdd, sector 2569756800
Oct 10 17:27:18 omv kernel: [ 8750.314259] sd 7:0:6:0: [sdh] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:18 omv kernel: [ 8750.314260] sd 7:0:6:0: [sdh] tag#1 CDB: Read(16) 88 00 00 00 00 00 99 2b 60 80 00 00 00 08 00 00
Oct 10 17:27:18 omv kernel: [ 8750.314261] blk_update_request: I/O error, dev sdh, sector 2569756800
Oct 10 17:27:18 omv zed: eid=12 class=io pool=mediatank
Oct 10 17:27:18 omv zed: eid=13 class=io pool=mediatank
Oct 10 17:27:18 omv zed: eid=14 class=io pool=mediatank
Oct 10 17:27:35 omv kernel: [ 8767.469326] sd 7:0:4:0: [sdf] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:35 omv kernel: [ 8767.469330] sd 7:0:4:0: [sdf] tag#1 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:35 omv kernel: [ 8767.469332] blk_update_request: I/O error, dev sdf, sector 2550654984
Oct 10 17:27:35 omv kernel: [ 8767.469378] sd 7:0:7:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:35 omv kernel: [ 8767.469379] sd 7:0:7:0: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:35 omv kernel: [ 8767.469380] blk_update_request: I/O error, dev sdi, sector 2550654984
Oct 10 17:27:35 omv zed: eid=15 class=io pool=mediatank
Oct 10 17:27:36 omv zed: eid=16 class=io pool=mediatank
Oct 10 17:27:52 omv kernel: [ 8784.531993] sd 7:0:1:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:52 omv kernel: [ 8784.531997] sd 7:0:1:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:52 omv kernel: [ 8784.531999] blk_update_request: I/O error, dev sdc, sector 2550654984
Oct 10 17:27:52 omv kernel: [ 8784.532040] sd 7:0:3:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:52 omv kernel: [ 8784.532041] sd 7:0:3:0: [sde] tag#1 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:52 omv kernel: [ 8784.532042] blk_update_request: I/O error, dev sde, sector 2550654984
Oct 10 17:27:53 omv zed: eid=17 class=io pool=mediatank
Oct 10 17:27:53 omv zed: eid=18 class=io pool=mediatank
Oct 10 17:27:53 omv zed: eid=19 class=io pool=mediatank
Oct 10 17:28:02 omv kernel: [ 8793.895994] sd 7:0:0:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:28:02 omv kernel: [ 8793.895998] sd 7:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 98 02 94 48 00 00 00 18 00 00
Oct 10 17:28:02 omv kernel: [ 8793.896000] blk_update_request: I/O error, dev sdb, sector 2550305864
Oct 10 17:28:02 omv zed: eid=20 class=io pool=mediatank

So that didn't help at all, and I reverted to the default values:

root@omv:~# idle3ctl -s 138 /dev/sd[b-i]
Idle3 timer set to 138 (0x8a)
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!
root@omv:~# idle3ctl -g /dev/sd[b-i]
Idle3 timer set to 138 (0x8a)
root@omv:~# nano /etc/hdparm.conf
quiet
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6XXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2XXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6LXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7HXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5EXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E3NXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXXX {
    write_cache = off
}

What do you think about this?

A final check would be to clone the KVM guest to bare metal and test the whole thing again. Maybe it has something to do with KVM or with PCI passthrough. But for that I need some time.

Greetings Hoppel

hoppel118 commented 7 years ago

These issues describe the same thing:

https://github.com/zfsonlinux/zfs/issues/4713
https://github.com/zfsonlinux/zfs/issues/3785
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1504909?comments=all

Greetings Hoppel

luxflow commented 7 years ago

I also encountered this issue. I did several tests:

Intel SATA controller + SATA HDD + ZFS + writing during spin-up = no issue
SAS2008 controller + SATA HDD + ext4 + writing during spin-up = no issue
SAS2008 controller + SATA HDD + ZFS + writing during spin-up = I/O issue

Maybe a SATA disk behind a SAS controller is the problem.

hoppel118 commented 7 years ago

There might be a problem between ZFS and the "mpt3sas" driver.

Is it possible for you to reduce the spin-up time in your controller BIOS? Maybe it's possible to stagger spin-up so that only two or three disks spin up at a time. This should be possible if your controller is flashed to IT mode and if your PSU is powerful enough.

For me it's not possible to check this at the moment, because I use a beta firmware from Supermicro in which the staggered spin-up option is not available.

How many disks do you use behind your SAS2008 controller for ZFS? How long do you have to wait until all disks have spun up?

Greetings Hoppel

luxflow commented 7 years ago

I can't test the BIOS, since the server would have to be rebooted. I have 4 disks; I don't know exactly how long it takes, but they spin up serially.

red-scorp commented 6 years ago

Same problem on Z87 Extreme11/ac -> 22 x SATA3 (16 x SAS3 12.0 Gb/s + 6 x SATA3 6.0 Gb/s) from LSI SAS 3008 Controller+ 3X24R Expander

OS: Ubuntu 18.04 dev

$ cat /etc/issue
Ubuntu Bionic Beaver (development branch) \n \l
$ uname -a
Linux AGVault 4.15.0-12-generic #13-Ubuntu SMP Thu Mar 8 06:24:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -l zfs*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
un  zfs            <none>       <none>       (no description available)
un  zfs-dkms       <none>       <none>       (no description available)
un  zfs-dracut     <none>       <none>       (no description available)
un  zfs-fuse       <none>       <none>       (no description available)
un  zfs-initramfs  <none>       <none>       (no description available)
un  zfs-modules    <none>       <none>       (no description available)
ii  zfs-zed        0.7.5-1ubunt amd64        OpenZFS Event Daemon
un  zfsutils       <none>       <none>       (no description available)
ii  zfsutils-linux 0.7.5-1ubunt amd64        command-line tools to manage Open

ZFS hangs on spin-up of the SATA HDDs, so I assume it's a problem between the LSI controller driver and ZFS. mpt3sas version: 17.100.00.00.

I'll try BIOS updates; let's see if that fixes the problem.

UPDATE: I've updated the motherboard BIOS and flashed the SAS controller to IT mode with the newest available firmware from the 9300 card. This did not help with the disk spin-up problem. Funnily enough, it's not only ZFS that freezes but hddtemp and smartctl too. This issue might be related not to ZFS but to misbehavior of mpt3sas itself.

Please let me know if you have found any solution or workaround for the freezes on disk spin-up. Thanks in advance!

d-helios commented 6 years ago

I have the same issue with SAS drives.

my configuration:

HBA: lsi sas 9300-8e (Symbios Logic SAS3008)
Drives: HUC101818CS4204, PX05SMB040

kernel parameters:

BOOT_IMAGE=/vmlinuz-4.15.0-23-generic root=UUID=4f30713c-5618-4c31-a051-97a9e5acee09 ro console=tty1 console=ttyS0,115200 dm_mod.use_blk_mq=y scsi_mod.use_blk_mq=y transparent_hugepage=never processor.max_cstate=1 udev.children-max=32 mpt3sas.msix_disable=1

Note: I have the same configuration on Solaris and it works fine. The only thing that I changed there is the power-condition:false statement in sd.conf:

sd-config-list=
"HGST    HUH", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUS72", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUC10", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUC15", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUSMH", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,throttle-max:32,disksort:false,cache-nonvolatile:true,power-condition:false,physical-block-size:4096",
"HGST    HUSMM", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,throttle-max:32,disksort:false,cache-nonvolatile:true,power-condition:false,physical-block-size:4096",
"TOSHIBA PX", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,throttle-max:32,disksort:false,cache-nonvolatile:true,power-condition:false,physical-block-size:4096";

cwedgwood commented 6 years ago

@d-helios one ugly hack to paper over the issue is to tweak the zfs-import-cache.service unit file (in the [Service] section) with something like:

# quirk/hack to make sure all the disks are visible
ExecStartPre=/sbin/modprobe mpt3sas
ExecStartPre=/bin/sleep 13
ExecStartPre=/sbin/udevadm settle

(tweak as appropriate for you).

You probably do not need the modprobe as the module will normally be loaded by this time, but for testing (systemctl stop/start w/ rmmod) it is needed.
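
A less intrusive way to carry the same hack is a systemd drop-in instead of editing the packaged unit file; this is only a sketch, and the delay value is just an example:

# systemctl edit zfs-import-cache.service
# (creates /etc/systemd/system/zfs-import-cache.service.d/override.conf)
[Service]
ExecStartPre=/sbin/modprobe mpt3sas
ExecStartPre=/bin/sleep 13
ExecStartPre=/sbin/udevadm settle
# then: systemctl daemon-reload && systemctl restart zfs-import-cache.service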

cwedgwood commented 5 years ago

@behlendorf @ahrens

Would it make sense to "enhance" zpool import with a timeout parameter, looping until the import completes or we time out?

What would be the "right thing" to do when we reach the end of the timeout and it's only possible to import the pool in a degraded manner?
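
Until something like that exists, a rough userspace approximation is to retry the import in a loop; this is only a sketch, and the pool name and timings are examples:

# retry 'zpool import' for up to ~2 minutes before giving up
for i in $(seq 1 12); do
    zpool import -N tank && break    # -N: import without mounting datasets
    sleep 10
done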

ghost commented 4 years ago

This is manifesting for me with an X10SDV-based system (built-in mpt2sas controller). I am testing a kernel without CONFIG_PM, but thus far messing with the controller BIOS settings might be the only way.

malventano commented 4 years ago

Same problem on Z87 Extreme11/ac -> 22 x SATA3 (16 x SAS3 12.0 Gb/s + 6 x SATA3 6.0 Gb/s) from LSI SAS 3008 Controller+ 3X24R Expander

@red-scorp I've been troubleshooting a very similar issue over here. Did you ever resolve yours? If so, how?

Thanks in advance for any tips/info...

red-scorp commented 4 years ago

@red-scorp I've been troubleshooting a very similar issue over here. Did you ever resolve yours? If so, how?

Thanks in advance for any tips/info...

Nope, I switched to another controller. Unfortunately, this bug is still not fixed. I do use the LSI controller for Linux md-raid, where it also works, though with some disk sleep issues. Basically I had to disable sleep on the disks altogether.

malventano commented 4 years ago

Nope, I switched to another controller. Unfortunately, this bug is still not fixed. I do use the LSI controller for Linux md-raid, where it also works, though with some disk sleep issues. Basically I had to disable sleep on the disks altogether.

Could you give me some more detail on your expander setup? What is the actual hardware / enclosure in use? Are you using multipath? Did it happen on the SAS drives / SATA / both? My issues might be pointing in the direction of the SAS-SATA multipath bridges in this repurposed Netapp gear I'm using.

red-scorp commented 4 years ago

Could you give me some more detail on your expander setup? What is the actual hardware / enclosure in use? Are you using multipath? Did it happen on the SAS drives / SATA / both? My issues might be pointing in the direction of the SAS-SATA multipath bridges in this repurposed Netapp gear I'm using.

As you know, the Z87 Extreme11/ac uses an LSI SAS 3008 controller + 3X24R expander. Unusually, the board exposes this setup through ordinary SATA connectors. I had to use cables converting SATA to SFF-8087 and a very primitive Chinese rack-mounted RAID case which, as far as I know, only routes the signals from the SFF connector to the drives. I'm pretty sure the case and its backplanes are really simple, so you can assume the drives are effectively connected directly to the SATA ports on the motherboard. I also use only SATA drives in this setup. All of them are 3TB; some of them are about 8+ years old, some were new at the time of experimenting. This is a naturally grown system which has held my personal data for ages.

Now I use https://www.amazon.de/dp/B00YHE2IPU/ref=pe_3044161_185740101_TE_item (SATA card) and https://www.amazon.de/gp/product/B0050SLTPC (SAS card) to talk to 24x 3TB HDDs with ZFS without any issues.

The mentioned LSI SAS setup is attached to a Linux MD-RAID10 with 16 cheap SSDs with sleep disabled.

As software I use Ubuntu 18.04 LTS at the moment. At the time of my post above it was one of the earlier releases.

tobby88 commented 3 years ago

Same here :(

Proxmox 6.2 (based on Debian 10), Kernel 5.4.44, ZoL 0.8.4

During disk spin up I get this:

Jul 21 17:56:11 t-hyper zed: eid=20 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05df7b-part1
Jul 21 17:56:11 t-hyper zed: eid=21 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05df7b-part1
Jul 21 17:56:11 t-hyper zed: eid=22 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05df7b-part1
Jul 21 17:56:11 t-hyper zed: eid=23 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05df7b-part1
Jul 21 17:56:11 t-hyper zed: eid=24 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af060223-part1
Jul 21 17:56:11 t-hyper zed: eid=25 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af060223-part1
Jul 21 17:56:12 t-hyper zed: eid=26 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af060223-part1
Jul 21 17:56:12 t-hyper zed: eid=27 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af060223-part1
Jul 21 17:56:21 t-hyper zed: eid=28 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05c897-part1
Jul 21 17:56:21 t-hyper zed: eid=29 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05c897-part1
Jul 21 17:56:21 t-hyper zed: eid=30 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05c897-part1
Jul 21 17:56:21 t-hyper zed: eid=31 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500af05c897-part1
Jul 21 17:56:22 t-hyper zed: eid=32 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500aeff9323-part1
Jul 21 17:56:22 t-hyper zed: eid=33 class=delay pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500aeff9323-part1
Jul 21 17:56:22 t-hyper zed: eid=34 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500aeff9323-part1
Jul 21 17:56:22 t-hyper zed: eid=35 class=io pool_guid=0x459729ACEBE4C0EF vdev_path=/dev/disk/by-id/wwn-0x5000c500aeff9323-part1

"zpool status" shows errors after this

I am using SAS disks on an LSI/Broadcom SAS3416 controller. The same problem occurs with SATA disks on an LSI SAS3008.
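
For anyone checking the damage after such an event, a hedged sketch of the usual follow-up (pool name is an example):

zpool status -v tank    # list affected vdevs and any files with errors
zpool clear tank        # reset the error counters once the disks are awake again
zpool scrub tank        # verify and repair the data, then re-check zpool status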

hoppel118 commented 3 years ago

I am still interested in a solution.

Regards Hoppel

bsdice commented 3 years ago

In case someone finds this issue through a search engine, looking for a workaround:

I got hit by this problem as well, running ZFS 0.8.4 on Linux 5.4 (Arch Linux LTS kernel) with eight 14 TB SATA disks in RAIDZ2 behind an LSI 2308 controller flashed to IT mode. Whenever I turn on hd-idle and let it spin down the disks (they sit idle 20 hours per day), ZFS complains loudly in the kernel log during wake-up. After a couple of days of testing, many read and, most worryingly, also write and even checksum errors had occurred (zpool status). A scrub could correct all problems, but this needed to be fixed ASAP.

I solved the problem by doing away with the LSI and buying a JMicron JMB585 5-port SATA controller card instead. These chips have existed since about 2018, so they are relatively new. No extra driver is needed; the card runs with any even remotely recent stock AHCI driver. Since the switch, no more errors have occurred at all, even though I aggressively put the disks into standby when not in use. As far as I can see the card also has no PCIe bottleneck, because it can use PCIe 3.0 with two lanes, supposedly reaching 1700 MByte/s transfer rates, which should be good enough for 5 modern HDDs. There are mostly Chinese no-names out there at US$ 30-40 in 2020; I recommend getting a card with a largish black heatsink, though, to preclude thermal issues. There appear to be no electrolytic capacitors on these cards, so they might even be very stable long-term (10+ years).