im-0 / hpsahba

Tool to enable/disable HBA mode on some HP Smart Array controllers
GNU General Public License v2.0

Issue with P812 after having it working #37

Open jfreak53 opened 11 months ago

jfreak53 commented 11 months ago

I had my P812 controller working for about 2 weeks with no issues. I followed the directions here: https://github.com/im-0/hpsahba/tree/master/contrib/dkms

I am running Ubuntu 22 on kernel 5.15.0-83. After I edited patch.sh, it worked with my 5.15 kernel. I added the line hpsa hpsa_use_nvram_hba_flag=1 to my modprobe file so the setting persists across reboots. I rebooted multiple times during those 2 weeks and it kept working: the drives were there and I could access them.

Then one day all drives on that adapter just disappeared. I have rebooted, but they still won't come back. The controller sees them; they are just masked. HBA mode is enabled according to the tool:

./hpsahba -i /dev/sg0
VENDOR_ID='HP'
PRODUCT_ID='P812'
BOARD_ID='0x3249103c'
SOFTWARE_NAME=''
HARDWARE_NAME=''
RUNNING_FIRM_REV='3.66'
ROM_FIRM_REV='3.66'
REC_ROM_INACTIVE_REV='3.66'
YET_MORE_CONTROLLER_FLAGS='0xfa53a216'
NVRAM_FLAGS='0x08'
HBA_MODE_SUPPORTED=1
HBA_MODE_ENABLED=1
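
For comparing this output before and after a change (for example, a firmware upgrade), the `KEY='VALUE'` lines can be loaded into a dictionary. This is only a minimal sketch that assumes the output format shown above; `parse_hpsahba_info` is a hypothetical helper name and is not part of hpsahba.

```python
import shlex


def parse_hpsahba_info(text):
    """Parse KEY='VALUE' lines, as printed by `hpsahba -i`, into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        key, _, raw = line.partition("=")
        # shlex.split removes the shell-style quotes; '' becomes an empty string.
        parts = shlex.split(raw)
        info[key] = parts[0] if parts else ""
    return info


# Sample copied from the output above (shortened).
sample = """\
VENDOR_ID='HP'
PRODUCT_ID='P812'
SOFTWARE_NAME=''
RUNNING_FIRM_REV='3.66'
NVRAM_FLAGS='0x08'
HBA_MODE_SUPPORTED=1
HBA_MODE_ENABLED=1
"""

info = parse_hpsahba_info(sample)
hba_on = info["HBA_MODE_SUPPORTED"] == "1" and info["HBA_MODE_ENABLED"] == "1"
print("HBA mode supported and enabled:", hba_on)
```

All values stay strings, so flags such as `NVRAM_FLAGS` can be converted with `int(value, 16)` only where needed.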

Yet they are not there. I checked dmesg and this is what it gave me:

$ dmesg | grep masked
[    4.554241] hpsa 0000:42:00.0: scsi 1:0:1:0: masked Direct-Access     ATA      SSD 2TB          PHYS DRV SSDSmartPathCap- En- Exp=0
[    4.568580] hpsa 0000:42:00.0: scsi 1:0:2:0: masked Direct-Access     ATA      SSD 2TB          PHYS DRV SSDSmartPathCap- En- Exp=0
[    4.583223] hpsa 0000:42:00.0: scsi 1:0:3:0: masked Direct-Access     ATA      SSD 2TB          PHYS DRV SSDSmartPathCap- En- Exp=0
[    4.597927] hpsa 0000:42:00.0: scsi 1:0:4:0: masked Direct-Access     ATA      SSD 2TB          PHYS DRV SSDSmartPathCap- En- Exp=0
[    4.612871] hpsa 0000:42:00.0: scsi 1:0:5:0: masked Enclosure         HP       MSA70            enclosure SSDSmartPathCap- En- Exp=0
[    4.628075] hpsa 0000:42:00.0: scsi 1:0:6:0: masked Enclosure         HP       P812 INT EXP     enclosure SSDSmartPathCap- En- Exp=0
[    4.643197] hpsa 0000:42:00.0: scsi 1:0:7:0: masked Enclosure         PMCSIERA  SRC 8x6G        enclosure SSDSmartPathCap- En- Exp=0
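
To see at a glance how many devices the driver reports in each state, the hpsa lines can be tallied with a short script. This is a rough sketch: the regular expression is an assumption based only on the log lines above, and `summarize_hpsa` is a hypothetical name.

```python
import re
from collections import Counter

# Matches e.g. "hpsa 0000:42:00.0: scsi 1:0:1:0: masked Direct-Access ..."
HPSA_LINE = re.compile(
    r"hpsa \S+: scsi (?P<addr>\d+:\d+:\d+:\d+): (?P<state>\w+)\s+(?P<type>\S+)"
)


def summarize_hpsa(lines):
    """Count (state, device type) pairs found in hpsa dmesg lines."""
    counts = Counter()
    for line in lines:
        match = HPSA_LINE.search(line)
        if match:
            counts[(match["state"], match["type"])] += 1
    return counts


# Three lines copied from the dmesg output above.
sample = [
    "[    4.554241] hpsa 0000:42:00.0: scsi 1:0:1:0: masked Direct-Access     ATA      SSD 2TB          PHYS DRV SSDSmartPathCap- En- Exp=0",
    "[    4.568580] hpsa 0000:42:00.0: scsi 1:0:2:0: masked Direct-Access     ATA      SSD 2TB          PHYS DRV SSDSmartPathCap- En- Exp=0",
    "[    4.612871] hpsa 0000:42:00.0: scsi 1:0:5:0: masked Enclosure         HP       MSA70            enclosure SSDSmartPathCap- En- Exp=0",
]

counts = summarize_hpsa(sample)
for (state, dev_type), n in sorted(counts.items()):
    print(f"{state:8} {dev_type:15} {n}")
```

In practice the lines could come from `dmesg` piped into the script through `sys.stdin` instead of the hard-coded sample.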

So it seems the drives are there and the driver sees them, but they are masked. If I check ssacli, it gives me this:

$ ssacli ctrl all show detail

Smart Array P812 in Slot 4
   Bus Interface: PCI
   Slot: 4
   Serial Number: PAGXQ0ARHZY04W
   Cache Serial Number: PBCDF0CRH0D5U9
   RAID 6 Status: Enabled
   Controller Status: OK
   Hardware Revision: C
   Firmware Version: 3.66
   Firmware Supports Online Firmware Activation: False
   Cache Board Present: True
   Cache Status: Not Configured
   Total Cache Size: 1.0
   Total Cache Memory Available: 0.9
   Battery Backed Cache Size: 0.9
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   Number of Ports: 6 (2 Internal / 4 External )
   Driver Name: hpsa
   Driver Version: 3.4.20
   WWN Port: 50014380132301B0
   HBA Mode Enabled: True
   PCI Address (Domain:Bus:Device.Function): 0000:42:00.0
   Port Max Phy Rate Limiting Supported: False
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None
   SPDM Supports Get Slot Certificate Chain: no
   SPDM Supports Get Controller Info       : no
   SPDM Supports Get Slot Info             : no
   SPDM Supports Set Import Certificate    : no
   SPDM Supports Set Invalidate Slot       : no
   Surface Scan Completion Supported: False
   Persistent Event Log Policy Change Supported: False

$ cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: P812             Rev: 3.66
  Type:   RAID                             ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST3146356SS      Rev: HS0F
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: WD       Model: WD3001FYYG-01SL3 Rev: VR07
  Type:   Direct-Access                    ANSI  SCSI revision: 06
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: WD       Model: WD3001FYYG-01SL3 Rev: VR07
  Type:   Direct-Access                    ANSI  SCSI revision: 06
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: WD       Model: WD3001FYYG-01SL3 Rev: VR07
  Type:   Direct-Access                    ANSI  SCSI revision: 06
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: SEAGATE  Model: ST3146356SS      Rev: HS0F
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: WD       Model: WD3001FYYG-01SL3 Rev: VR07
  Type:   Direct-Access                    ANSI  SCSI revision: 06
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: WD       Model: WD3001FYYG-01SL3 Rev: VR07

$ ssacli ctrl slot=4 pd all show detail

Smart Array P812 in Slot 4

   HBA Drives

      physicaldrive 4E:1:1
         Port: 4E
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: HBA Mode Drive
         Interface Type: Solid State SATA
         Size: 2 TB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Firmware Revision: U1014A0
         Serial Number: AA202300002100001831
         WWID: 50014380042BA501
         Model: ATA     SSD 2TB
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 1.5Gbps
         PHY Physical Link Rate: Unknown
         PHY Maximum Link Rate: Unknown
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None
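
With several drives, the "Drive exposed to OS" field is easy to miss in this output. The sketch below lists the drives that report `False`. It assumes the layout shown above (a `physicaldrive` line followed by `Key: Value` lines); `parse_physical_drives` is a hypothetical helper, not an ssacli feature.

```python
def parse_physical_drives(text):
    """Group 'Key: Value' lines under the preceding 'physicaldrive X' line."""
    drives = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("physicaldrive "):
            current = line.split(None, 1)[1]  # e.g. "4E:1:1"
            drives[current] = {}
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            drives[current][key] = value
    return drives


# Sample copied from the ssacli output above (shortened).
sample = """\
Smart Array P812 in Slot 4

   HBA Drives

      physicaldrive 4E:1:1
         Port: 4E
         Status: OK
         Drive Type: HBA Mode Drive
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
"""

drives = parse_physical_drives(sample)
hidden = [
    name
    for name, fields in drives.items()
    if fields.get("Drive exposed to OS") == "False"
]
print("Drives not exposed to the OS:", hidden)
```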

So I swapped the drives out and rebooted, thinking maybe something had happened to the drives. Still the same thing: they won't show up. So then I reloaded the module with modprobe to see if I could get them to show up again:

$ modprobe -r hpsa
$ modprobe hpsa hpsa_use_nvram_hba_flag=1

They still don't show up, and below is what dmesg shows me:

[ 1128.665648] hpsa: unknown parameter 'hpsa_use_nvram_hba_flag' ignored
[ 1128.665655] hpsa: unknown parameter 'hpsa_use_nvram_hba_flag' ignored
[ 1128.665811] HP HPSA Driver (v 3.4.20-200)
[ 1128.665853] hpsa 0000:42:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 1128.666674] hpsa 0000:42:00.0: Logical aborts not supported
[ 1128.666679] hpsa 0000:42:00.0: HP SSD Smart Path aborts not supported
[ 1128.708189] scsi host0: hpsa
[ 1128.708832] hpsa can't handle SMP requests
[ 1130.092220] hpsa 0000:42:00.0: scsi 0:0:0:0: added RAID              HP       P812             controller SSDSmartPathCap- En- Exp=1
[ 1130.092227] hpsa 0000:42:00.0: scsi 0:0:1:0: masked Direct-Access     ATA      WDC WD20SPZX-22U PHYS DRV SSDSmartPathCap- En- Exp=0
[ 1130.092230] hpsa 0000:42:00.0: scsi 0:0:2:0: masked Direct-Access     ATA      WDC WD20SPZX-22U PHYS DRV SSDSmartPathCap- En- Exp=0
[ 1130.092233] hpsa 0000:42:00.0: scsi 0:0:3:0: masked Direct-Access     ATA      WDC WD20SPZX-22U PHYS DRV SSDSmartPathCap- En- Exp=0
[ 1130.092236] hpsa 0000:42:00.0: scsi 0:0:4:0: masked Direct-Access     ATA      WDC WD20SPZX-22U PHYS DRV SSDSmartPathCap- En- Exp=0
[ 1130.092239] hpsa 0000:42:00.0: scsi 0:0:5:0: masked Direct-Access     ATA      WDC WD20SPZX-22U PHYS DRV SSDSmartPathCap- En- Exp=0
[ 1130.092241] hpsa 0000:42:00.0: scsi 0:0:6:0: masked Enclosure         HP       MSA70            enclosure SSDSmartPathCap- En- Exp=0
[ 1130.092244] hpsa 0000:42:00.0: scsi 0:0:7:0: masked Enclosure         HP       P812 INT EXP     enclosure SSDSmartPathCap- En- Exp=0
[ 1130.092246] hpsa 0000:42:00.0: scsi 0:0:8:0: masked Enclosure         PMCSIERA  SRC 8x6G        enclosure SSDSmartPathCap- En- Exp=0
[ 1130.092336] hpsa can't handle SMP requests
[ 1130.092823] scsi 0:0:0:0: RAID              HP       P812             3.66 PQ: 0 ANSI: 5
[ 1130.093612] scsi 0:0:0:0: Attached scsi generic sg0 type 12
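
The first two lines of this log are worth checking for automatically. The kernel prints "unknown parameter ... ignored" when a module is loaded with an option that the module does not define, so it may be that the hpsa module being loaded here is not the patched build. The sketch below only extracts such messages from dmesg text; the regular expression is an assumption based on the lines above, and `find_ignored_params` is a hypothetical name.

```python
import re

# Matches e.g. "hpsa: unknown parameter 'hpsa_use_nvram_hba_flag' ignored"
UNKNOWN_PARAM = re.compile(
    r"(?P<module>\w+): unknown parameter '(?P<param>[^']+)' ignored"
)


def find_ignored_params(lines):
    """Return sorted, de-duplicated (module, parameter) pairs the kernel ignored."""
    found = set()
    for line in lines:
        match = UNKNOWN_PARAM.search(line)
        if match:
            found.add((match["module"], match["param"]))
    return sorted(found)


# Lines copied from the dmesg output above.
sample = [
    "[ 1128.665648] hpsa: unknown parameter 'hpsa_use_nvram_hba_flag' ignored",
    "[ 1128.665655] hpsa: unknown parameter 'hpsa_use_nvram_hba_flag' ignored",
    "[ 1128.665811] HP HPSA Driver (v 3.4.20-200)",
]

ignored = find_ignored_params(sample)
for module, param in ignored:
    print(f"module {module} ignored option {param}")
```

If this reports the flag, comparing the path printed by `modinfo hpsa` with the DKMS install location would be a reasonable next check.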

I know the drives are good; if I plug them into another computer they work fine.

Like I said, this worked for 2 whole weeks, then just stopped one day 2 or 3 weeks ago, and I have not been able to get the drives back since. Nothing has changed on the server since it worked: I haven't upgraded anything, and the kernel is still the same version.

jfreak53 commented 11 months ago

I have since upgraded my P812 to firmware 6.64. Same deal, even after re-patching to make sure and rebooting.

$ ./hpsahba -i /dev/sg0
VENDOR_ID='HP'
PRODUCT_ID='P812'
BOARD_ID='0x3249103c'
SOFTWARE_NAME=''
HARDWARE_NAME=''
RUNNING_FIRM_REV='6.64'
ROM_FIRM_REV='6.63'
REC_ROM_INACTIVE_REV='6.63'
YET_MORE_CONTROLLER_FLAGS='0xfa57a216'
NVRAM_FLAGS='0x08'
HBA_MODE_SUPPORTED=1
HBA_MODE_ENABLED=1

fermino commented 10 months ago

Same issue here on a P410i with kernel 6.2. I guess it's not a board issue but rather something related to the kernel.

Iv commented 10 months ago

cat contrib/dkms/patch.sh ?

fermino commented 10 months ago

@Iv I'm using this one: https://github.com/fermino/hpsahba/blob/master/contrib/dkms/patch.sh

The behavior I have identified is the following: when starting up, the drives and cage appear masked. If I unload and load the dkms module again (with the flag, of course), the connected drives do show up (unmasked), but any further disconnect/connect of drives is not detected by the kernel. After that, trying to unload/load the kernel module again does not work (it gets stuck on load).

jfreak53 commented 9 months ago

@lv I was using the exact same one also