mikaku / Monitorix

Monitorix is a free, open source, lightweight system monitoring tool.
https://www.monitorix.org
GNU General Public License v2.0
1.12k stars, 167 forks

Support hp raid controllers #463

Closed: ostasevych closed this issue 9 months ago

ostasevych commented 9 months ago

Hi! I have an HP P410i RAID controller with two RAID1 arrays: SSD (/dev/sda, 2x500GB disks) and HDD (/dev/sdb, 2x2TB disks). I tried to add them to the config file to monitor the temperature of each drive:

<disk>
        <list>
                0 = /dev/disk/by-path/pci-0000:00:14.1-ata-1.1
                1 = "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0 -d cciss,0", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0 -d cciss,1"
                2 = "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,2", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,3"
        </list>
        <desc>
                0 = individual drive /dev/sdc system disk 256GB
                1 = RAID1 /dev/sda data sdd array 500GB
                2 = RAID1 /dev/sdb backup hdd array 2TB
        </desc>
        <map>
               pci-0000:02:00.0-scsi-0:1:0:0 = "data sdd array 500GB"
               pci-0000:02:00.0-scsi-0:1:0:2 = "backup hdd array 2TB"
               pci-0000:00:14.1-ata-1.1 = "system disk"
        </map>

This doesn't work, because Monitorix doesn't recognise the devices /dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0 -d cciss,N.

<disk>
        <list>
                0 = /dev/disk/by-path/pci-0000:00:14.1-ata-1.1
                1 = "/dev/sda -d cciss,0", "/dev/sda -d cciss,1"
                2 = "/dev/sdb -d cciss,2", "/dev/sdb -d cciss,3"
        </list>
        <desc>
                0 = individual drive /dev/sdc system disk 256GB
                1 = RAID1 /dev/sda data sdd array 500GB
                2 = RAID1 /dev/sdb backup hdd array 2TB
        </desc>
        <map>
                /dev/sda = "data sdd array 500GB"
                /dev/sdb = "backup hdd array 2TB"
                pci-0000:00:14.1-ata-1.1 = "system disk"
        </map>

This works, except for the mapping:

[screenshot]

So, is it possible to fix both issues?

Additionally, would it be possible to add support for HP RAID controllers by using the HP-specific ssacli or hpacucli utilities to monitor their state?
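The list entries above pack the device path and smartctl's -d option into a single quoted string. The Python sketch below uses a hypothetical helper, not Monitorix's own Perl code, to show how such an entry could be turned into a smartctl argument vector, assuming shell-style word splitting. Colons carry no special meaning at this stage, so the by-path entry still yields a clean device argument.

```python
import shlex


def smartctl_argv(entry: str) -> list[str]:
    """Turn a disk entry such as '/dev/sda -d cciss,0' into a smartctl command.

    Hypothetical helper: shlex.split separates the device path from the
    -d option, and every word is passed to smartctl in its original order.
    """
    return ["smartctl", "-A", *shlex.split(entry)]


if __name__ == "__main__":
    for entry in (
        "/dev/sda -d cciss,0",
        "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0 -d cciss,1",
    ):
        print(smartctl_argv(entry))
```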

mikaku commented 9 months ago

Are you sure your Smart Array is still using the cciss driver?

ostasevych commented 9 months ago

> Are you sure your Smart Array is still using the cciss driver?

Well, at least smartctl responds to the cciss device type (you can see the temperature trend in the chart).

[screenshot]

However, there's no /dev/cciss directory, so I suppose it uses the hpsa driver. How can I check that?

One more question: how can I make the right-hand charts meaningful, so that the reallocated sector count and current pending sectors are shown as well?
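For the driver question, one way to check is to read the driver name that each SCSI host registers in sysfs. This assumes the kernel exposes the usual /sys/class/scsi_host/hostN/proc_name attribute; with the hpsa driver you would expect to see hpsa there. The sketch below is illustrative only. lspci -k, which prints a "Kernel driver in use" line for each PCI device, gives the same information.

```python
from pathlib import Path


def scsi_host_drivers(sysfs_root: str = "/sys") -> dict[str, str]:
    """Map each SCSI host (host0, host1, ...) to the driver name in proc_name."""
    drivers = {}
    hosts_dir = Path(sysfs_root, "class", "scsi_host")
    # Each hostN entry is expected to expose a one-line proc_name file.
    # A missing hosts_dir simply yields no matches, so the result is {}.
    for proc_name in sorted(hosts_dir.glob("host*/proc_name")):
        drivers[proc_name.parent.name] = proc_name.read_text().strip()
    return drivers


if __name__ == "__main__":
    for host, driver in scsi_host_drivers().items():
        print(f"{host}: {driver}")
```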

mikaku commented 9 months ago

Can you please paste the output of smartctl -A /dev/sda and smartctl -A /dev/sda -d cciss,0 here?

ostasevych commented 9 months ago

Here it is:

# smartctl -A /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.5.0-17-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sda: requires option '-d cciss,N'
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
# smartctl -A /dev/sda -d cciss,0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.5.0-17-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       284
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       26
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       5
194 Temperature_Celsius     0x0032   100   100   050    Old_age   Always       -       22
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       1
242 Total_LBAs_Read         0x0032   100   100   050    Old_age   Always       -       223947
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1

mikaku commented 9 months ago

The problem with the mapping is that Monitorix can't tell where the key ends and the value begins when the key contains spaces. That's what Monitorix's -s command-line option is for.

Try adding -s equalsign to the Monitorix command line.

You don't need to touch the systemd unit file. Just edit /etc/sysconfig/monitorix, where you can add extra command-line arguments.

Then restart Monitorix, and you should see your map strings appear in the graph.
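To see why the key needs -s equalsign, here is a Python sketch of two split policies, modelled on how Config::General is understood to split a line into key and value. The exact delimiter patterns are an assumption; check the module's documentation. With the default guess policy, the first run of whitespace ends the key, so a key such as /dev/sdb -d cciss,0 is cut short after /dev/sdb. With equalsign, only the = separates key and value.

```python
import re

# Assumed delimiters, modelled on Config::General's SplitPolicy values.
POLICIES = {
    "guess": r"\s*=\s*|\s+",  # first '=' or first run of whitespace, whichever comes first
    "equalsign": r"\s*=\s*",  # only an equals sign separates key and value
}


def split_option(line: str, policy: str) -> tuple[str, str]:
    """Split one config line into (key, value) at the first delimiter match.

    Raises ValueError if the line contains no delimiter. Quotes around the
    value are left in place; this sketch does not model quote handling.
    """
    key, value = re.split(POLICIES[policy], line.strip(), maxsplit=1)
    return key, value


if __name__ == "__main__":
    line = '/dev/sdb -d cciss,0  = "data sdd 1 array 500GB"'
    print(split_option(line, "guess"))      # key is only '/dev/sdb'
    print(split_option(line, "equalsign"))  # key is '/dev/sdb -d cciss,0'
```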

mikaku commented 9 months ago

> One more question: how can I make the right-hand charts meaningful, so that the reallocated sector count and current pending sectors are shown as well?

As long as the output of your smartctl command shows the information, Monitorix will show the values of these attributes.
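As a rough check of which attributes a drive actually reports, the sketch below parses the attribute table printed by smartctl -A. It is a hypothetical helper, not Monitorix's parser. In the cciss,0 output pasted above, ID 5 (Reallocated_Sector_Ct) is present but ID 197 (Current_Pending_Sector) is not, so that SSD gives no pending-sector value to graph.

```python
def parse_smart_attributes(output: str) -> dict[int, tuple[str, str]]:
    """Map attribute ID -> (name, raw value) from `smartctl -A` output.

    Only rows of the attribute table are kept: lines whose first field is a
    numeric ID and that have all ten columns (ID# ... RAW_VALUE).
    """
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            # RAW_VALUE is the 10th column; join the rest in case it has spaces.
            attrs[int(fields[0])] = (fields[1], " ".join(fields[9:]))
    return attrs


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   100   100   050    Old_age   Always       -       22
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    for attr_id in (5, 197):  # Reallocated_Sector_Ct, Current_Pending_Sector
        print(attr_id, attrs.get(attr_id, "not reported"))
```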

ostasevych commented 9 months ago

> -s equalsign

Thanks, it works!

mikaku commented 9 months ago

Perfect!

ostasevych commented 3 months ago

Hi, I would like to reopen this ticket.

I have tried to use /dev/disk/by-path names instead of the physical /dev/sdX names and found that spaces are still not supported:

/etc/monitorix/monitorix.conf

# cat /etc/monitorix/monitorix.conf | grep -5 disk
...
<disk>
        <list>
                0 = /dev/disk/by-path/pci-0000:00:14.1-ata-1.1
#               1 = /dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0
#               1 = "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0 -d cciss,0", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0 -d cciss,1"
                1 = "/dev/sdb -d cciss,0", "/dev/sdb -d cciss,1"
                2 = "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,2", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,3"
#               2 = "/dev/sdc -d cciss,2", "/dev/sdc -d cciss,3"

                3 = "/dev/disk/by-path/pci-0000:00:13.2-usb-0:1:1.0-scsi-0:0:0:0"
        </list>
        <desc>
                0 = individual drive system disk 256GB
                1 = RAID1 /dev/sdb data sdd array 500GB
                2 = RAID1 /dev/sdc backup hdd array 1TB
                3 = individual USB drive boot disk 2GB
        </desc>
        <map>
#               pci-0000:02:00.0-scsi-0:1:0:0 = "data sdd array 500GB"
                /dev/sdb -d cciss,0  = "data sdd 1 array 500GB"
                /dev/sdb -d cciss,1  = "data sdd 2 array 500GB"
#               pci-0000:02:00.0-scsi-0:1:0:2 = "backup hdd array 2TB"
                /dev/sdc -d cciss,2 = "backup hdd 1 array 2TB"
                /dev/sdc -d cciss,3 = "backup hdd 2 array 2TB"
                pci-0000:00:14.1-ata-1.1 = "system disk 256GB"
                pci-0000:00:13.2-usb-0:1:1.0-scsi-0:0:0:0 = "boot disk 2GB"
        </map>
        </alerts>
</disk>
...

[screenshot]

Everything works if I use the physical drive names:

#               2 = "/dev/sdc -d cciss,2", "/dev/sdc -d cciss,3"
# smartctl -x /dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,2 | grep Celsius
194 Temperature_Celsius     -O---K   112   109   000    -    35
Current Temperature:                    35 Celsius
Power Cycle Min/Max Temperature:     29/36 Celsius
Lifetime    Min/Max Temperature:      6/38 Celsius

So, the problem is in this line:

2 = "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,2", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,3"

As you suggested, I've added -s equalsign to the default startup options:

# cat /etc/default/monitorix
OPTIONS="-s equalsign"

UPD: It seems the problem is the colon character (:) combined with spaces. A by-path name works fine when no extra options follow it, and by-uuid or by-partuuid names, which contain no colons, work fine with extra options.

So, for me this configuration works:

        <list>
                0 = /dev/disk/by-uuid/887a2b16-5c24-47a8-8650-90fd7e6fc19d -d sat
                1 = /dev/disk/by-path/pci-0000:00:14.1-ata-1.1
                2 = "/dev/disk/by-uuid/9fc5ac7e-6660-4e7f-a069-c171e5ce7675 -d cciss,0", "/dev/disk/by-uuid/9fc5ac7e-6660-4e7f-a069-c171e5ce7675 -d cciss,1"
                3 = "/dev/disk/by-uuid/6f5a533b-ac42-4151-b467-d55d1cdd8075 -d cciss,2", "/dev/disk/by-uuid/6f5a533b-ac42-4151-b467-d55d1cdd8075 -d cciss,3"
        </list>

Can you check that?

mikaku commented 3 months ago

The problem here is that the colon character has a special meaning (it is a field separator) when creating graphs with RRDtool.

Since I'm not sure this is something I can fix from Monitorix, I'd recommend avoiding colons as much as possible.
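RRDtool graph elements such as LINE1:vname#color:legend use the colon as a field separator, and RRDtool's documentation describes writing a literal colon in text as \:. The Python sketch below uses hypothetical helpers, not part of Monitorix, and the vname hd0 and colour FFA500 are placeholders. It shows how an unescaped by-path name splinters into extra fields and how escaping keeps it in a single legend field. It only illustrates the separator problem; it does not show where in Monitorix such escaping would have to happen.

```python
import re

DEVICE = "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:2 -d cciss,2"


def escape_rrd_text(text: str) -> str:
    """Escape every colon so RRDtool treats it as literal text."""
    return text.replace(":", r"\:")


def line_element(vname: str, color: str, legend: str) -> str:
    """Build a LINE1 graph element with an escaped legend (hypothetical helper)."""
    return f"LINE1:{vname}#{color}:{escape_rrd_text(legend)}"


def split_unescaped(element: str) -> list[str]:
    """Split on colons that are not preceded by a backslash."""
    return re.split(r"(?<!\\):", element)


if __name__ == "__main__":
    naive = f"LINE1:hd0#FFA500:{DEVICE}"
    safe = line_element("hd0", "FFA500", DEVICE)
    print(len(split_unescaped(naive)), "fields without escaping")
    print(len(split_unescaped(safe)), "fields with escaping")
```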

ostasevych commented 3 months ago

Can that be fixed by just extending the list of separators?

The thing is that it is much better to use a single way of identifying drives, in my case by-path.

mikaku commented 3 months ago

> Can that be fixed by just extending the list of separators?

Monitorix uses the Config::General Perl module, which accepts the -SplitPolicy option. Monitorix doesn't use the -SplitDelimiter option, which only takes effect when -SplitPolicy is set to custom.

You might want to modify your /usr/bin/monitorix and monitorix.cgi files (lines 519 and 591, and lines 258 and 271, respectively) by adding the -SplitDelimiter option with a regular expression that fits your case.

Let me know if that works for you and, if so, I'll add the custom value to Monitorix's -s option.
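Before editing /usr/bin/monitorix, a candidate -SplitDelimiter pattern can be tried against the map lines from the config above. The Python sketch below assumes the delimiter is applied like Perl's split with a limit of 2 (Python's maxsplit=1). The pattern \s+=\s+, an = with whitespace on both sides, is only a hypothetical example; simple patterns like this one mean the same in Perl and Python. The delimiter only decides where a key ends and its value begins. It does not by itself change how the resulting strings are later passed to RRDtool.

```python
import re

# Hypothetical custom delimiter: '=' surrounded by whitespace.
CANDIDATE = r"\s+=\s+"

MAP_LINES = [
    'pci-0000:02:00.0-scsi-0:1:0:2 = "backup hdd array 2TB"',
    '/dev/sdb -d cciss,0  = "data sdd 1 array 500GB"',
    'pci-0000:00:13.2-usb-0:1:1.0-scsi-0:0:0:0 = "boot disk 2GB"',
]


def keys_for(pattern: str, lines: list[str]) -> list[str]:
    """Return the key each line would get, splitting at most once."""
    keys = []
    for line in lines:
        parts = re.split(pattern, line.strip(), maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"no delimiter found in: {line!r}")
        keys.append(parts[0])
    return keys


if __name__ == "__main__":
    for key in keys_for(CANDIDATE, MAP_LINES):
        print(key)
```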