nicolargo / glances

Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
http://nicolargo.github.io/glances/

Glances shows mdadm RAID0 as degraded when chunksize=128k and the array isn't degraded. #1299

Closed piotrp88 closed 5 years ago

piotrp88 commented 5 years ago

Glances v2.11.1 with psutil v5.4.3

I'm observing a very strange behaviour after creating a new and clean RAID0 array using mdadm and six SSDs. Glances shows the following output when the chunk size of the array is 128k:

RAID disks   Used Avail
RAID0 md125  128k     2
└─ Degraded mode
   └─ chunks

I tried other values, such as 64k, 256k and 512k, and none of them was displayed as degraded. For example:

RAID disks   Used Avail
RAID0 md125   64k     2

I'm sure that the array isn't degraded for two reasons: first, it's a clean and empty array. Second, the output of cat /proc/mdstat is:

Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10] 
md125 : active raid0 sdh[5] sdg[4] sdf[3] sde[2] sdc[1] sdb[0]
      2306837760 blocks super 1.2 128k chunks

and the output of mdadm --detail /dev/md125 is:

/dev/md125:
        Version : 1.2
  Creation Time : Fri Aug  3 17:48:23 2018
     Raid Level : raid0
     Array Size : 2306837760 (2199.97 GiB 2362.20 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Fri Aug  3 17:48:23 2018
          State : clean 
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 128K

           Name : calculon:125  (local to host calculon)
           UUID : 8a7e941e:6c616c29:52711695:d75f5aa5
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       8       96        4      active sync   /dev/sdg
       5       8      112        5      active sync   /dev/sdh
piotrp88 commented 5 years ago

I think I found the issue. In the code, you check the number of used disks (x) against the number of available disks (y): if x < y, the array is reported as degraded. In my case, however, x equals the chunk size ("128k") rather than the number of used disks, and for some reason y equals 2 instead of 6. In short, x is the chunk size and y is 2. You should check how Plugin.stats[array]['used'] and Plugin.stats[array]['available'] are obtained.
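
To illustrate, here is a small hypothetical sketch of the kind of check I mean (not the actual Glances code; the field names follow the plugin stats mentioned above):

# Simplified, hypothetical version of the degraded check -- not the actual
# Glances plugin code. It expects {'used': <active disks>, 'available': <disks>}.
def is_degraded(array_stats):
    try:
        return int(array_stats['used']) < int(array_stats['available'])
    except (TypeError, ValueError):
        # 'used' is not a disk count (e.g. the chunk size '128k' leaked in),
        # so the degraded/clean decision cannot be trusted.
        return None

print(is_degraded({'used': 6, 'available': 6}))       # False -> clean
print(is_degraded({'used': '128k', 'available': 2}))  # None  -> bad input, as reported here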

nicolargo commented 5 years ago

Hi @piotrp88 ,

Glances uses the pymdstat Python library to parse /proc/mdstat.

When I try to parse your mdstat output:

Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10] 
md125 : active raid0 sdh[5] sdg[4] sdf[3] sde[2] sdc[1] sdb[0]
      2306837760 blocks super 1.2 128k chunks

I have the following error:

Traceback (most recent call last):
  File "./unitest.py", line 106, in test_010
    mdstat_test = MdStat('./tests/mdstat.%s' % i)
  File "/home/nicolargo/Dropbox/dev/pymdstat/pymdstat/pymdstat.py", line 21, in __init__
    self.stats = self.load()
  File "/home/nicolargo/Dropbox/dev/pymdstat/pymdstat/pymdstat.py", line 91, in load
    ret['arrays'] = self.get_arrays(lines[1:-1], ret['personalities'])
  File "/home/nicolargo/Dropbox/dev/pymdstat/pymdstat/pymdstat.py", line 121, in get_arrays
    ret[md_device].update(self.get_md_status(lines[i]))
IndexError: list index out of range

Are you sure this is exactly the output of your mdstat file? There is no comma before the chunk size, unlike some of the examples here: https://raid.wiki.kernel.org/index.php/Mdstat

Can you copy/paste the result of the following commands:

# cat /proc/mdstat

# python
>>> import pymdstat
>>> raid = pymdstat.MdStat()
>>> raid
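
If it is easier, you can also point pymdstat at a saved copy of the output. A minimal sketch (the file path below is just an example):

# Hypothetical example path; MdStat() with no argument reads /proc/mdstat.
import pymdstat

md = pymdstat.MdStat('/tmp/mdstat.sample')    # file containing the pasted output
print(md.get_stats()['arrays'].get('md125'))  # parsed fields for the RAID0 array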

Thanks!

piotrp88 commented 5 years ago

I have several RAID arrays on this server. The output from pymdstat looks the same as the output of cat /proc/mdstat.

# cat /proc/mdstat
Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10] 
md129 : active raid1 sda3[0] sdd3[1]
      15825920 blocks super 1.2 [2/2] [UU]

md128 : active raid1 sda2[0] sdd2[1]
      126887936 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md125 : active raid0 sdg[4] sdc[1] sdf[3] sdh[5] sde[2] sdb[0]
      2306837760 blocks super 1.2 128k chunks

md126 : active raid0 nvme0n1[0] nvme1n1[1]
      999948288 blocks super 1.2 3072k chunks

unused devices: <none>
# python
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymdstat
>>> raid = pymdstat.MdStat()
>>> raid
Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10] 
md129 : active raid1 sda3[0] sdd3[1]
      15825920 blocks super 1.2 [2/2] [UU]

md128 : active raid1 sda2[0] sdd2[1]
      126887936 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md125 : active raid0 sdg[4] sdc[1] sdf[3] sdh[5] sde[2] sdb[0]
      2306837760 blocks super 1.2 128k chunks

md126 : active raid0 nvme0n1[0] nvme1n1[1]
      999948288 blocks super 1.2 3072k chunks

unused devices: <none>

>>> 
nicolargo commented 5 years ago

Corrected in the Glances DEVELOP branch.

[screenshot]

Thanks for the report!

piotrp88 commented 5 years ago

But what do "Avail" and "Used" mean? As I read it, it reports 2 available disks for all my arrays, yet md125 has 6 used disks and 0 available, and md126 has 2 used disks and 0 available. I think the reported info is wrong, or maybe I'm misreading it.

nicolargo commented 5 years ago

OK, copy that. I just pushed another patch to the DEVELOP branch:

[screenshot]

For RAID0, it displays the number of used disks. An available-disk count makes no sense for RAID0, because if one disk goes down the whole array fails.
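
A rough sketch of that rule (illustrative only, not the actual plugin code):

# Illustrative display rule: RAID0 has no redundancy, so only the disk
# count is shown; redundant levels compare used vs. available disks.
def raid_status(raid_type, used, available):
    if raid_type == 'raid0':
        return '{} disks'.format(used)
    return '{}/{} disks, {}'.format(
        used, available, 'Degraded mode' if used < available else 'OK')

print(raid_status('raid0', 6, 6))  # md125: "6 disks"
print(raid_status('raid1', 2, 2))  # md129: "2/2 disks, OK"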

Can you simulate a failure on one disk and show me what the Glances UI displays in that case?

Nicolas

piotrp88 commented 5 years ago

I can't because this is a production server, I'm sorry.

piotrp88 commented 5 years ago

I can confirm that version 3.0.1 solves the bug.