mikaku / Monitorix

Monitorix is a free, open source, lightweight system monitoring tool.
https://www.monitorix.org
GNU General Public License v2.0

BTRFS support for monitoring #373

Open Adanorm opened 2 years ago

Adanorm commented 2 years ago

Hello,

When you have a BTRFS volume, the first fear is DATA corruption. The command sudo btrfs device stats /mount/point provides this:

[/dev/sdb].write_io_errs 0
[/dev/sdb].read_io_errs 0
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs 0
[/dev/sdd].read_io_errs 0
[/dev/sdd].flush_io_errs 0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
[/dev/sde].write_io_errs 0
[/dev/sde].read_io_errs 0
[/dev/sde].flush_io_errs 0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

All five values for each disk should be monitored (a list option would be needed to enumerate the disk devices).
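For illustration, a minimal Perl sketch (an assumption of how such output could be parsed, not actual Monitorix code or a promised btrfs.pm API) that collects the five counters per device, assuming the command runs with enough privileges:

    my $mount = "/mount/point";
    my %errs;
    open(my $fh, "-|", "btrfs", "device", "stats", $mount) or die "btrfs: $!";
    while (my $line = <$fh>) {
        # lines look like: "[/dev/sdb].write_io_errs 0"
        if ($line =~ /^\[(.+)\]\.(\w+)\s+(\d+)/) {
            $errs{$1}{$2} = $3;    # device -> counter name -> value
        }
    }
    close($fh);
    # %errs now holds the write/read/flush/corruption/generation counters per device.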

The command sudo btrfs device usage /mount/point provides this:

/dev/sdb, ID: 1
   Device size:             5.46TiB
   Device slack:              0.00B
   Data,RAID1:              3.92TiB
   Metadata,RAID1:          5.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.53TiB

/dev/sdc, ID: 3
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID1:            407.00GiB
   Metadata,RAID1:          1.00GiB
   Unallocated:             1.42TiB

/dev/sdd, ID: 4
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,RAID1:              1.31TiB
   Unallocated:             1.42TiB

/dev/sde, ID: 5
   Device size:             3.64TiB
   Device slack:              0.00B
   Data,RAID1:              2.21TiB
   Metadata,RAID1:          4.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.42TiB

Ignore ID; it changes every time you replace a dead disk. Device size and the Data, Metadata, System, ... lines are interesting to keep an eye on.
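For illustration, a minimal Perl sketch (an assumption of how this output could be parsed, not actual Monitorix code) that groups the usage fields under each device, using the field names shown above:

    my %usage;
    my $dev = "";
    open(my $fh, "-|", "btrfs", "device", "usage", "/mount/point") or die "btrfs: $!";
    while (my $line = <$fh>) {
        if ($line =~ m{^(/dev/\S+),\s+ID:\s+\d+}) {
            $dev = $1;                      # a new per-device section starts
        } elsif ($dev ne "" && $line =~ /^\s+(\S[^:]*):\s+(\S+)/) {
            $usage{$dev}{$1} = $2;          # e.g. 'Data,RAID1' => '3.92TiB'
        }
    }
    close($fh);
    # e.g. $usage{'/dev/sdb'}{'Device size'} and $usage{'/dev/sdb'}{'Unallocated'}.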

A parameter may be needed to say which redundancy mode is used.

For me those are the main things. Maybe someone else would like something more.

mikaku commented 2 years ago

I'll install a Fedora 35 which comes with BTRFS by default and I'll let you know. Thanks!

WarmChocolateCake commented 2 years ago

I've just stumbled onto Monitorix and noticed that there was a ZFS section, so I was looking for the btrfs section ;)

I've set up btrfs on an Arch Linux NAS, and I'm using a Fedora 36 VM to play with btrfs and find its foibles.

Note that there's an issue with du and df not representing the "correct" space usage because of the variables with compression / dedupe / snapshots / etc, so.... I guess we'd need to keep those in the btrfs section, rather than fs.pm?

Great application btw :)

mikaku commented 2 years ago

I've just stumbled onto Monitorix and noticed that there was a ZFS section, so I was looking for the btrfs section ;)

Yeah, sorry. I've not got enough time yet to work on this. I installed F35 with BTRFS some weeks ago and I saw that btrfs tools show a lot of information. I hope I can resume this work in the next weeks.

Note that there's an issue with du and df not representing the "correct" space usage because of the variables with compression / dedupe / snapshots / etc, so.... I guess we'd need to keep those in the btrfs section, rather than fs.pm?

I didn't know that. Anyway, fs.pm will continue to rely on the df command even if there are BTRFS mount points not showing accurate information. Once we have the btrfs.pm module, people will have to rely on it until these issues in du and df get solved.

Great application btw :)

Thank you very much. Glad to know you are enjoying it.

Adanorm commented 2 years ago

Hello! Good to know that BTRFS support is coming :) I switched my BTRFS to RAID5 for data and RAID1 for metadata. I think the result of the usage command will be important for you, because RAID5 changes many things! I will provide a screenshot soon.

WarmChocolateCake commented 2 years ago

Yeah, sorry. I've not got enough time yet to work on this.

No worries, Life happens ;)

I read with interest #295 - so, I'd presumed that this would be a new feature and might go into that list?

Just to clarify, yep, du and df are fine at the moment.... I can see if the array is empty / ok / full.... that's good enough for me at the moment.

because RAID5 changes many things

Oh yes :)

btrfs has its own du & df commands (btrfs-filesystem(8)) but the output's not comparable to the traditional commands, so if there's a plan to combine different filesystems into the same graph/plot, then I'd recommend something like the extra_args parameter; perhaps a command_override parameter? :)

I think the biggest point (for the end-user) to understand is that a volume and a sub-volume report the same disk usage as the parent volume (at the moment), so there's no point / no way to see how large a subvolume is.

mikaku commented 2 years ago

Sorry for the late reply.

I read with interest https://github.com/mikaku/Monitorix/issues/295 - so, I'd presumed that this would be a new feature and might go into that list?

#295 has a lot of wishes :smiley: and I'm not sure if time will permit.

I'd like to include the btrfs.pm module in the next Monitorix version.

btrfs has its own du & df commands (btrfs-filesystem(8)) but the output's not comparable to the traditional commands, so if there's a plan to combine different filesystems into the same graph/plot, then I'd recommend something like the extra_args parameter; perhaps a command_override parameter? :)

Yes, once we have the btrfs.pm module, people should rely on it to get accurate information about filesystem usage. Of course, anyone can include a BTRFS mount point in the list in fs.pm, but they will get the results of the df command (for good and for bad).

I'm still struggling with the meaning of this output:

# btrfs filesystem df /
Data, single: total=2.73GiB, used=879.05MiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=55.77MiB
GlobalReserve, single: total=4.75MiB, used=0.00B

Can anyone clarify this and say which values are important here?
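For reference, a minimal Perl sketch (an assumption, not actual Monitorix code) that splits each of these lines into area, profile and the total/used pair, independently of which values end up being graphed:

    my %space;
    open(my $fh, "-|", "btrfs", "filesystem", "df", "/") or die "btrfs: $!";
    while (my $line = <$fh>) {
        # e.g. "Data, single: total=2.73GiB, used=879.05MiB"
        if ($line =~ /^(\w+),\s+(\S+):\s+total=(\S+),\s+used=(\S+)/) {
            $space{$1} = { profile => $2, total => $3, used => $4 };
        }
    }
    close($fh);
    # $space{Data}{used}, $space{Metadata}{total}, etc. could feed the graphs.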

Adanorm commented 2 years ago

Here is mine (screenshot: 2022-11-10_185302).

The sum of the "total" values gives you the total pre-allocated data. Interesting, but not enough. The sum of the "used" values is, obviously, the real data usage.

I don't know if you are aware of how btrfs works; I assume not, so forgive me if that's not the case. BTRFS is a filesystem that lets you write "blocks" of data and write a new block when something changes, create checkpoints, revert to a previous checkpoint, etc. It also allows a filesystem to extend over more than one disk, and it supports some kinds of RAID. Data and metadata are managed separately.

Adanorm commented 2 years ago

So, in your case: you certainly have a single disk with at least 2.8 GB of capacity. 879 MB are used for storage. 16 KB of system data (BTRFS system data for the volume) are written multiple times (DUP), in case of a write error. 55 MB are metadata for your data (block addresses, checkpoint info, etc.), also written multiple times in DUP.

Mine is RAID5 storage (N - 1 disks of capacity), using 6 TB. 400 KB of system data are stored RAID1 style (2 copies on 2 different disks). 6.63 GB of metadata are stored RAID1 style too. GlobalReserve is rather meaningless in my case.

Adanorm commented 2 years ago

(screenshot: 2022-11-10_190645)

sudo btrfs device usage /mount/point

This gives more clues! There are 5 drives: 10 TB, 2x6 TB, 4 TB, 3 TB. The data is (currently) written on all 5 drives equally (4 capacity + 1 redundancy) as 5x1.51 TB (6 TB used, so it's OK!). When the 3 TB drive is full, the next stripe will provide 1 TB (up to 4 TB) as 4x1 TB (3 capacity + 1 redundancy). When the 4 TB drive is full, the next stripe will provide 2 TB (up to 6 TB) as 3x2 TB (2 capacity + 1 redundancy). The last 4 TB of the 10 TB drive will not be usable for the moment (I will have to replace 2 drives in the future, or add 2 more 10 TB+ drives to the array).

Adanorm commented 2 years ago

I don't have a screenshot of this, but BTRFS can generate multiple "Data" rows (stripes) that have to be added together to get the real storage used.

Adanorm commented 2 years ago

Tell me if you need more info! I'm very interested in having this module in the next release :)

Adanorm commented 2 years ago

My df (screenshot: 2022-11-10_191807).

Adanorm commented 2 years ago

Monitorix result, which is accurate (screenshot: 2022-11-10_192027).

mikaku commented 2 years ago

Thanks for the information, but I'm still not sure which values are important enough to appear on graphs and how they relate to each other.

I have the following stats in my freshly installed Fedora 36:

# btrfs filesystem df /
Data, single: total=2.73GiB, used=1.09GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=55.95MiB
GlobalReserve, single: total=4.75MiB, used=0.00B
# btrfs device usage /
/dev/vda2, ID: 1
   Device size:             9.00GiB
   Device slack:              0.00B
   Data,single:             2.73GiB
   Metadata,DUP:          512.00MiB
   System,DUP:             16.00MiB
   Unallocated:             5.76GiB
# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda2       9.0G  1.3G  7.4G  14% /

I would discard using the command btrfs device usage because, after seeing your stats, it looks like it shows a different amount of information depending on the number of disks. That would complicate the parsing procedure.

I would prefer to focus on the command btrfs filesystem df, since it seems that it always shows more consistent information, like df does.

So, the question is: do you think it would make sense to put each line of the btrfs filesystem df output on a different graph?

Something like this:

+------------------------+
|   Data (single)        |
|                        |
|  ---- total 2.75GB     |
|  ---- used  1.09GB     |
+------------------------+
|   System (DUP)         |
|                        |
|  ---- total 8.00MB     |
|  ---- used 16.00KB     |
+------------------------+
|   Metadata (DUP)       |
|                        |
|  ---- total 256.00MB   |
|  ---- used   55.95MB   |
+------------------------+
| GlobalReserve (single) |
|                        |
|  ---- total   4.75MB   |
|  ---- used    0.00B    |
+------------------------+

Other interesting information that would be on separate (smaller) graphs is:

# btrfs device stats /
[/dev/vda2].write_io_errs    0
[/dev/vda2].read_io_errs     0
[/dev/vda2].flush_io_errs    0
[/dev/vda2].corruption_errs  0
[/dev/vda2].generation_errs  0
Adanorm commented 2 years ago

I think 3 graphs:

Per disk usage from btrfs device usage (here Device Size 9 GB minus Unallocated 5.76 = 3.24)
Per disk errors from btrfs device stats

It could be cool to get all possible values of the storage type: single, DUP, RAID1, RAID5, RAID6, etc.

I'm not using this feature, but maybe some people would enjoy monitoring per subvolume (quotas can be set, so monitor usage vs. quota).

mikaku commented 2 years ago

Per disk usage from btrfs device usage (here Device Size 9 GB minus Unallocated 5.76 = 3.24)
Per disk errors from btrfs device stats

Since I have only a single disk, I cannot test all the possibilities. So, is it possible to use the commands btrfs device usage and btrfs device stats to get information for each device individually? That way it would be based on an option like list = /dev/sda, /dev/sdb, ....

It could be cool to get all possible values of the storage type: single, DUP, RAID1, RAID5, RAID6, etc.

Where should this information appear?

I'm not using this feature, but maybe some people would enjoy monitoring per subvolume (quotas can be set, so monitor usage vs. quota).

An example?

Adanorm commented 2 years ago

Yes sir, look at all my screenshots ;) Yes for the list, it's the better way. The storage type is returned by btrfs device usage too, on the Data, System and Metadata lines.

As for the subvolume feature, it was a proposal for advanced users. In my case I don't use it, so I will be unable to help with this. It could be written later as a "btrfs_subvolume.pm" module.

mikaku commented 2 years ago

The storage type is returned by btrfs device usage too, on the Data, System and Metadata lines.

Yes, I know. I meant that I wanted to know how these values should be represented in graphs.

mikaku commented 2 years ago

Also, I think that the option list should accept only mountpoints, not devices, since a mountpoint is a common value accepted by both btrfs device usage and btrfs device stats.

Something like this:

list = /, /mnt/data, /mnt/backup

Do you agree?
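For illustration, a minimal Perl sketch (the option name and handling are assumptions, not the final btrfs.pm behaviour) of how such a mountpoint-only list could drive both commands:

    my $list = "/, /mnt/data, /mnt/backup";
    foreach my $mnt (split(/\s*,\s*/, $list)) {
        my @usage = qx(btrfs device usage $mnt 2>/dev/null);
        my @stats = qx(btrfs device stats $mnt 2>/dev/null);
        # ...parse @usage and @stats (the device names appear in their output)
        # and push the per-device values into the graphs for this mountpoint...
    }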

Adanorm commented 2 years ago

In a time-based graph that makes no sense, yes. Just a list, I think; keep in mind there could be multiple "Data" stripes.

If you are able to detect the devices from the mountpoint, yes. But for the admin watching the graph, it's important to see that there is a dying disk and to identify it.

Multiple BTRFS volumes can be attached to the same system of course.

mikaku commented 2 years ago

If you are able to detect the devices from the mountpoint, yes.

Why do I need to detect the devices from the mountpoint? Why is the device name necessary? I can get stats without the device name.

Adanorm commented 2 years ago

The mountpoint is mandatory in the command, so it's normal to declare it. But when you have multiple disks, you need to provide a list of devices to make "data usage per device" and "errors per device" graphs. Like the "Disk drive temperatures and health" graph.

Adanorm commented 2 years ago

There are two ways to look at btrfs. Global overview: global size, remaining size, etc. (already covered by "Filesystem usage and I/O activity"). Per-device view: per-disk size, disk errors, etc.

mikaku commented 2 years ago

But when you have multiple disks, you need to provide a list of devices to make "data usage per device" and "errors per device" graphs.

I don't understand it, sorry.

Data usage and errors can be obtained with the commands btrfs device usage and btrfs device stats, and both accept a mountpoint. Again, why is the device name necessary?

Like the "Disk drive temperatures and health" graph.

Temperatures? Health? This is not related to BTRFS; you have this information in another graph.

Adanorm commented 2 years ago

The command line requires the mountpoint, but the device name is useful to build the graph.

Give me some time to draw it, later today. I think you will understand my point of view with a drawing.

No, no, I don't need temperature or health; I already have those from SMART.

BTRFS gives me, via btrfs device stats /mount/point, real CRC errors; when a drive starts producing CRC errors, it's time to remove it from the array to avoid data corruption.

mikaku commented 2 years ago

Give me some time to draw it, later today. I think you will understand my point of view with a drawing.

That would be very useful.

So far, I have this:

<btrfs>
    list = /, /mnt/data1, /mnt/data2
</btrfs>

Still, I don't know if device names are useful. This configuration would create the following graphs (three per mountpoint):

+--------------------------------+--------------------------------+
| Title: device usage /          | Title: global usage /          |
|                                |                                |
|                                |  Used (= output of df command) |
| (Device Size - Unallocated)    +--------------------------------+
|                                | Title: errors on /             |
|                                |  - write, read, flush_io,      |
|                                |  - corruption and generation   |
+--------------------------------+--------------------------------+
+--------------------------------+--------------------------------+
| Title: device usage /mnt/data1 | Title: global usage /mnt/data1 |
|                                |                                |
|                                |  Used (= output of df command) |
| (Device Size - Unallocated)    +--------------------------------+
|                                | Title: errors on /mnt/data1    |
|                                |  - write, read, flush_io,      |
|                                |  - corruption and generation   |
+--------------------------------+--------------------------------+
+--------------------------------+--------------------------------+
| Title: device usage /mnt/data2 | Title: global usage /mnt/data2 |
|                                |                                |
|                                |  Used (= output of df command) |
| (Device Size - Unallocated)    +--------------------------------+
|                                | Title: errors on /mnt/data2    |
|                                |  - write, read, flush_io,      |
|                                |  - corruption and generation   |
+--------------------------------+--------------------------------+

Thoughts?

WarmChocolateCake commented 2 years ago

Just catching up with this...

So, the device usage graph (main pane above) would show the usage on different physical disks? That would be interesting, but I'm not sure it adds real value.

I mean... if I had a large change in data, but the total storage was still "half full"... do I care what is on each disk?? Not really...

But do I care if a disk is starting to fail? Oh yes 😉

Sorry I have been away for a while, but I think this was a good start:

Something like this:

+------------------------+
|   Data (single)        |
|                        |
|  ---- total 2.75GB     |
|  ---- used  1.09GB     |
+------------------------+
|   System (DUP)         |
|                        |
|  ---- total 8.00MB     |
|  ---- used 16.00KB     |
+------------------------+
|   Metadata (DUP)       |
|                        |
|  ---- total 256.00MB   |
|  ---- used   55.95MB   |
+------------------------+
| GlobalReserve (single) |
|                        |
|  ---- total   4.75MB   |
|  ---- used    0.00B    |
+------------------------+

Other interesting information that would be on separate (smaller) graphs is:

# btrfs device stats /
[/dev/vda2].write_io_errs    0
[/dev/vda2].read_io_errs     0
[/dev/vda2].flush_io_errs    0
[/dev/vda2].corruption_errs  0
[/dev/vda2].generation_errs  0

But I would tweak that top graph to be something like:

        ┌───────────────────────────────────────────────────────────────────────────┐
        │ Mount: /srv/data         │   Errors:                                      |
        │                          │                                                |
        │                          │                                                |
        │  ---- Total: 123GB       │                                                |
        │  ---- Used:   12GB       │                                                |
        ├──────────────────────────┤                                                |
        │ Mount: /srv/data/subvol1 │                                                |
        │                          │                                                |
        │                          │                                                |
        │                          │   --- Write:  /dev/sda   /dev/sdb  /dev/sdc    │
        │                          │   --- Read: /dev/sda   /dev/sdb  /dev/sdc      │
        │                          │  --- Flush: /dev/sda   /dev/sdb  /dev/sdc      │
        │                          │  --- Cor: /dev/sda   /dev/sdb  /dev/sdc        │
        │                          │  --- Gen: /dev/sda   /dev/sdb  /dev/sdc        │
        │  ---- Total: 1.2GB       │                                                |
        │  ---- Used:  123MB       │                                                |
        ├──────────────────────────┤                                                |
        │                          │                                                |
        │ Mount: /srv/data/subvol2 │                                                |
        │                          │                                                |
        │                          │                                                |
        │                          │                                                |
        │  ---- Total: 4.2GB       │                                                |
        │  ---- Used:  2.9GB       │                                                |
        └──────────────────────────┤────────────────────────────────────────────────┘

(for example - but my drawing is badly scaled / proportioned 😄 )

That might be what @mikaku is showing above anyway... I might have just misunderstood...

mikaku commented 2 years ago

That might be what @mikaku is showing above anyway... I might have just misunderstood...

Your drawing is very similar to mine.

In my drawing I tried to isolate each device/filesystem with a set of graphs (3 per device or filesystem). So basically, each device/filesystem has its own error graph. I think this way is scalable to an unlimited number of devices/filesystems.