matusnovak / prometheus-zfs

Prometheus exporter for (some) ZFS statistics via zpool iostat and zfs get space
The Unlicense

Error log when zpool device is marked UNAVAIL #7

Open ianneub opened 6 months ago

ianneub commented 6 months ago

I've been using prometheus-zfs for several months now and it has been working perfectly! Love it!

I just had an issue with a hard disk and now I am unable to get zfs stats due to the following error:

Traceback (most recent call last):
  File "/usr/src/./zfsprom.py", line 116, in <module>
    main()
  File "/usr/src/./zfsprom.py", line 105, in main
    collect(metrics)
  File "/usr/src/./zfsprom.py", line 78, in collect
    recursive_children(metrics, pool.name, pool.name, pool.root_vdev)
  File "/usr/src/./zfsprom.py", line 53, in recursive_children
    recursive_children(metrics, pool, source_nested, child)
  File "/usr/src/./zfsprom.py", line 36, in recursive_children
    metrics['status'].labels(**labels).state(child.status)
  File "/usr/local/lib/python3.10/dist-packages/prometheus_client/metrics.py", line 724, in state
    self._value = self._states.index(state)
ValueError: 'UNAVAIL' is not in list

Any thoughts?

Thanks in advance for your help!

Here is the output of zpool status:

  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 15:15:08 with 0 errors on Sun Apr 14 15:39:10 2024
remove: Removal of vdev 1 copied 671G in 1h53m, completed on Sat May 20 12:13:25 2023
        2.81M memory used for removed device mappings
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      DEGRADED     0     0     0
          mirror-0                                DEGRADED     0     0     0
            aabbccdd-a111-1122-3344-aabbccddeeff  ONLINE       0     0     0
            0000000000000000000                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/aabbccdd-a111-1122-3344-aabbccddeeff
          mirror-2                                ONLINE       0     0     0
            aabbccdd-a111-1122-3344-aabbccddeeff  ONLINE       0     0     0
            aabbccdd-a111-1122-3344-aabbccddeeff  ONLINE       0     0     0
            aabbccdd-a111-1122-3344-aabbccddeeff  ONLINE       0     0     0

errors: No known data errors

matusnovak commented 6 months ago

Hi @ianneub, this seems like an easy fix. I will fix it shortly. Thank you for finding this bug.
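
For anyone hitting the same traceback before a fix lands: the error comes from prometheus_client looking up the reported state in the list of states the metric was declared with (`self._states.index(state)`), so any vdev state missing from that list raises `ValueError`. Below is a minimal sketch of one possible workaround, not the maintainer's actual fix. It assumes the exporter's states list simply lacks `UNAVAIL`; the names `VDEV_STATES`, `UNKNOWN_STATE`, `ENUM_STATES`, `normalize_state` and `state_index` are hypothetical and do not come from zfsprom.py. The six states are the vdev health states described in the zpoolconcepts(7) man page.

```python
# Sketch only: plain stdlib Python, so it runs without prometheus_client.

# Vdev health states described in the zpoolconcepts(7) man page.
VDEV_STATES = ["ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"]

# Hypothetical catch-all so an unexpected state never raises.
UNKNOWN_STATE = "UNKNOWN"

# Assumption: the exporter would pass a list like this as `states=`
# when it declares the Enum metric.
ENUM_STATES = VDEV_STATES + [UNKNOWN_STATE]


def normalize_state(status):
    """Return a state that is guaranteed to be present in ENUM_STATES."""
    cleaned = str(status).strip().upper()
    return cleaned if cleaned in VDEV_STATES else UNKNOWN_STATE


def state_index(status):
    """Mirror the list lookup from the traceback, on a normalized state."""
    return ENUM_STATES.index(normalize_state(status))


if __name__ == "__main__":
    # Reproduce the failure mode with an illustrative, incomplete list.
    incomplete = ["ONLINE", "DEGRADED"]
    try:
        incomplete.index("UNAVAIL")
    except ValueError as exc:
        print("incomplete list:", exc)

    # With the full list plus a fallback, every input maps to a valid index.
    for s in ("ONLINE", "UNAVAIL", "SOMETHING_NEW"):
        print(s, "->", normalize_state(s), state_index(s))
```

If the exporter is structured the way the traceback suggests, the call in `recursive_children` could then pass `normalize_state(child.status)` to `.state(...)`, with the metric declared using `ENUM_STATES`. Mapping unknown values to a catch-all keeps the scrape alive if a state outside the documented six ever shows up, at the cost of hiding its exact name.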