influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

ZFS pool with IO values greater than INT64 #7502

Open mulroony opened 4 years ago

mulroony commented 4 years ago

Relevant telegraf.conf:

# Read metrics of ZFS from arcstats, zfetchstats, vdev_cache_stats, and pools
[[inputs.zfs]]
  ## ZFS kstat path. Ignored on FreeBSD
  ## If not specified, then default is:
  # kstatPath = "/proc/spl/kstat/zfs"

  ## By default, telegraf gather all zfs stats
  ## If not specified, then default is:
  # kstatMetrics = ["arcstats", "zfetchstats", "vdev_cache_stats"]
  ## For Linux, the default is:
  # kstatMetrics = ["abdstats", "arcstats", "dnodestats", "dbufcachestats",
  #   "dmu_tx", "fm", "vdev_mirror_stats", "zfetchstats", "zil"]
  ## By default, don't gather zpool stats
  poolMetrics = true

System info:

kernel-3.10.0-1127.el7.x86_64 redhat-release-server-7.8-2.el7.x86_64 telegraf-1.14.2-1.x86_64 zfs-0.8.3-1.el7.x86_64

Steps to reproduce:

  1. Enable ZFS input module
  2. Enable pool statistics
  3. Have a pool with a statistic > INT64

Expected behavior:

Be able to handle large integers, or catch the error and still collect other stats.

Actual behavior:

[inputs.zfs] Error in plugin: strconv.ParseInt: parsing "15708770957432056996": value out of range

Additional info:

cat /proc/spl/kstat/zfs/zdata/io
64 3 0x00 1 80 1559694066680 2814880137852481
nread nwritten reads writes wtime wlentime wupdate rtime rlentime rupdate wcnt rcnt
1991388487303168 281035540668416 2773546755 3790027449 1369155734128164 15708777337377171995 2814879891741232 1851453894946502 464328081809849506 2814879892282255 0 0
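
The expected behavior (handle large integers, or record the error and still collect the other stats) could look roughly like the sketch below. It is not the plugin's actual code: `parsePoolIO` and `sampleIO` are hypothetical names, and it assumes the `io` kstat is always three lines (a kstat header, the field names, then the values).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sampleIO is the kstat output quoted in this report.
const sampleIO = `64 3 0x00 1 80 1559694066680 2814880137852481
nread nwritten reads writes wtime wlentime wupdate rtime rlentime rupdate wcnt rcnt
1991388487303168 281035540668416 2773546755 3790027449 1369155734128164 15708777337377171995 2814879891741232 1851453894946502 464328081809849506 2814879892282255 0 0`

// parsePoolIO (hypothetical helper) parses each value as uint64.
// A value that fails to parse is reported in the error slice and
// skipped, so the remaining fields are still returned.
func parsePoolIO(text string) (map[string]uint64, []error) {
	lines := strings.Split(strings.TrimSpace(text), "\n")
	if len(lines) < 3 {
		return nil, []error{fmt.Errorf("expected 3 lines, got %d", len(lines))}
	}
	names := strings.Fields(lines[1])
	values := strings.Fields(lines[2])
	if len(names) != len(values) {
		return nil, []error{fmt.Errorf("%d names but %d values", len(names), len(values))}
	}

	fields := make(map[string]uint64, len(names))
	var errs []error
	for i, name := range names {
		v, err := strconv.ParseUint(values[i], 10, 64)
		if err != nil {
			errs = append(errs, fmt.Errorf("field %q: %w", name, err))
			continue
		}
		fields[name] = v
	}
	return fields, errs
}

func main() {
	fields, errs := parsePoolIO(sampleIO)
	fmt.Println("wlentime:", fields["wlentime"])
	fmt.Println("errors:", len(errs))
}
```

With uint64 parsing, `wlentime` no longer aborts the whole gather, and any single bad field would only cost that one field.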

mulroony commented 4 years ago

If I am reading the code correctly, I believe ZFS on Linux stores these counters as unsigned 64-bit integers, so their upper limit is that of UInt64.
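
As a quick check of that reading, the value from the error message can be fed to both parsers in Go's `strconv` package. This is only an illustrative sketch; `fitsInt64` and `parseCounter` are hypothetical helper names, not functions from the plugin.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// fitsInt64 reports whether s parses as a signed 64-bit integer.
func fitsInt64(s string) bool {
	_, err := strconv.ParseInt(s, 10, 64)
	return err == nil
}

// parseCounter parses s as an unsigned 64-bit integer.
func parseCounter(s string) (uint64, error) {
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	const raw = "15708770957432056996" // value from the reported error

	// ParseInt rejects the value because it exceeds the int64 range.
	_, err := strconv.ParseInt(raw, 10, 64)
	fmt.Println("ParseInt range error:", errors.Is(err, strconv.ErrRange))

	// ParseUint accepts it because it is within the uint64 range.
	v, err := parseCounter(raw)
	fmt.Println("ParseUint:", v, err)
}
```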

danielnelson commented 4 years ago

This puts us in a bit of a tricky situation. If we switch parsing to uint64, it will cause type conflicts for users with influx_uint_support = true, but users with this enabled are also precisely the ones who would want the correct uint values.

We could parse as uint64 and convert to int64, capping the values; this would be safer. Or we could adopt the policy of allowing int64 -> uint64 updates in field values. The final option would be adding new field keys and duplicating the data.

powersj commented 2 years ago

Next steps: add an opt-in option to the zfs plugin to parse these values as uint, and test with the InfluxDB output with uint support enabled.