influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Generate parsing error for smart field value "<not available>" #13926

Closed by sdalu 1 year ago

sdalu commented 1 year ago

Relevant telegraf.conf

[[inputs.smart]]
  path_smartctl     = "/usr/local/sbin/smartctl"
  use_sudo      = true
  nocheck       = "standby"
  tagexclude        = [ "capacity", "enabled" ]

Logs from Telegraf

2023-09-14T18:07:40Z E! [inputs.smart] Error in plugin: error parsing Temperature_Celsius: "<not available>": expected integer

System info

Telegraf 1.28.0 FreeBSD

Docker

No response

Steps to reproduce

  1. run the configuration

Expected behavior

Don't generate an error when Temperature_Celsius has the value <not available>; just ignore the field.

Actual behavior

Error in log file

Additional info

No response

powersj commented 1 year ago

Hi,

Looking at the plugin, in most cases when we run across an error parsing something, we ignore the error and continue on. However, in one case, specifically NVMe attributes, we produce an error and skip adding the fields.

Can you confirm that you are seeing this with an NVMe device?

I am hesitant to remove the error, but at the same time there is not much you can do to fix it. So rather than fail and produce no metrics, I have put up #13927, which continues on and returns the fields that do parse correctly.

Can you please download the artifacts found there in 20-30 minutes and verify that this resolves the situation for you?

Thanks!
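
The behavior described in the comment above (keep going when one value fails to parse, and return the fields that did parse) might look roughly like the following Go sketch. It is not the code from #13927: `parseIntFields`, the second attribute name, and the sample values are made up for illustration, and only the error wording is modeled on the log line in the report.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIntFields is a hypothetical helper, not part of Telegraf.
// It converts raw attribute strings to integers. Values that do not
// parse (for example "<not available>") are skipped and reported in
// the returned error slice instead of aborting the whole gather.
func parseIntFields(raw map[string]string) (map[string]int64, []error) {
	fields := make(map[string]int64, len(raw))
	var errs []error
	for name, value := range raw {
		n, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			errs = append(errs, fmt.Errorf("error parsing %s: %q: expected integer", name, value))
			continue // keep the remaining fields
		}
		fields[name] = n
	}
	return fields, errs
}

func main() {
	// Sample input, invented for this example.
	raw := map[string]string{
		"Temperature_Celsius": "<not available>",
		"Power_Cycle_Count":   "42",
	}
	fields, errs := parseIntFields(raw)
	fmt.Println("fields:", fields)
	fmt.Println("skipped:", len(errs))
}
```

Whether the skipped values should be logged, counted, or silently dropped is a separate design choice; the sketch returns them so the caller can decide.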

sdalu commented 1 year ago

In my case it is more due to an iSCSI device:

smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-RELEASE-p2 arm64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               FREEBSD
Product:              CTLDISK
Revision:             0001
Compliance:           SPC-5
User Capacity:        256,624,295,936 bytes [256 GB]
Logical block size:   512 bytes
Physical block size:  8192 bytes
LU is thin provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Logical Unit id:      iqn.1999-09.com.sdalu:brain,lun,0
Serial number:        sd-replicate
Device type:          disk
Transport protocol:   iSCSI
Local Time is:        Thu Sep 14 21:01:57 2023 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     <not available>
Drive Trip Temperature:        <not available>

Elements in grown defect list: 0

Error Counter logging not supported

Device does not support Self Test logging
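
To make the problem concrete, here is a small Go sketch that scans "Key: value" lines like the ones in the READ SMART DATA section above and keeps only the values that are integers. It is a simplified stand-in and makes no claim about how the smart plugin actually parses smartctl output; `parseSection` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSection is a hypothetical helper, not Telegraf code.
// It scans "Key: value" lines and keeps only integer values.
// Non-numeric values such as "<not available>" or "OK" are ignored.
func parseSection(out string) map[string]int64 {
	fields := map[string]int64{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Split at the first colon; lines without one are skipped.
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSpace(value), 10, 64)
		if err != nil {
			continue // not an integer, so there is nothing to record
		}
		fields[strings.TrimSpace(key)] = n
	}
	return fields
}

func main() {
	// Lines copied from the smartctl output above.
	out := `SMART Health Status: OK

Current Drive Temperature:     <not available>
Drive Trip Temperature:        <not available>

Elements in grown defect list: 0`
	fmt.Println(parseSection(out))
}
```

With this input only the grown defect list count survives, which matches the behavior asked for in the report: the temperature lines are dropped rather than treated as a failure.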

> Can you please download the artifacts found there in 20-30 minutes and verify that this resolves the situation for you?

I have no idea how that works; I'm more of an end user than a Go programmer.

powersj commented 1 year ago

Here are the artifacts for FreeBSD amd64: https://output.circle-artifacts.com/output/job/bbd2a845-e10a-4a40-8c72-681617d14889/artifacts/0/build/dist/telegraf-1.29.0~8eb84e7e_freebsd_amd64.tar.gz

If you could download that and try it out I would appreciate it.

sdalu commented 1 year ago

I'm on a Raspberry Pi, can you build it for arm64?

powersj commented 1 year ago

Ah, we only have an armv7 build for FreeBSD: https://output.circle-artifacts.com/output/job/fc00041e-51df-4ad6-831d-4ad796ca114e/artifacts/0/build/dist/telegraf-1.29.0~8eb84e7e_freebsd_armv7.tar.gz

sdalu commented 1 year ago

Seems fine, no error generated.

sdalu commented 1 year ago

As a side note, I've used it with my full telegraf configuration and it segfaults.

2023-09-14T19:50:27Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x118 pc=0x1286e00]

goroutine 61 [running]:
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x53aa6cf0, {0x72d7454, 0xaef0488})
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.23.6/process/process_freebsd.go:123 +0x50
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x53aa6cf0, {0x72d7454, 0xaef0488})
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.23.6/process/process.go:303 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x72d7454, 0xaef0488}, 0x4cb6)
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.23.6/process/process.go:211 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
    /go/pkg/mod/github.com/shirou/gopsutil/v3@v3.23.6/process/process.go:196
github.com/influxdata/telegraf/plugins/inputs/procstat.NewProc(0x4cb6)
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:49 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).updateProcesses(0x53dee400, {0x53bc7ae8, 0x1, 0x2}, 0x539a4ae0, 0x0, 0x539a4a80)
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:305 +0x114
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x53dee400, {0x72ec048, 0x53e7d308})
    /go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:107 +0x254
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x53e7e570, {0x72ec048, 0x53e7d308})
    /go/src/github.com/influxdata/telegraf/models/running_input.go:149 +0x48
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
    /go/src/github.com/influxdata/telegraf/agent/agent.go:575 +0x34
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 88
    /go/src/github.com/influxdata/telegraf/agent/agent.go:574 +0xd8

powersj commented 1 year ago

> Seems fine, no error generated.

Thank you for confirming

> As a side note, I've used it with my full telegraf configuration and it segfaults.

The error comes from the procstat plugin using gopsutil. Does this normally occur? If not I wouldn't worry about it as there are platform differences with the behavior of that plugin and library it uses.
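
For readers unfamiliar with Go panics: a nil pointer dereference like the one in the trace above takes down the whole process unless something in the same goroutine recovers from it. The sketch below shows the general recover pattern only. It is not what Telegraf does, it would not fix the underlying gopsutil problem, and `safeGather` and `proc` are hypothetical names.

```go
package main

import "fmt"

// safeGather is a hypothetical wrapper, not Telegraf code.
// It runs gather and converts a panic into an ordinary error,
// so one failing collector does not crash the whole process.
func safeGather(gather func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("gather panicked: %v", r)
		}
	}()
	return gather()
}

// proc stands in for a process handle; invented for this example.
type proc struct{ createTime int64 }

func main() {
	err := safeGather(func() error {
		var p *proc // nil, standing in for a process lookup that returned nothing
		fmt.Println("created:", p.createTime) // nil pointer dereference, panics
		return nil
	})
	fmt.Println("gather returned:", err)
}
```

Recovering hides the symptom rather than the cause, so the platform-specific bug would still need to be reported and fixed upstream.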

sdalu commented 1 year ago

> The error comes from the procstat plugin using gopsutil. Does this normally occur? If not I wouldn't worry about it as there are platform differences with the behavior of that plugin and library it uses.

No such segfault with 1.28.0. But perhaps I should open an issue, as the procstat plugin is not generating procstat metrics, only procstat_lookup, on FreeBSD arm (no such problem on FreeBSD amd64)?

powersj commented 1 year ago

> But perhaps I should open an issue, as the procstat plugin is not generating procstat metrics, only procstat_lookup, on FreeBSD arm (no such problem on FreeBSD amd64)?

I have not played with FreeBSD and this plugin, but keep in mind that since there is no /proc in FreeBSD I am not sure what metrics we generate. /proc is where the gopsutil library will go look for the metrics.

sdalu commented 1 year ago

> > But perhaps I should open an issue, as the procstat plugin is not generating procstat metrics, only procstat_lookup, on FreeBSD arm (no such problem on FreeBSD amd64)?
>
> I have not played with FreeBSD and this plugin, but keep in mind that since there is no /proc in FreeBSD I am not sure what metrics we generate. /proc is where the gopsutil library will go look for the metrics.

Yes, but they are generated fine on FreeBSD amd64 and not on FreeBSD arm.

powersj commented 1 year ago

Where are you getting this arm64 build?

sdalu commented 1 year ago

> Where are you getting this arm64 build?

I installed FreeBSD 13.2 and I am building from ports on an RPi 4. uname -m reports arm64.