@Rajat0312 are you sure you provided the whole telegraf.conf?
I don't see prometheus_client configured in telegraf.conf, but (based on your logs) it was enabled.
Also, did you provide the whole output from the log?
[[outputs.prometheus_client]]
listen = ":9091"
path = "/metrics"
# Expiration interval for each metric. 0 == no expiration
expiration_interval = "60s"
# Send string metrics as Prometheus labels.
# Unless set to false all string metrics will be sent as labels.
string_as_label = true
Yeah, it was also there, and the logs I have provided are the whole output.
I'm not sure the telegraf.conf you provided is the same one used by the Telegraf instance you run.
There is E! Unsupported logtarget: stdout, using stderr in your log, which should only be logged when the logtarget parameter is set to stdout in the agent section, yet you don't have this parameter configured at all.
debug = true is set in the agent section, but there are no debug logs in your log file, and there should be.
Can you run cat /etc/telegraf/telegraf.conf inside your container and provide its output?
/ # cat /etc/telegraf/telegraf.conf
[agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "10s"
  flush_jitter = "0s"
  hostname = "$HOSTNAME"
  interval = "10s"
  logfile = ""
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = true
  logtarget = "stdout"
[[processors.enum]]
  [[processors.enum.mapping]]
    dest = "status_code"
    field = "status"
    [processors.enum.mapping.value_mappings]
      critical = 3
      healthy = 1
      problem = 2
[[inputs.cisco_telemetry_mdt]]
  transport = "tcp"
  service_address = ":57000" # instruct telegraf to listen on port 57000 for TCP telemetry
  max_msg_size = 4294967295
  embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]
[[inputs.cisco_telemetry_mdt]]
  transport = "grpc"
  service_address = ":57001" # instruct telegraf to listen on port 57000 for TCP telemetry
  max_msg_size = 4294967295
  embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]
[[outputs.file]]
  files = ["stdout"] # files to write to, "stdout" is a specially handled file
  data_format = "json"
[[outputs.kafka]]
  brokers = ["localhost:9092"]
  topic = "test"
  ssl_cert = "/etc/telegraf/kafka.cert"
  insecure_skip_verify = true
  data_format = "json"
[[outputs.prometheus_client]]
listen = ":9091"
path = "/metrics"
expiration_interval = "60s"
string_as_label = true
@zak-pawel
Just theoretically (I don't have an environment to reproduce, and it is hard to say without a full stack trace), there can be a problem here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go#L416-L432
It was almost fixed in the first commit of PR https://github.com/influxdata/telegraf/pull/12637, but the second commit reverted it ;) The problem is that the length of subfield.Fields is not checked before accessing subfield.Fields[0].
Or here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go#L572-L579
The problem is that the length of field.Fields[0].Fields[0].Fields[0].Fields is not checked before accessing field.Fields[0].Fields[0].Fields[0].Fields[0].
Or here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go#L639-L668
where a lot of Fields are not checked for length :)
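To make the pattern concrete, here is a minimal, self-contained sketch of the length guard being described. It uses a simplified stand-in type of my own (field, firstNested), not the plugin's generated telemetry proto structs, so it is only an illustration of the guard, not the actual fix.

package main

import "fmt"

// Simplified stand-in for the nested telemetry field type; the real plugin
// walks proto messages whose Fields are themselves slices of fields.
type field struct {
	Name   string
	Fields []*field
}

// firstNested shows the guard: code that indexes subfield.Fields[0] without
// checking len(subfield.Fields) panics when the slice is empty; checking the
// length first turns the crash into a harmless skip.
func firstNested(f *field) (*field, bool) {
	if f == nil || len(f.Fields) == 0 {
		return nil, false
	}
	return f.Fields[0], true
}

func main() {
	empty := &field{Name: "class-stats"} // no nested fields, as in a sparse payload
	if _, ok := firstNested(empty); !ok {
		fmt.Println("skipping field with no nested Fields instead of panicking")
	}
}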
@powersj What do you think?
Hi,
@zak-pawel thanks for taking a look!
panic: runtime error: index out of range [0] with length 0
@Rajat0312 - whenever there is a panic, there are additional log messages below it that say where the panic occurred. I would prefer not to guess, so please share what that message actually said. It should look something like:
❯ ./telegraf --config config.toml --once
2023-08-18T12:27:03Z I! Loading config: config.toml
2023-08-18T12:27:03Z I! Starting Telegraf 1.28.0-168d9272
2023-08-18T12:27:03Z I! Available plugins: 239 inputs, 9 aggregators, 28 processors, 24 parsers, 59 outputs, 5 secret-stores
2023-08-18T12:27:03Z I! Loaded inputs: cisco_telemetry_mdt
2023-08-18T12:27:03Z I! Loaded aggregators:
2023-08-18T12:27:03Z I! Loaded processors:
2023-08-18T12:27:03Z I! Loaded secretstores:
2023-08-18T12:27:03Z I! Loaded outputs: file
2023-08-18T12:27:03Z I! Tags enabled:
2023-08-18T12:27:03Z D! [agent] Initializing plugins
2023-08-18T12:27:03Z D! [agent] Connecting outputs
2023-08-18T12:27:03Z D! [agent] Attempting connection to [outputs.file]
2023-08-18T12:27:03Z D! [agent] Successfully connected to outputs.file
2023-08-18T12:27:03Z D! [agent] Starting service inputs
panic: runtime error: index out of range [0] with length 0
goroutine 1 [running]:
github.com/influxdata/telegraf/plugins/inputs/cisco_telemetry_mdt.(*CiscoTelemetryMDT).Start(0xc000158510?, {0x7f8f7e8?, 0xc001d2cb60?})
/home/powersj/telegraf/plugins/inputs/cisco_telemetry_mdt/cisco_telemetry_mdt.go:104 +0x12
github.com/influxdata/telegraf/agent.(*Agent).testStartInputs(0xc000158510?, 0xc00100dbc0, {0xc001e30600, 0x1, 0x1?})
/home/powersj/telegraf/agent/agent.go:446 +0x1d0
github.com/influxdata/telegraf/agent.(*Agent).runOnce(0xc00012a9e0, {0x7fe8c28?, 0xc001e84d70}, 0x0)
/home/powersj/telegraf/agent/agent.go:1124 +0x3b8
github.com/influxdata/telegraf/agent.(*Agent).Once(0xc00012a9e0, {0x7fe8c28?, 0xc001e84d70?}, 0xc?)
/home/powersj/telegraf/agent/agent.go:1065 +0x26
main.(*Telegraf).runAgent(0xc001eb2780, {0x7fe8c28, 0xc001e84d70}, 0x10?, 0x40?)
/home/powersj/telegraf/cmd/telegraf/telegraf.go:346 +0x17c5
main.(*Telegraf).reloadLoop(0xc001eb2780)
/home/powersj/telegraf/cmd/telegraf/telegraf.go:166 +0x25b
main.(*Telegraf).Run(0x0?)
/home/powersj/telegraf/cmd/telegraf/telegraf_posix.go:14 +0x52
main.runApp.func1(0xc001e7ef80)
/home/powersj/telegraf/cmd/telegraf/main.go:246 +0xac9
github.com/urfave/cli/v2.(*Command).Run(0xc001e4ef20, 0xc001e7ef80, {0xc0001ba140, 0x4, 0x4})
/home/powersj/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x998
github.com/urfave/cli/v2.(*App).RunContext(0xc0010fef00, {0x7fe8a30?, 0xc4d9f40}, {0xc0001ba140, 0x4, 0x4})
/home/powersj/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x5b7
github.com/urfave/cli/v2.(*App).Run(...)
/home/powersj/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
main.runApp({0xc0001ba140, 0x4, 0x4}, {0x7f71900?, 0xc0001b4048}, {0x7f8dd40?, 0xc001e30280}, {0x7f8dd68?, 0xc001e4e160}, {0x7fe8918, ...})
/home/powersj/telegraf/cmd/telegraf/main.go:368 +0xfb5
main.main()
/home/powersj/telegraf/cmd/telegraf/main.go:378 +0xed
Hi @powersj, I am running Telegraf via Kubernetes and this is the last log line before the pod gets terminated. Is there any workaround by which I can get these logs?
Is there any workaround by which I can get these logs?
Based on your logs it happens right away, which is good.
You could log to a file and see if it captures the trace, or you could run telegraf by hand in a container or in the environment.
In any case, I really want to see the full trace, please.
I am trying to get the whole logs. Until then, can you suggest any workaround to run this new image, 1.27.3?
@powersj @zak-pawel, I have tried to get the full stack trace, but it only shows logs up to the panic runtime error. I have also tried to write the logs to a file, but I am getting the error below. Please also look into how I can use this new image for TCP.
2023-08-21T06:01:01Z I! Using config file: /etc/telegraf/telegraf.conf
2023-08-21T06:01:01Z E! [telegraf] Error running agent: Error loading config file /etc/telegraf/telegraf.conf: line 1: configuration specified the fields ["loglevel"], but they weren't used
Config =
apiVersion: v1
data:
  telegraf.conf: |
[agent]
  logfile = "/path/to/telegraf.log"
  loglevel = "error"
  collection_jitter = "0s"
  debug = true
  flush_interval = "10s"
  flush_jitter = "0s"
  hostname = "$HOSTNAME"
  interval = "10s"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = true
[[processors.enum]]
  [[processors.enum.mapping]]
    dest = "status_code"
    field = "status"
    [processors.enum.mapping.value_mappings]
      critical = 3
      healthy = 1
      problem = 2
[[inputs.cisco_telemetry_mdt]]
transport = "tcp"
service_address = ":57000" # instruct telegraf to listen on port 57000 for TCP telemetry
max_msg_size = 4294967295
embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]
[[inputs.cisco_telemetry_mdt]]
transport = "grpc"
service_address = ":57001" # instruct telegraf to listen on port 57000 for TCP telemetry
max_msg_size = 4294967295
embedded_tags = ["Cisco-IOS-XR-qos-ma-oper:qos/interface-table/interface/input/service-policy-names/service-policy-instance/statistics/class-stats/class-name","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface/encapsulation","Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface_xr/interface/encapsulation"]
[[outputs.file]]
files = ["stdout"] # files to write to, "stdout" is a specially handled file
data_format = "json"
[[outputs.kafka]]
brokers = ["localhost:9092"]
topic = "test"
ssl_cert = "/etc/telegraf/kafka.cert"
insecure_skip_verify = true
data_format = "json"
# ssl_ca = "ca.pem"
# ssl_key = "elvarx1.key"
[[outputs.prometheus_client]]
listen = ":9091"
# Path to publish the metrics on, defaults to /metrics
path = "/metrics"
# Expiration interval for each metric. 0 == no expiration
expiration_interval = "60s"
# Send string metrics as Prometheus labels.
# Unless set to false all string metrics will be sent as labels.
string_as_label = true
line 1: configuration specified the fields ["loglevel"], but they weren't used
This is not a valid configuration option. Remove loglevel. Where did you see this suggested or added?
@powersj I have also tried without loglevel, but no logfile is created. Then I tried to change the source code: I just added an if condition in this for loop and it's working now, but I need to confirm whether that is the right way. Will all other functionality still work properly?
I have also tried without loglevel but no logfile is created.
logfile = "/path/to/telegraf.log"
Does this path exist? If you have the ability to build and run a custom telegraf, then do you have the ability to jump into one of these containers and run telegraf by hand? This should not require building a custom version of telegraf. All we are trying to do is get the complete log message.
I have just added a if condition in this for loop and it’s working now but I need to confirm that is it right way ?
You are guessing as to the cause. It might be the case, but without the full trace we cannot be certain, and I would rather fix this with certainty and confidence than play guess-and-check with you.
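As an aside on why the full trace matters: Go prints the exact source file and line of the failing index access right below the panic line, which is what pinpoints the guilty code without guessing. A tiny, hypothetical reproduction of the same class of panic (not plugin code) looks like this:

package main

// Not plugin code: a minimal demonstration of the panic class reported in this
// issue. Running it prints "panic: runtime error: index out of range [0] with
// length 0" followed by a goroutine trace whose file:line entries identify
// exactly which index expression failed.
func main() {
	var fields []string // empty slice, like an empty Fields list
	_ = fields[0]       // panics; the trace printed below the panic line points here
}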
So, I used a different path too, like logfile = "/etc/telegraf/telegraf.log", but the files are not being created. I also want to check the full stack trace. Yeah, I can access Telegraf in the containers, but I can't make code changes there.
I am not asking you to make changes to the code. I am asking you to run telegraf by hand:
telegraf --debug --config <your config>
Can you reproduce the issue that way?
/ # telegraf --debug --config /etc/telegraf/telegraf.conf
2023-08-21T14:12:23Z I! Loading config: /etc/telegraf/telegraf.conf
2023-08-21T14:12:23Z W! DeprecationWarning: Option "ssl_cert" of plugin "outputs.kafka" deprecated since version 1.7.0 and will be removed in 2.0.0: use 'tls_cert' instead
2023-08-21T14:12:24Z E! Unable to open /etc/telegraf/databus/telegraf.log (open /etc/telegraf/databus/telegraf.log: read-only file system), using stderr
2023-08-21T14:12:24Z I! Starting Telegraf 1.27.3
2023-08-21T14:12:24Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-08-21T14:12:24Z I! Loaded inputs: cisco_telemetry_mdt (2x)
2023-08-21T14:12:24Z I! Loaded aggregators:
2023-08-21T14:12:24Z I! Loaded processors: enum
2023-08-21T14:12:24Z I! Loaded secretstores:
2023-08-21T14:12:24Z I! Loaded outputs: file kafka prometheus_client
2023-08-21T14:12:24Z I! Tags enabled: host=telegraf
2023-08-21T14:12:24Z W! Deprecated outputs: 0 and 1 options
2023-08-21T14:12:24Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf", Flush Interval:10s
2023-08-21T14:12:24Z D! [agent] Initializing plugins
2023-08-21T14:12:24Z D! [agent] Connecting outputs
2023-08-21T14:12:24Z D! [agent] Attempting connection to [outputs.kafka]
2023-08-21T14:12:24Z D! [sarama] Initializing new client
2023-08-21T14:12:24Z D! [sarama] client/metadata fetching metadata for all topics from broker localhost:9092
2023-08-21T14:12:24Z D! [sarama] Connected to broker at localhost:9092 (unregistered)
2023-08-21T14:12:24Z D! [sarama] Successfully initialized new client
2023-08-21T14:12:24Z D! [agent] Successfully connected to outputs.kafka
2023-08-21T14:12:24Z D! [agent] Attempting connection to [outputs.prometheus_client]
2023-08-21T14:12:24Z E! [agent] Failed to connect to [outputs.prometheus_client], retrying in 15s, error was "listen tcp :9091: bind: address already in use"
2023-08-21T14:12:39Z D! [sarama] Producer shutting down.
2023-08-21T14:12:39Z D! [sarama] Closing Client
2023-08-21T14:12:39Z D! [sarama] Closed connection to broker localhost:9092
2023-08-21T14:12:39Z E! [telegraf] Error running agent: connecting output outputs.prometheus_client: error connecting to output "outputs.prometheus_client": listen tcp :9091: bind: address already in use
Inside the Kubernetes pod I am not able to reproduce it that way.
2023-08-21T14:12:24Z E! Unable to open /etc/telegraf/databus/telegraf.log (open /etc/telegraf/databus/telegraf.log: read-only file system), using stderr
To solve this error I used a tmp/ path, but no stack trace is present in that file either.
/ # tail -f tmp/telegraf.log
2023-08-22T07:33:06Z D! [sarama] client/brokers registered new broker #0 at localhost:9092
2023-08-22T07:33:06Z D! [sarama] Successfully initialized new client
2023-08-22T07:33:06Z D! [agent] Successfully connected to outputs.kafka
2023-08-22T07:33:06Z D! [agent] Starting service inputs
command terminated with exit code 137
Take a look at your error messages. In the first case:
2023-08-21T14:12:39Z E! [telegraf] Error running agent: connecting output outputs.prometheus_client: error connecting to output "outputs.prometheus_client": listen tcp :9091: bind: address already in use
Probably because you had telegraf running already?
command terminated with exit code 137
That is usually an out-of-memory error in a pod.
I have put up https://github.com/influxdata/telegraf/pull/13813 with your suggested fix. I would much rather see an actual trace than do this, but please give that a try if you can.
Thanks @powersj, I will try to get this trace.
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.27.3, Alpine 3.18.2
Docker
FROM alpine:3.18
RUN echo 'hosts: files dns' >> /etc/nsswitch.conf
RUN apk add --no-cache iputils ca-certificates net-snmp-tools procps lm_sensors tzdata su-exec libcap && \
    update-ca-certificates
ENV TELEGRAF_VERSION 1.27.3
RUN ARCH= && \
    case "$(apk --print-arch)" in \
      x86_64) ARCH='amd64';; \
      aarch64) ARCH='arm64';; \
      *) echo "Unsupported architecture: $(apk --print-arch)"; exit 1;; \
    esac && \
    set -ex && \
    mkdir ~/.gnupg; \
    echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf; \
    apk add --no-cache --virtual .build-deps wget gnupg tar && \
    for key in \
        9D539D90D3328DC7D6C8D3B9D8FF8E1F7DF8B07E ; \
    do \
        gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys "$key" ; \
    done && \
    wget --no-verbose https://dl.influxdata.com/telegraf/releases/telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz.asc && \
    wget --no-verbose https://dl.influxdata.com/telegraf/releases/telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz && \
    gpg --batch --verify telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz.asc telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz && \
    mkdir -p /usr/src /etc/telegraf && \
    tar -C /usr/src -xzf telegraf-${TELEGRAF_VERSION}_linux_${ARCH}.tar.gz && \
    mv /usr/src/telegraf/etc/telegraf/telegraf.conf /etc/telegraf/ && \
    mkdir /etc/telegraf/telegraf.d && \
    cp -a /usr/src/telegraf/usr/bin/telegraf /usr/bin/ && \
    gpgconf --kill all && \
    rm -rf *.tar.gz* /usr/src /root/.gnupg && \
    apk del .build-deps && \
    addgroup -S telegraf && \
    adduser -S telegraf -G telegraf && \
    chown -R telegraf:telegraf /etc/telegraf
EXPOSE 8125/udp 8092/udp 8094
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["telegraf"]
Steps to reproduce
1. Start Telegraf with both the TCP and gRPC cisco_telemetry_mdt plugins.
2. Send input data from the source: {"fields":{"threshold":346,"speed":327},"name":"cisco","tags":{"host":"telegraf","name":"1","path":"cisco-path","source":"router-1","subscription":"1"},"timestamp":1631191178}
3. It was working in Telegraf 1.23.3.
...
Expected behavior
It should run with no errors. Right now it goes down.
Actual behavior
It goes down with the error panic: runtime error: index out of range [0] with length 0.
Additional info
It runs in Telegraf 1.23.3 but not in the latest versions.