influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Telegraf Generating Orphaned DBus Processes on RHEL Servers #2 #13635

Open elangovanseshan opened 10 months ago

elangovanseshan commented 10 months ago

Relevant telegraf.conf

#Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)

# Global tags can be specified here in key="value" format.
[global_tags]
  env = "Production_Linux"
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "1m"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "10s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = false
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_archives = 2
  logfile_rotation_max_size = "50MB"
  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# # # Send telegraf metrics to file(s)
# [[outputs.file]]
#   ## Files to write to, "stdout" is a specially handled file.
#   files = ["/var/log/telegraf/telegraf.out"]

#   ## Data format to output.
#   ## Each data format has its own unique set of configuration options, read
#   ## more about them here:
#   ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
#   data_format = "influx"

# # Configuration for Wavefront server to send metrics to
[[outputs.wavefront]]
#   ## DNS name of the wavefront proxy server
#   host = "wavefront.example.com"

url = "http://metrics****************:2878"
#
#   ## Port that the Wavefront proxy server listens on
#   port = 2878
#port = 2878
convert_paths = false
namepass = ["prod.*","qa.*","dev.*"]

#----------------------------------
#Linux Input Plugins 
#---------------------------------

###############################################################################################
#                              Linux Input Plugins                                            #
###############################################################################################

## NETWORK METRICS
[[inputs.net]]
  name_prefix = "prod.metrics."
  interval = "15m"
  ignore_protocol_stats = true

## CPU METRICS
[[inputs.cpu]]
  name_prefix = "prod.metrics."
  interval = "10m"
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = false

## DISK METRICS
[[inputs.disk]]
  name_prefix = "prod.metrics."
  interval = "30m"

[[inputs.diskio]]
  name_prefix = "prod.metrics."
  interval = "5m"

## SYSTEM METRICS
[[inputs.system]]
  name_prefix = "prod.metrics."
  interval = "10m"

## MEMORY METRICS
 [[inputs.mem]]
   name_prefix = "prod.metrics."
   interval = "5m"
   fieldpass = ["active",
                "available",
                "buffered",
                "cached",
                "free",
                "inactive",
                "slab",
                "used",
                "available_percent",
                "used_percent",
                "wired",
                "commit_limit",
                "committed_as",
                "dirty",
                "high_free",
                "huge_pages_free",
                "low_free",
                "mapped",
                "page_tables",
                "shared",
                "swap_cached",
                "swap_free",
                "vmalloc_chunk",
                "vmalloc_used",
                "write_back",
                "write_back_tmp"]

[[inputs.mem]]
  name_prefix = "prod.metrics."
  interval = "60m"
  fieldpass = ["total","high_total","huge_page_size","huge_pages_total","low_total","swap_total","vmalloc_total"]

## SWAP METRICS
[[inputs.swap]]
  name_prefix = "prod.metrics."
  interval = "30m"
  fieldpass = ["free", "total","used", "used_percent"]

[[inputs.swap]]
  name_prefix = "prod.metrics."
  interval = "5m"
  fieldpass = ["in", "out"]

## TELEGRAF INTERNAL METRICS
[[inputs.internal]]
  interval = "60m"
  name_prefix = "prod.metrics."
  namepass = ["internal_gather*"]
    [inputs.internal.tagpass]
      input = ["internal"]

Logs from Telegraf

2023-07-17T15:26:21Z I! Loading config: /etc/telegraf/telegraf.d/monitor.conf
2023-07-17T15:26:21Z I! Starting Telegraf 1.27.2
2023-07-17T15:26:21Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-07-17T15:26:21Z I! Loaded inputs: cpu disk diskio exec (2x) internal mem (2x) net swap (2x) system
2023-07-17T15:26:21Z I! Loaded aggregators:
2023-07-17T15:26:21Z I! Loaded processors:
2023-07-17T15:26:21Z I! Loaded secretstores:
2023-07-17T15:26:21Z I! Loaded outputs: wavefront
2023-07-17T15:26:21Z I! Tags enabled: env=Production_Linux host=stuxsh03
2023-07-17T15:26:21Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"****", Flush Interval:10s
2023-07-17T15:27:53Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-07-17T15:27:53Z I! [agent] Stopping running outputs
2023-07-17T15:27:58Z I! Loading config: /etc/telegraf/telegraf.conf
2023-07-17T15:27:58Z I! Loading config: /etc/telegraf/telegraf.d/compute_services.conf
2023-07-17T15:27:58Z I! Loading config: /etc/telegraf/telegraf.d/monitor.conf
2023-07-17T15:27:58Z I! Starting Telegraf 1.27.2
2023-07-17T15:27:58Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-07-17T15:27:58Z I! Loaded inputs: cpu disk diskio exec (2x) internal mem (2x) net swap (2x) system
2023-07-17T15:27:58Z I! Loaded aggregators:
2023-07-17T15:27:58Z I! Loaded processors:
2023-07-17T15:27:58Z I! Loaded secretstores:
2023-07-17T15:27:58Z I! Loaded outputs: wavefront
2023-07-17T15:27:58Z I! Tags enabled: env=Production_Linux host**************
2023-07-17T15:27:58Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"***********", Flush Interval:10s
2023-07-17T15:27:58Z D! [agent] Initializing plugins
2023-07-17T15:27:58Z D! [agent] Connecting outputs
2023-07-17T15:27:58Z D! [agent] Attempting connection to [outputs.wavefront]
2023-07-17T15:27:58Z D! [outputs.wavefront] connecting over http/https using Url: ******************:2878
2023-07-17T15:27:58Z D! [agent] Successfully connected to outputs.wavefront
2023-07-17T15:27:58Z D! [agent] Starting service inputs
2023-07-17T15:28:14Z D! [outputs.wavefront] Flushing batch of 1 points
2023-07-17T15:28:14Z D! [outputs.wavefront] Wrote batch of 1 metrics in 63.547432ms
2023-07-17T15:28:14Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:28:30Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:28:46Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:03Z D! [outputs.wavefront] Flushing batch of 1 points
2023-07-17T15:29:03Z D! [outputs.wavefront] Wrote batch of 1 metrics in 33.405965ms
2023-07-17T15:29:03Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:16Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:31Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:47Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/named/chroot/etc/named"): permission denied
2023-07-17T15:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/named/chroot/var/named"): permission denied
2023-07-17T15:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/named/chroot/usr/lib64/bind"): permission denied
2023-07-17T15:30:04Z D! [outputs.wavefront] Error building tags: unexpected type: string, with value: 1 day,  2:37, for: prod.metrics.system.uptime_format
2023-07-17T15:30:04Z D! [outputs.wavefront] Flushing batch of 99 points
2023-07-17T15:30:04Z D! [outputs.wavefront] Wrote batch of 99 metrics in 69.526316ms
2023-07-17T15:30:04Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:14Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:24Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:36Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:48Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:31:07Z D! [outputs.wavefront] Flushing batch of 1 points
2023-07-17T15:31:07Z D! [outputs.wavefront] Wrote batch of 1 metrics in 33.711449ms
2023-07-17T15:31:07Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:31:21Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:31:35Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics

System info

Telegraf 1.27.2, running on Linux kernel 2.6.32-754.50.1.el6.x86_64.

Docker

No response

Steps to reproduce

Reproducing this has been tricky because it does not always occur. On impacted systems (hundreds or more), any of the following resolved the issue: reverting Telegraf to an earlier version, stopping the Telegraf service and removing the orphaned processes, or performing the actions below.

What we have seen: upgrading Telegraf from version 1.14 to 1.25.2 on RHEL servers seems to create an issue where DBus generates many orphaned processes. This eventually causes the system to hit the ceiling of available PIDs. Rolling back to 1.14 seems to clear the problem.

Example from one of our systems:

ps -ef|grep dbus|grep -v grep|wc -l
1459
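
The one-liner above counts every process whose command line mentions dbus, including the system bus daemon. A minimal sketch that narrows the count to session daemons reparented to PID 1, grouped by owner, is shown here. The function name `count_orphaned_dbus` is hypothetical, and the `ps -eo user=,ppid=,args=` invocation in the comment assumes a procps-style `ps`.

```shell
#!/bin/sh
# Count dbus-daemon processes started with --session whose parent is PID 1,
# grouped by owning user. Expects "user ppid args..." lines on stdin.
count_orphaned_dbus() {
    awk '$2 == 1 && /dbus-daemon/ && /--session/ { n[$1]++ }
         END { for (u in n) print u, n[u] }' | sort
}

# Possible use on a live host (assumes a procps-style ps):
#   ps -eo user=,ppid=,args= | count_orphaned_dbus
```

Grouping by user may help tell daemons spawned under the telegraf account apart from those started in root sessions.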

According to https://github.com/influxdata/telegraf/issues/13481, this was resolved in the recent telegraf-1.27.2 release, but we are experiencing the same issue with that release as well.

Expected behavior

Telegraf works as expected.

Actual behavior

Telegraf inadvertently creates thousands of orphaned DBus processes, which eventually causes the number of PIDs in use to hit the maximum and degrades the system.

Additional info

No response


crflanigan commented 10 months ago

@powersj New issue created

crflanigan commented 10 months ago

As an aside, so far this issue appears to be absent from 1.24.2.

elangovanseshan commented 10 months ago

Here are the dbus-daemon processes we are seeing on the server:


root       336     1  0 07:03 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       353     1  0 06:16 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       372     1  0 03:43 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       385     1  0 08:38 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       426     1  0 07:03 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       427     1  0 05:21 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       458     1  0 08:39 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       495     1  0 09:25 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       569     1  0 04:30 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       602     1  0 07:04 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       643     1  0 08:39 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       701     1  0 09:26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       823     1  0 09:26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       840     1  0 06:17 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       884     1  0 04:31 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       896     1  0 02:47 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       936     1  0 05:21 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       940     1  0 07:04 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       951     1  0 08:40 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       982     1  0 09:27 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1020     1  0 03:44 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1068     1  0 05:22 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1111     1  0 07:55 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1116     1  0 04:31 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1210     1  0 07:55 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1230     1  0 03:44 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1305     1  0 03:45 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1429     1  0 02:47 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1518     1  0 03:45 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1736     1  0 03:46 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1756     1  0 07:56 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1815     1  0 09:27 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1874     1  0 07:56 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1887     1  0 09:28 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1937     1  0 04:32 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1963     1  0 03:46 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2022     1  0 08:41 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2029     1  0 04:32 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2044     1  0 09:28 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2112     1  0 07:57 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2118     1  0 09:29 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2154     1  0 05:22 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2156     1  0 04:33 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2184     1  0 09:29 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2211     1  0 02:48 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2242     1  0 07:57 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2253     1  0 08:41 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2272     1  0 04:33 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2298     1  0 05:23 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2330     1  0 09:30 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2353     1  0 07:58 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2365     1  0 04:34 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2408     1  0 07:58 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2418     1  0 07:05 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2461     1  0 05:23 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2480     1  0 07:59 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2602     1  0 07:05 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2623     1  0 04:34 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2643     1  0 07:59 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2678     1  0 03:47 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2681     1  0 08:42 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2690     1  0 05:24 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2704     1  0 04:35 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2748     1  0 04:35 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2788     1  0 07:06 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session

elangovanseshan commented 10 months ago

telegraf  5850     1  0 06:54 ?        00:00:02 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf telegraf.d
telegraf  5866     1  0 06:54 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     11702 24282  0 07:34 pts/0    00:00:00 grep --color=auto -i telegraf

crflanigan commented 10 months ago

Is there a way to disable the secret store completely? We don't use it and some component related to it seems to be causing the issues.

powersj commented 10 months ago

Thanks for the issue and logs. Are you seeing this across RHEL 6, 7, and 8 this time, or only on RHEL 6? I have a RHEL 7 VM looping over telegraf with --once to see whether multiple dbus-daemon processes start. I am past 10k loops and nothing has shown up yet.

Is there a way to disable the secret store completely?

Only with a custom build of Telegraf.

Assuming the issue is in the same secret store code as last time, that dbus command runs in the init function of that library. In Go, a package's init function runs as soon as the package is imported, before main starts, so it executes before we have a chance to do anything else.

elangovanseshan commented 10 months ago

Thanks for your reply, Joshua. I see this issue only on RHEL 6. I also have the latest version deployed on RHEL 7/8, and there I don't see any issue with DBus.


ps -ef|grep -i telegraf
telegraf 14007     1 10 09:49 ?        00:00:01 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf 14048     1  0 09:49 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     14366  7614  0 09:49 pts/0    00:00:00 grep -i telegraf

powersj commented 10 months ago

@crflanigan, @elangovanseshan,

If you must have the newer version of Telegraf on RHEL 6, my suggestion is to consider building Telegraf with the custom builder. The result would be a ~23 MB binary containing only the plugins you need, and the secret store plugins would not be present.

git clone https://github.com/influxdata/telegraf
cd telegraf
go build -o ./tools/custom_builder/custom_builder ./tools/custom_builder
./tools/custom_builder/custom_builder --config <conf_file> --config-dir <conf_dir>

Would this be an option for you?

crflanigan commented 10 months ago

Hi @powersj,

We can look at that. I had thought that the Secret Store is core to Telegraf irrespective of the configuration you use post release 1.25, is that right?

Is Telegraf not supported on RHEL 6, if so, when was the last release where it was supported?

Thanks!

elangovanseshan commented 10 months ago

Thank you @powersj, let me try the custom builder without the secret store plugins.

powersj commented 10 months ago

I had thought that the Secret Store is core to Telegraf irrespective of the configuration you use post release 1.25, is that right?

Internally to Telegraf, the secret stores are treated like the other plugins, so you can build Telegraf without them.

Is Telegraf not supported on RHEL 6, if so, when was the last release where it was supported?

We have a published doc for supported platforms, which essentially says we support OSes that are under standard support. In line with that, RHEL 6 stopped being supported at the end of 2020, and RHEL 7 support will end in June 2024.

While we will not go out of our way to break any previous releases, if we do make a change that breaks them, we are less inclined to revert it, and we will not continue to test against them.

crflanigan commented 10 months ago

Ok @powersj ,

It sounds like patching this issue is unlikely since it's occurring on an unsupported OS, is that right?

Thanks!

powersj commented 10 months ago

It sounds like patching this issue is unlikely since it's occurring on an unsupported OS, is that right?

If you proposed a PR or an idea to get around this we would certainly consider it. We are not going to completely close the door on a fix.

crflanigan commented 10 months ago

@powersj,

Fair enough, thanks buddy!

elangovanseshan commented 10 months ago

Thanks, @powersj! The custom_builder is working fine for me. I passed a sample conf file to build the binary; it contains the cpu, disk, diskio, exec, mem, net, swap, and system input plugins, and it works fine. However, multiple internal teams use input plugins other than the ones mentioned above, so a binary built with a limited set of input plugins would affect those internal customers. We would therefore like to build a custom binary with all input plugins but without the secret store plugins. Is there a way to build it without passing a conf file for each input plugin, or can we build it with dummy conf files that leave out the secret store plugins?

powersj commented 10 months ago

So now i would like to build the custom binary with all input plugins but except secret store plugins

You can get a list of all the input plugins by generating the default config and grep'ing out all the input headers:

make
./telegraf config > default.toml
grep "^# \[\[inputs.*\]\]" default.toml | cut -d' ' -f2 | sort | uniq

You could then add that to your example config or pass that as a second file to the custom builder.
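
As a rough sketch of that second-file approach, the function below turns a generated sample config into a list of bare input headers that could be saved as an extra file. It also keeps headers that are already uncommented in the sample, which the `grep` above would skip. The function name `list_input_headers` and the file name `all_inputs.conf` are hypothetical, and whether the custom builder accepts a file of empty plugin tables, or a repeated `--config` flag, is an assumption to verify.

```shell
#!/bin/sh
# Print each [[inputs.NAME]] header found on stdin exactly once, uncommented.
# Matches both "# [[inputs.cpu]]" (disabled) and "[[inputs.cpu]]" (enabled).
list_input_headers() {
    sed -n 's/^\(# \)\{0,1\}\(\[\[inputs\.[A-Za-z0-9_]*]]\).*$/\2/p' | sort -u
}

# Hypothetical use from a built checkout (assumes --config may be repeated):
#   ./telegraf config | list_input_headers > all_inputs.conf
#   ./tools/custom_builder/custom_builder --config <conf_file> --config all_inputs.conf
```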

You could also use the various build tags to build telegraf as the customization docs show using BUILDTAGS:

BUILDTAGS="custom,aggregators,inputs,outputs,parsers,processors,serializers" make

If you do start to go this route, please ensure you include everything you actually need ;) It is easy to forget or not realize you are using a serializer for example. This is why I like the custom builder + an actual config better.

elangovanseshan commented 10 months ago

@powersj, our initial testing with a custom Telegraf build containing a limited set of input and output plugins is working fine, with no evidence of stray dbus processes.

I would also like to know how we can add serializers to the custom build. I added the required inputs, outputs, aggregators, and processors through the example conf, but I am not sure about serializers.

Do we need to pass it through conf file or do we have any other option?

powersj commented 10 months ago

Do we need to pass it through conf file or do we have any other option?

You can reference any of the serializers the same way. For example, if you want only the JSON serializer, you can add serializers.json to the build tags.

The way to determine these build tags is to look in each plugin category's all folder and check the build tags at the top of each file. In the all file for JSON, you can see that the JSON serializer is imported if this is not a custom build, if the user specifies serializers (which pulls in all serializers), or if they specify serializers.json.
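
Combining this with the BUILDTAGS suggestion earlier in the thread, a tag list that pulls in whole plugin categories plus only the JSON serializer, and lists no secret store tags, might be assembled as below. This is only a sketch: `join_tags` is a hypothetical helper and the chosen tag set is an illustration, so check each category's all folder before relying on it.

```shell
#!/bin/sh
# Join the arguments with commas, the separator used in the BUILDTAGS example.
join_tags() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Whole categories plus a single serializer; no secret store tags are listed.
BUILDTAGS=$(join_tags custom inputs outputs aggregators processors parsers serializers.json)

# Print the command instead of running it, so it can be reviewed first.
echo "BUILDTAGS=\"$BUILDTAGS\" make"
```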

Does that help?

elangovanseshan commented 10 months ago

Thank you @powersj, let me try this out.

One more thing, for your information: I initially reported that the dbus issue happens only on RHEL 6 servers, but we had an issue with RHEL 7/8 as well.

So we are planning to go with a custom Telegraf build with limited plugins.

powersj commented 2 months ago

@elangovanseshan, @crflanigan,

but we had an issue with RHEL7/8 as well .

Sorry I never responded to this. Looking at the mentioned gosnowflake issue, it looks like setting DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus in the environment is a workaround as well.

For Telegraf, I am inclined to document this and link to the still open upstream issue. Thoughts?

crflanigan commented 2 months ago

Hi @powersj,

Sorry for the delayed response.

I actually commented on one of these issues for keyring and got a notification this morning that they may have resolved it? Seems like a lot of people use this library.

https://github.com/99designs/keyring/issues/103

What do you think?

powersj commented 2 months ago

Hey @crflanigan,

Did someone delete their comment? Latest I see is from Apr 12, 2023.

Hipska commented 17 hours ago

@powersj I think @crflanigan was referring to https://github.com/snowflakedb/gosnowflake/issues/773#issuecomment-2024775431

BTW, I now get that message even when not using outputs.sql at all:

WARN[0000]log.go:244 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null.
2024-05-31T14:00:14Z I! Loading config: test.toml
2024-05-31T14:00:14Z I! Starting Telegraf 1.31.0-35bff98f brought to you by InfluxData the makers of InfluxDB
2024-05-31T14:00:14Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-05-31T14:00:14Z I! Loaded inputs: snmp
2024-05-31T14:00:14Z I! Loaded aggregators:
2024-05-31T14:00:14Z I! Loaded processors:
2024-05-31T14:00:14Z I! Loaded secretstores:
2024-05-31T14:00:14Z W! Outputs are not used in testing mode!
2024-05-31T14:00:14Z I! Tags enabled:

This does not happen with telegraf 1.30.3
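
The warning text above names two possible values for the variable. A minimal sketch of choosing between them before starting Telegraf is shown here. `pick_dbus_address` is a hypothetical helper, the values are copied from the warning rather than verified, and where the variable has to be set for the service (for example, an environment file read by the init script or unit) is an assumption to confirm for your packaging.

```shell
#!/bin/sh
# Choose a DBUS_SESSION_BUS_ADDRESS value following the gosnowflake warning:
# use <runtime dir>/bus if it exists, otherwise fall back to /dev/null.
pick_dbus_address() {
    # $1 = candidate runtime dir (normally $XDG_RUNTIME_DIR, may be empty)
    if [ -n "$1" ] && [ -e "$1/bus" ]; then
        printf '%s\n' "$1/bus"
    else
        printf '%s\n' /dev/null
    fi
}

DBUS_SESSION_BUS_ADDRESS=$(pick_dbus_address "${XDG_RUNTIME_DIR:-}")
export DBUS_SESSION_BUS_ADDRESS
```

On hosts where Telegraf runs as a service without a login session, XDG_RUNTIME_DIR is often unset, in which case this sketch falls through to /dev/null.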