influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

statsd UDP metrics split problem #2938

Closed: keyboardfann closed this issue 6 years ago

keyboardfann commented 7 years ago

Bug report

Dear @danielnelson, telegraf 1.3.2 seemed to resolve the UDP metric splitting problem, but I still hit the split problem. I use statsrelay -> telegraf statsd input plugin. The problem may happen when there are more than 1000 metrics; I don't know whether it's a telegraf issue or a statsrelay issue.
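
(For context: each statsd metric is a single line of the form name:value|type, and multiple metrics can share one UDP datagram separated by newlines. For example, a counter increment:

deploys.test.myservice1:1|c

A datagram cut off mid-line leaves a trailing fragment such as |c that no longer matches this format, which is what the parse errors in the logs below show.)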

Relevant telegraf.conf:

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)

# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "60s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 200

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 1000000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "60s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default, precision will be set to the same timestamp order as the
  ## collection interval, with the maximum being 1s.
  ## Precision will NOT be used for service inputs, such as logparser and statsd.
  ## Valid values are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = true
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = "/var/log/telegraf/telegraf.log"

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = true

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for Riemann to send metrics to
[[outputs.riemann]]
  ## The full TCP or UDP URL of the Riemann server
  #url = "udp://10.62.4.240:12010"
  url = "tcp://10.62.4.159:5555"

  ## Riemann event TTL, floating-point time in seconds.
  ## Defines how long that an event is considered valid for in Riemann
  # ttl = 30.0

  ## Separator to use between measurement and field name in Riemann service name
  ## This does not have any effect if 'measurement_as_attribute' is set to 'true'
  separator = "/"

  ## Set measurement name as Riemann attribute 'measurement', instead of prepending it to the Riemann service name
#  measurement_as_attribute = true

  ## Send string metrics as Riemann event states.
  ## Unless enabled all string metrics will be ignored
  # string_as_state = false

  ## A list of tag keys whose values get sent as Riemann tags.
  ## If empty, all Telegraf tag values will be sent as tags
  # tag_keys = ["telegraf","custom_tag"]

  ## Additional Riemann tags to send.
  # tags = ["telegraf-output"]

  ## Description for Riemann event
  # description_text = "metrics collected from telegraf"

  ## Riemann client write timeout, defaults to "5s" if not set.
  #timeout = "5s"

#[[outputs.file]]
##  ## Files to write to, "stdout" is a specially handled file.
#  files = ["stdout", "/tmp/metrics.out"]
###############################################################################
#                            PROCESSOR PLUGINS                                #
###############################################################################

# # Print all metrics that pass through this filter.
# [[processors.printer]]

###############################################################################
#                            AGGREGATOR PLUGINS                               #
###############################################################################

# # Keep the aggregate min/max of each metric passing through.
# [[aggregators.minmax]]
#   ## General Aggregator Arguments:
#   ## The period on which to flush & clear the aggregator.
#   period = "30s"
#   ## If true, the original metric will be dropped by the
#   ## aggregator and will not get sent to the output plugins.
#   drop_original = false

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

###############################################################################
#                            SERVICE INPUT PLUGINS                            #
###############################################################################

# Statsd Server
[[inputs.statsd]]
  ## Address and port to host UDP listener on
  service_address = "10.62.4.158:12100"

  ## The following configuration options control when telegraf clears its cache
  ## of previous values. If set to false, then telegraf will only clear its
  ## cache when the daemon is restarted.
  ## Reset gauges every interval (default=true)
  delete_gauges = true
  ## Reset counters every interval (default=true)
  delete_counters = true
  ## Reset sets every interval (default=true)
  delete_sets = true
  ## Reset timings & histograms every interval (default=true)
  delete_timings = true

  ## Percentiles to calculate for timing & histogram stats
  percentiles = [90]

  ## separator to use between elements of a statsd metric
  metric_separator = "."

  ## Parses tags in the datadog statsd format
  ## http://docs.datadoghq.com/guides/dogstatsd/
  parse_data_dog_tags = false

  ## Statsd data translation templates, more info can be read here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md#graphite
  # templates = [
  #     "* measurement* datacenter=1a"
  # ]

  ## Number of UDP messages allowed to queue up, once filled,
  ## the statsd server will start dropping packets
  allowed_pending_messages = 40000000

  ## Number of timing/histogram values to track per-measurement in the
  ## calculation of percentiles. Raising this limit increases the accuracy
  ## of percentiles but also increases the memory usage and cpu time.
  percentile_limit = 1000

System info:

statsrelay project: https://github.com/jjneely/statsrelay
telegraf version: 1.3.2
input plugin: statsd
output plugin: riemann

Steps to reproduce:

Test script:

for i in $(seq 1 1000);do echo "deploys.test.myservice$i:1|c" | nc -w 1 -u 10.62.4.240 12000;done

telegraf.log

2017-06-19T14:20:00Z D! Output [riemann] wrote batch of 200 metrics in 21.181514ms
2017-06-19T14:20:00Z D! Output [riemann] wrote batch of 200 metrics in 19.413567ms
2017-06-19T14:20:00Z D! Output [riemann] wrote batch of 200 metrics in 22.65629ms
2017-06-19T14:20:00Z D! Output [riemann] wrote batch of 200 metrics in 20.937404ms
2017-06-19T14:20:00Z D! Output [riemann] wrote batch of 200 metrics in 23.934974ms
2017-06-19T14:20:00Z D! Output [riemann] buffer fullness: 18 / 1000000 metrics. 
2017-06-19T14:20:00Z D! Output [riemann] wrote batch of 18 metrics in 4.565389ms

Test script:

for i in $(seq 1 1400);do echo "deploys.test.myservice$i:1|c" | nc -w 1 -u 10.62.4.240 12000;done

telegraf.log

2017-06-19T14:28:06Z I! Starting Telegraf (version 1.3.2)
2017-06-19T14:28:06Z I! Loaded outputs: riemann
2017-06-19T14:28:06Z I! Loaded inputs: inputs.internal inputs.statsd
2017-06-19T14:28:06Z I! Tags enabled: 
2017-06-19T14:28:06Z I! Agent Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1m0s 
2017-06-19T14:28:06Z I! Started the statsd service on 10.62.4.158:12100
2017-06-19T14:28:06Z I! Statsd listener listening on:  10.62.4.158:12100
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:33Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:33Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:29:00Z D! Output [riemann] wrote batch of 200 metrics in 15.459235ms
2017-06-19T14:29:00Z D! Output [riemann] wrote batch of 200 metrics in 11.160646ms
2017-06-19T14:29:00Z D! Output [riemann] wrote batch of 200 metrics in 12.517813ms
2017-06-19T14:29:00Z D! Output [riemann] wrote batch of 200 metrics in 14.176716ms

Expected behavior:

No errors occur.

Actual behavior:

2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c
2017-06-19T14:28:32Z E! Error: splitting ':', Unable to parse metric: |c

Use case:

metrics -> statsrelay -> telegraf statsd plugin

danielnelson commented 7 years ago

I ran statsrelay 127.0.0.1:8125 and your for loop all on one system, but wasn't able to duplicate the error.

keyboardfann commented 7 years ago

Dear @danielnelson, I used two clean VMs to test again, and I tried both jjneely's statsrelay and uber's statsrelay: jjneely's statsrelay shows this issue, while uber's statsrelay does not. So I think it may be an issue in jjneely's statsrelay.

Env:

192.168.100.102: statsrelay (https://github.com/jjneely/statsrelay, https://github.com/uber/statsrelay), telegraf 1.3.2
192.168.100.103: statsd-tg (https://github.com/octo/statsd-tg)

Testing script

while true;do statsd-tg -d 192.168.100.102 -D 12001 -T 1 -s 0 -c 100000 -t 0 -g 0 & sleep 0.5s;pkill statsd-tg;sleep 3s;done

jjneely's statsrelay

Start jjneely's statsrelay

#/usr/bin/statsrelay -bind=192.168.100.102 -port=12001  -bufsize 100 -prefix="Operations.Monitor.tng1396.statsrelay1" 192.168.100.102:12100:1
2017/06/20 03:56:24 Starting version 0.0.6
2017/06/20 03:56:24 Listening on 192.168.100.102:12001
2017/06/20 03:56:24 Setting socket read buffer size to: 100

telegraf.log

[root@kafka1 ~]# tailf /var/log/telegraf/telegraf.log 
2017-06-20T03:54:20Z D! Output [file] wrote batch of 200 metrics in 315.92µs
2017-06-20T03:54:27Z D! Attempting connection to output: file
2017-06-20T03:54:27Z D! Successfully connected to output: file
2017-06-20T03:54:27Z I! Starting Telegraf (version 1.3.2)
2017-06-20T03:54:27Z I! Loaded outputs: file
2017-06-20T03:54:27Z I! Loaded inputs: inputs.cpu inputs.disk inputs.diskio inputs.statsd
2017-06-20T03:54:27Z I! Tags enabled: 
2017-06-20T03:54:27Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:1m0s 
2017-06-20T03:54:27Z I! Started the statsd service on 192.168.100.102:12100
2017-06-20T03:54:27Z I! Statsd listener listening on:  192.168.100.102:12100
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:29Z E! Error: splitting ':', Unable to parse metric: 7|c
2017-06-20T03:54:30Z D! Output [file] wrote batch of 200 metrics in 1.912058ms
2017-06-20T03:54:30Z D! Output [file] wrote batch of 200 metrics in 759.935µs
2017-06-20T03:54:30Z D! Output [file] wrote batch of 200 metrics in 867.7µs
2017-06-20T03:54:30Z D! Output [file] wrote batch of 200 metrics in 826.469µs
2017-06-20T03:54:30Z D! Output [file] wrote batch of 200 metrics in 719.278µs
2017-06-20T03:54:30Z D! Output [file] wrote batch of 200 metrics in 1.136756ms

Uber's statsrelay

Config

#cat statsrelay.conf
statsd:
  bind: 192.168.100.102:12001
  tcp_cork: false
  validate: true
  shard_map:
    0: 192.168.100.102:12100:udp

Start uber's statsrelay

/usr/local/bin/statsrelay -c statsrelay.conf

telegraf.log

[root@kafka1 ~]# tailf /var/log/telegraf/telegraf.log 
2017-06-20T04:01:50Z D! Output [file] wrote batch of 200 metrics in 368.429µs
2017-06-20T04:01:53Z D! Attempting connection to output: file
2017-06-20T04:01:53Z D! Successfully connected to output: file
2017-06-20T04:01:53Z I! Starting Telegraf (version 1.3.2)
2017-06-20T04:01:53Z I! Loaded outputs: file
2017-06-20T04:01:53Z I! Loaded inputs: inputs.cpu inputs.disk inputs.diskio inputs.statsd
2017-06-20T04:01:53Z I! Tags enabled: 
2017-06-20T04:01:53Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:1m0s 
2017-06-20T04:01:53Z I! Started the statsd service on 192.168.100.102:12100
2017-06-20T04:01:53Z I! Statsd listener listening on:  192.168.100.102:12100
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.807102ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.322243ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.373924ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.441178ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.272336ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 692.862µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.201765ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 735.501µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 739.029µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 845.042µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 779.6µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 839.852µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 810.676µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 2.615224ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 745.639µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 693.889µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 829.516µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 2.794909ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 352.451µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 576.573µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 890.887µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 541.656µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 390.888µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 377.252µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 396.356µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 349.788µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 383.103µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 345.467µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 362.07µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 372.954µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 428.071µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 513.739µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.92334ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 326.008µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 430.196µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 370.813µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 489.807µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 553.398µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 428.618µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 373.517µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 455.753µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 377.711µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 547.767µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 471.042µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 369.362µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 523.965µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 428.49µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 710.709µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 603.367µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 348.366µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.406346ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 374.955µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 393.87µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 337.9µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 335.111µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 392.559µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 447.687µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 393.363µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 530.462µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 823.443µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 854.589µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 446.902µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 438.011µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 418.345µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 396.726µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 384.865µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 336.277µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 346.759µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 380.068µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 389.338µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 771.095µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 745.232µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 528.255µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.08228ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 948.26µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 605.839µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 876.955µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 418.135µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 403.261µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 350.281µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 342.784µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 389.855µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 2.295636ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 668.056µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.319376ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 802.893µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 526.243µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 536.775µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 984.581µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 895.943µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.572011ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 601.511µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.670955ms
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 498.762µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 450.07µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 592.703µs
2017-06-20T04:02:00Z D! Output [file] wrote batch of 200 metrics in 1.020817ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 1.332849ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 12.454643ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 1.018686ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 972.755µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 1.016568ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 588.945µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 526.909µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 390.079µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 442.113µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 399.213µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 388.404µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 527.321µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 948.624µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 437.394µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 429.789µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 467.111µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 372.313µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 524.04µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 700.287µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 488.543µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 442.084µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 364.934µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 343.459µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 331.522µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 413.083µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 698.867µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 580.549µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 1.661159ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 573.744µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 3.514508ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 420.525µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 411.745µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 386.316µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 823.943µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 666.417µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 650.16µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 859.913µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 693.22µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 880.664µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 812.963µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 664.497µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 459.249µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 1.156551ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 501.614µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 422.12µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 473.055µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 483.871µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 641.766µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 3.049713ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 707.531µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 1.535869ms
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 472.788µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 537.778µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 578.471µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 451.363µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 443.779µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 437.736µs
2017-06-20T04:02:10Z D! Output [file] wrote batch of 200 metrics in 421.09µs
danielnelson commented 7 years ago

Can you try capturing a little bit of output with tcpdump? Run this instead of telegraf:

tcpdump -i lo udp port 12100 -X
keyboardfann commented 7 years ago

Dear @danielnelson, yes, here is the tcpdump result.

Env:

192.168.100.102: statsrelay (https://github.com/jjneely/statsrelay)
192.168.100.103: statsd-tg (https://github.com/octo/statsd-tg)

Testing script

while true;do statsd-tg -d 192.168.100.102 -D 12001 -T 1 -s 0 -c 100000 -t 0 -g 0 & sleep 0.5s;pkill statsd-tg;sleep 3s;done

Start statsrelay

/usr/bin/statsrelay -bind=192.168.100.102 -port=12001  -bufsize 100 -prefix="Operations.Monitor.tng1396.statsrelay1" 192.168.100.102:12100:1
2017/06/20 17:48:34 Starting version 0.0.6
2017/06/20 17:48:34 Listening on 192.168.100.102:12001
2017/06/20 17:48:34 Setting socket read buffer size to: 100

tcpdump on 100.102

tcpdump -i lo udp port 12100 -X > /tmp/tcpdump_result

tcpdump file https://drive.google.com/file/d/0B2xTMRot8HV5QTJLYlRQMU1sNVU/view?usp=sharing

It seems the UDP payload length from (statsd-tg -> statsrelay -> port 12100) is < 1400:

17:55:28.053768 IP kafka1.test.41570 > kafka1.test.12100: UDP, length 1397
17:55:28.053891 IP kafka1.test.35599 > kafka1.test.12100: UDP, length 1397
17:55:28.053961 IP kafka1.test.34241 > kafka1.test.12100: UDP, length 1397
17:55:28.054033 IP kafka1.test.36590 > kafka1.test.12100: UDP, length 1397
17:55:28.054112 IP kafka1.test.36368 > kafka1.test.12100: UDP, length 1397
17:55:28.054178 IP kafka1.test.43820 > kafka1.test.12100: UDP, length 1397
17:55:28.054249 IP kafka1.test.47400 > kafka1.test.12100: UDP, length 1397
17:55:28.054316 IP kafka1.test.35680 > kafka1.test.12100: UDP, length 1397
17:55:28.054457 IP kafka1.test.56218 > kafka1.test.12100: UDP, length 1397
17:55:28.054537 IP kafka1.test.42948 > kafka1.test.12100: UDP, length 1397
17:55:28.054605 IP kafka1.test.44560 > kafka1.test.12100: UDP, length 1397
17:55:28.056072 IP kafka1.test.59373 > kafka1.test.12100: UDP, length 1397
17:55:28.056160 IP kafka1.test.34058 > kafka1.test.12100: UDP, length 1397
17:55:28.056224 IP kafka1.test.42728 > kafka1.test.12100: UDP, length 1397
17:55:28.056291 IP kafka1.test.56836 > kafka1.test.12100: UDP, length 1397
17:55:28.056354 IP kafka1.test.47465 > kafka1.test.12100: UDP, length 1397
17:55:28.062295 IP kafka1.test.37014 > kafka1.test.12100: UDP, length 985
17:55:28.062518 IP kafka1.test.35901 > kafka1.test.12100: UDP, length 1397
17:55:28.062602 IP kafka1.test.59695 > kafka1.test.12100: UDP, length 1397
17:55:28.063181 IP kafka1.test.33063 > kafka1.test.12100: UDP, length 1397
17:55:28.063893 IP kafka1.test.34419 > kafka1.test.12100: UDP, length 1397
17:55:28.069469 IP kafka1.test.55125 > kafka1.test.12100: UDP, length 1397
17:55:28.069542 IP kafka1.test.55581 > kafka1.test.12100: UDP, length 1397
17:55:28.069580 IP kafka1.test.44516 > kafka1.test.12100: UDP, length 1397
17:55:28.069615 IP kafka1.test.42585 > kafka1.test.12100: UDP, length 1397
17:55:28.069654 IP kafka1.test.41918 > kafka1.test.12100: UDP, length 1397
17:55:28.069688 IP kafka1.test.44570 > kafka1.test.12100: UDP, length 1397
17:55:28.069722 IP kafka1.test.38276 > kafka1.test.12100: UDP, length 1397
17:55:28.069756 IP kafka1.test.54298 > kafka1.test.12100: UDP, length 1397
17:55:28.069792 IP kafka1.test.40494 > kafka1.test.12100: UDP, length 1397
17:55:28.069823 IP kafka1.test.53111 > kafka1.test.12100: UDP, length 1397
17:55:28.069861 IP kafka1.test.51459 > kafka1.test.12100: UDP, length 1397
17:55:31.142113 IP kafka1.test.56138 > kafka1.test.12100: UDP, length 1397
17:55:31.142244 IP kafka1.test.36287 > kafka1.test.12100: UDP, length 1397
17:55:31.142314 IP kafka1.test.42614 > kafka1.test.12100: UDP, length 1397
17:55:31.142380 IP kafka1.test.36162 > kafka1.test.12100: UDP, length 1397
17:55:31.142442 IP kafka1.test.42012 > kafka1.test.12100: UDP, length 1397
17:55:31.142509 IP kafka1.test.38587 > kafka1.test.12100: UDP, length 1397
17:55:31.142573 IP kafka1.test.52190 > kafka1.test.12100: UDP, length 1397
17:55:31.142870 IP kafka1.test.49414 > kafka1.test.12100: UDP, length 1397
17:55:31.142957 IP kafka1.test.34571 > kafka1.test.12100: UDP, length 1397
17:55:31.143030 IP kafka1.test.45584 > kafka1.test.12100: UDP, length 1397
17:55:31.143095 IP kafka1.test.46294 > kafka1.test.12100: UDP, length 1397
17:55:31.143158 IP kafka1.test.59217 > kafka1.test.12100: UDP, length 1397
17:55:31.143220 IP kafka1.test.60178 > kafka1.test.12100: UDP, length 1397
17:55:31.143282 IP kafka1.test.37500 > kafka1.test.12100: UDP, length 1397
17:55:31.143346 IP kafka1.test.57844 > kafka1.test.12100: UDP, length 1397
17:55:31.150668 IP kafka1.test.46063 > kafka1.test.12100: UDP, length 887
17:55:31.150784 IP kafka1.test.55676 > kafka1.test.12100: UDP, length 1397
17:55:31.150854 IP kafka1.test.49250 > kafka1.test.12100: UDP, length 1397
17:55:31.150926 IP kafka1.test.42044 > kafka1.test.12100: UDP, length 1397
17:55:31.150990 IP kafka1.test.40886 > kafka1.test.12100: UDP, length 1397
17:55:31.151056 IP kafka1.test.37811 > kafka1.test.12100: UDP, length 1397
17:55:31.151117 IP kafka1.test.38173 > kafka1.test.12100: UDP, length 1397
17:55:31.151386 IP kafka1.test.49391 > kafka1.test.12100: UDP, length 1397
17:55:31.151623 IP kafka1.test.34027 > kafka1.test.12100: UDP, length 1397
17:55:31.151689 IP kafka1.test.33410 > kafka1.test.12100: UDP, length 1397
17:55:31.151753 IP kafka1.test.43693 > kafka1.test.12100: UDP, length 1397
17:55:31.151814 IP kafka1.test.34816 > kafka1.test.12100: UDP, length 1397
17:55:31.151876 IP kafka1.test.37257 > kafka1.test.12100: UDP, length 1397
17:55:31.151937 IP kafka1.test.56299 > kafka1.test.12100: UDP, length 1397
17:55:31.152177 IP kafka1.test.50105 > kafka1.test.12100: UDP, length 1397
17:55:31.152797 IP kafka1.test.40228 > kafka1.test.12100: UDP, length 1397
17:55:31.153265 IP kafka1.test.35974 > kafka1.test.12100: UDP, length 1397
17:55:31.153913 IP kafka1.test.39680 > kafka1.test.12100: UDP, length 1397
17:55:31.154464 IP kafka1.test.51610 > kafka1.test.12100: UDP, length 1397
17:55:31.154889 IP kafka1.test.55384 > kafka1.test.12100: UDP, length 1397
17:55:34.153268 IP kafka1.test.55803 > kafka1.test.12100: UDP, length 1397
17:55:34.153268 IP kafka1.test.37534 > kafka1.test.12100: UDP, length 896
17:55:34.153335 IP kafka1.test.54253 > kafka1.test.12100: UDP, length 1397
17:55:34.153405 IP kafka1.test.42727 > kafka1.test.12100: UDP, length 1397
17:55:34.153446 IP kafka1.test.33784 > kafka1.test.12100: UDP, length 1397
17:55:34.153494 IP kafka1.test.43586 > kafka1.test.12100: UDP, length 1397
17:55:34.153514 IP kafka1.test.32817 > kafka1.test.12100: UDP, length 1397
17:55:34.153804 IP kafka1.test.56143 > kafka1.test.12100: UDP, length 1397
17:55:35.158973 IP kafka1.test.37573 > kafka1.test.12100: UDP, length 1397
17:55:35.159120 IP kafka1.test.54388 > kafka1.test.12100: UDP, length 1397
17:55:35.159195 IP kafka1.test.56729 > kafka1.test.12100: UDP, length 1397
17:55:35.162045 IP kafka1.test.41861 > kafka1.test.12100: UDP, length 1397
17:55:35.162199 IP kafka1.test.55088 > kafka1.test.12100: UDP, length 1397
17:55:35.162282 IP kafka1.test.51487 > kafka1.test.12100: UDP, length 1397
17:55:35.162346 IP kafka1.test.43045 > kafka1.test.12100: UDP, length 1397
17:55:35.162424 IP kafka1.test.56547 > kafka1.test.12100: UDP, length 1397
17:55:35.162514 IP kafka1.test.42359 > kafka1.test.12100: UDP, length 1397
17:55:35.162602 IP kafka1.test.34869 > kafka1.test.12100: UDP, length 1397
17:55:35.162858 IP kafka1.test.57256 > kafka1.test.12100: UDP, length 1397
17:55:35.162970 IP kafka1.test.33000 > kafka1.test.12100: UDP, length 1397
17:55:35.163057 IP kafka1.test.40085 > kafka1.test.12100: UDP, length 1397
17:55:35.163143 IP kafka1.test.53005 > kafka1.test.12100: UDP, length 1397
17:55:35.163209 IP kafka1.test.45331 > kafka1.test.12100: UDP, length 1397
17:55:35.163351 IP kafka1.test.55470 > kafka1.test.12100: UDP, length 1397
17:55:35.163420 IP kafka1.test.53692 > kafka1.test.12100: UDP, length 1397
17:55:35.163585 IP kafka1.test.37228 > kafka1.test.12100: UDP, length 1397
17:55:38.179256 IP kafka1.test.45196 > kafka1.test.12100: UDP, length 1397
17:55:38.179398 IP kafka1.test.40477 > kafka1.test.12100: UDP, length 1397
17:55:38.179466 IP kafka1.test.48206 > kafka1.test.12100: UDP, length 1397
17:55:38.179526 IP kafka1.test.52259 > kafka1.test.12100: UDP, length 1397
17:55:38.179582 IP kafka1.test.58971 > kafka1.test.12100: UDP, length 1397
17:55:38.179639 IP kafka1.test.51762 > kafka1.test.12100: UDP, length 1397
17:55:38.179697 IP kafka1.test.45699 > kafka1.test.12100: UDP, length 1397
17:55:38.180021 IP kafka1.test.54707 > kafka1.test.12100: UDP, length 1397
17:55:38.180122 IP kafka1.test.55893 > kafka1.test.12100: UDP, length 1397
17:55:38.180221 IP kafka1.test.49492 > kafka1.test.12100: UDP, length 1397
17:55:38.180313 IP kafka1.test.38164 > kafka1.test.12100: UDP, length 1397
17:55:38.180410 IP kafka1.test.53038 > kafka1.test.12100: UDP, length 1397
17:55:38.180506 IP kafka1.test.49816 > kafka1.test.12100: UDP, length 1397
17:55:38.180734 IP kafka1.test.53964 > kafka1.test.12100: UDP, length 1397
17:55:38.180907 IP kafka1.test.43071 > kafka1.test.12100: UDP, length 1397
17:55:38.186024 IP kafka1.test.35889 > kafka1.test.12100: UDP, length 1018
17:55:38.186202 IP kafka1.test.35735 > kafka1.test.12100: UDP, length 1397
17:55:38.186309 IP kafka1.test.56424 > kafka1.test.12100: UDP, length 1397
17:55:38.186372 IP kafka1.test.52029 > kafka1.test.12100: UDP, length 1397
17:55:38.186430 IP kafka1.test.46834 > kafka1.test.12100: UDP, length 1397
17:55:38.186487 IP kafka1.test.56946 > kafka1.test.12100: UDP, length 1397
17:55:38.186543 IP kafka1.test.35382 > kafka1.test.12100: UDP, length 1397
17:55:38.186868 IP kafka1.test.57092 > kafka1.test.12100: UDP, length 1397
17:55:38.187330 IP kafka1.test.48981 > kafka1.test.12100: UDP, length 1397
17:55:38.187916 IP kafka1.test.34154 > kafka1.test.12100: UDP, length 1397
17:55:38.188431 IP kafka1.test.54836 > kafka1.test.12100: UDP, length 1397
17:55:41.214165 IP kafka1.test.50178 > kafka1.test.12100: UDP, length 709
17:55:41.214223 IP kafka1.test.35456 > kafka1.test.12100: UDP, length 1397
17:55:41.214241 IP kafka1.test.54332 > kafka1.test.12100: UDP, length 1397
17:55:41.214258 IP kafka1.test.50194 > kafka1.test.12100: UDP, length 1397
17:55:41.214492 IP kafka1.test.37681 > kafka1.test.12100: UDP, length 1397
17:55:42.221058 IP kafka1.test.46632 > kafka1.test.12100: UDP, length 1397
17:55:42.221244 IP kafka1.test.34531 > kafka1.test.12100: UDP, length 1397
17:55:42.223314 IP kafka1.test.34969 > kafka1.test.12100: UDP, length 1397
17:55:42.224319 IP kafka1.test.46422 > kafka1.test.12100: UDP, length 1397
17:55:42.224423 IP kafka1.test.41146 > kafka1.test.12100: UDP, length 1397
17:55:42.224497 IP kafka1.test.39270 > kafka1.test.12100: UDP, length 1397
17:55:42.224561 IP kafka1.test.38478 > kafka1.test.12100: UDP, length 1397
17:55:42.224628 IP kafka1.test.42430 > kafka1.test.12100: UDP, length 1397
17:55:42.228153 IP kafka1.test.38120 > kafka1.test.12100: UDP, length 1397
17:55:42.228260 IP kafka1.test.40140 > kafka1.test.12100: UDP, length 1397
17:55:42.228329 IP kafka1.test.58580 > kafka1.test.12100: UDP, length 1397
17:55:42.228391 IP kafka1.test.46122 > kafka1.test.12100: UDP, length 1397
17:55:42.228452 IP kafka1.test.41972 > kafka1.test.12100: UDP, length 1397
17:55:42.228518 IP kafka1.test.55036 > kafka1.test.12100: UDP, length 1397
17:55:42.228699 IP kafka1.test.36997 > kafka1.test.12100: UDP, length 1397
17:55:42.228788 IP kafka1.test.37566 > kafka1.test.12100: UDP, length 1397
17:55:42.228857 IP kafka1.test.56150 > kafka1.test.12100: UDP, length 1397
17:55:42.228933 IP kafka1.test.37239 > kafka1.test.12100: UDP, length 1397
17:55:42.228996 IP kafka1.test.34125 > kafka1.test.12100: UDP, length 1397
17:55:42.229130 IP kafka1.test.40625 > kafka1.test.12100: UDP, length 1397
17:55:42.229220 IP kafka1.test.52312 > kafka1.test.12100: UDP, length 1397
17:55:42.229285 IP kafka1.test.36909 > kafka1.test.12100: UDP, length 1397
17:55:42.229351 IP kafka1.test.53969 > kafka1.test.12100: UDP, length 1397
17:55:42.230449 IP kafka1.test.44587 > kafka1.test.12100: UDP, length 1397
17:55:42.231032 IP kafka1.test.46858 > kafka1.test.12100: UDP, length 1397
17:55:42.231539 IP kafka1.test.56907 > kafka1.test.12100: UDP, length 1397
17:55:42.236809 IP kafka1.test.36386 > kafka1.test.12100: UDP, length 1107
17:55:42.236921 IP kafka1.test.46588 > kafka1.test.12100: UDP, length 1397
17:55:42.236983 IP kafka1.test.39635 > kafka1.test.12100: UDP, length 1397
17:55:42.237038 IP kafka1.test.44329 > kafka1.test.12100: UDP, length 1397
17:55:42.237090 IP kafka1.test.56298 > kafka1.test.12100: UDP, length 1397
17:55:42.237146 IP kafka1.test.50303 > kafka1.test.12100: UDP, length 1397
17:55:42.237199 IP kafka1.test.39281 > kafka1.test.12100: UDP, length 1397
17:55:42.237366 IP kafka1.test.44417 > kafka1.test.12100: UDP, length 1397
17:55:42.237885 IP kafka1.test.36726 > kafka1.test.12100: UDP, length 1397
17:55:42.238404 IP kafka1.test.59840 > kafka1.test.12100: UDP, length 1397
17:55:42.238943 IP kafka1.test.47297 > kafka1.test.12100: UDP, length 1397
17:55:45.234235 IP kafka1.test.35109 > kafka1.test.12100: UDP, length 1397
17:55:45.234279 IP kafka1.test.59341 > kafka1.test.12100: UDP, length 1397
17:55:45.234300 IP kafka1.test.35280 > kafka1.test.12100: UDP, length 1397
17:55:45.234324 IP kafka1.test.55535 > kafka1.test.12100: UDP, length 1397
17:55:45.234345 IP kafka1.test.59367 > kafka1.test.12100: UDP, length 1397
17:55:45.234365 IP kafka1.test.36020 > kafka1.test.12100: UDP, length 1397
17:55:45.234384 IP kafka1.test.56810 > kafka1.test.12100: UDP, length 1397
17:55:45.234402 IP kafka1.test.50708 > kafka1.test.12100: UDP, length 1397
17:55:45.234422 IP kafka1.test.44774 > kafka1.test.12100: UDP, length 1397
17:55:45.234442 IP kafka1.test.49537 > kafka1.test.12100: UDP, length 1397
17:55:45.234466 IP kafka1.test.56560 > kafka1.test.12100: UDP, length 1397
17:55:45.234484 IP kafka1.test.44681 > kafka1.test.12100: UDP, length 1397
17:55:45.234503 IP kafka1.test.45737 > kafka1.test.12100: UDP, length 1397
17:55:45.236576 IP kafka1.test.60519 > kafka1.test.12100: UDP, length 1397
17:55:45.236602 IP kafka1.test.42588 > kafka1.test.12100: UDP, length 1397
17:55:45.236617 IP kafka1.test.55584 > kafka1.test.12100: UDP, length 95
17:55:45.236623 IP kafka1.test.41761 > kafka1.test.12100: UDP, length 1397
17:55:45.236644 IP kafka1.test.39751 > kafka1.test.12100: UDP, length 1397
17:55:45.236659 IP kafka1.test.53427 > kafka1.test.12100: UDP, length 1397
17:55:45.236664 IP kafka1.test.41907 > kafka1.test.12100: UDP, length 1397
17:55:45.236682 IP kafka1.test.39182 > kafka1.test.12100: UDP, length 1397
17:55:45.236683 IP kafka1.test.48339 > kafka1.test.12100: UDP, length 1397
17:55:45.236702 IP kafka1.test.42329 > kafka1.test.12100: UDP, length 1397
17:55:45.236702 IP kafka1.test.57349 > kafka1.test.12100: UDP, length 1397
17:55:45.236723 IP kafka1.test.36682 > kafka1.test.12100: UDP, length 1397
17:55:45.236723 IP kafka1.test.55412 > kafka1.test.12100: UDP, length 1397
17:55:45.236742 IP kafka1.test.34218 > kafka1.test.12100: UDP, length 1397
17:55:45.236743 IP kafka1.test.36543 > kafka1.test.12100: UDP, length 1397
17:55:48.521680 IP kafka1.test.52314 > kafka1.test.12100: UDP, length 1326
17:55:48.521879 IP kafka1.test.50308 > kafka1.test.12100: UDP, length 1397
17:55:48.521952 IP kafka1.test.46861 > kafka1.test.12100: UDP, length 1397
17:55:48.521977 IP kafka1.test.46651 > kafka1.test.12100: UDP, length 1397
17:55:48.521998 IP kafka1.test.57007 > kafka1.test.12100: UDP, length 1397
17:55:48.522021 IP kafka1.test.38992 > kafka1.test.12100: UDP, length 1397
17:55:48.522037 IP kafka1.test.40497 > kafka1.test.12100: UDP, length 1397
17:55:48.522054 IP kafka1.test.46613 > kafka1.test.12100: UDP, length 1397
17:55:48.522071 IP kafka1.test.51631 > kafka1.test.12100: UDP, length 1397
17:55:48.522088 IP kafka1.test.38261 > kafka1.test.12100: UDP, length 1397
17:55:48.522104 IP kafka1.test.40497 > kafka1.test.12100: UDP, length 1397
17:55:48.522121 IP kafka1.test.57612 > kafka1.test.12100: UDP, length 1397
17:55:48.522137 IP kafka1.test.39034 > kafka1.test.12100: UDP, length 1397
17:55:48.522153 IP kafka1.test.36934 > kafka1.test.12100: UDP, length 1397
17:55:48.522178 IP kafka1.test.59228 > kafka1.test.12100: UDP, length 1397
17:55:48.522196 IP kafka1.test.52246 > kafka1.test.12100: UDP, length 1397
17:55:48.523018 IP kafka1.test.55592 > kafka1.test.12100: UDP, length 1397
17:55:49.518078 IP kafka1.test.60361 > kafka1.test.12100: UDP, length 1397
17:55:49.518294 IP kafka1.test.44697 > kafka1.test.12100: UDP, length 1397
17:55:49.518431 IP kafka1.test.40461 > kafka1.test.12100: UDP, length 1397
17:55:49.518554 IP kafka1.test.59743 > kafka1.test.12100: UDP, length 1397
17:55:49.519116 IP kafka1.test.55924 > kafka1.test.12100: UDP, length 1397
17:55:49.519215 IP kafka1.test.53606 > kafka1.test.12100: UDP, length 1397
17:55:49.519247 IP kafka1.test.41381 > kafka1.test.12100: UDP, length 1397
17:55:49.519277 IP kafka1.test.50325 > kafka1.test.12100: UDP, length 1397
17:55:49.519307 IP kafka1.test.49884 > kafka1.test.12100: UDP, length 1397
17:55:49.519335 IP kafka1.test.47549 > kafka1.test.12100: UDP, length 1397
17:55:49.519366 IP kafka1.test.37358 > kafka1.test.12100: UDP, length 1397
17:55:49.519396 IP kafka1.test.48980 > kafka1.test.12100: UDP, length 1397
17:55:49.519424 IP kafka1.test.38868 > kafka1.test.12100: UDP, length 1397
17:55:49.519451 IP kafka1.test.53809 > kafka1.test.12100: UDP, length 1397
17:55:49.519480 IP kafka1.test.43313 > kafka1.test.12100: UDP, length 1397
17:55:52.524463 IP kafka1.test.47874 > kafka1.test.12100: UDP, length 1397
17:55:52.524508 IP kafka1.test.46334 > kafka1.test.12100: UDP, length 1397
17:55:52.524532 IP kafka1.test.40560 > kafka1.test.12100: UDP, length 1397
17:55:52.524554 IP kafka1.test.45427 > kafka1.test.12100: UDP, length 1397
17:55:52.524577 IP kafka1.test.43979 > kafka1.test.12100: UDP, length 1397
17:55:52.526056 IP kafka1.test.37671 > kafka1.test.12100: UDP, length 1397
17:55:52.526533 IP kafka1.test.33646 > kafka1.test.12100: UDP, length 1397
17:55:52.527133 IP kafka1.test.50665 > kafka1.test.12100: UDP, length 1397
17:55:52.527155 IP kafka1.test.42969 > kafka1.test.12100: UDP, length 1397
17:55:52.527177 IP kafka1.test.41386 > kafka1.test.12100: UDP, length 1397
17:55:52.527199 IP kafka1.test.42774 > kafka1.test.12100: UDP, length 1397
17:55:52.527202 IP kafka1.test.48876 > kafka1.test.12100: UDP, length 1397
17:55:52.527227 IP kafka1.test.36201 > kafka1.test.12100: UDP, length 1397
17:55:52.527236 IP kafka1.test.48697 > kafka1.test.12100: UDP, length 1397
17:55:52.527244 IP kafka1.test.57502 > kafka1.test.12100: UDP, length 1397
17:55:52.527261 IP kafka1.test.47434 > kafka1.test.12100: UDP, length 1397
17:55:52.527285 IP kafka1.test.48754 > kafka1.test.12100: UDP, length 1397
17:55:52.527311 IP kafka1.test.53729 > kafka1.test.12100: UDP, length 1397
17:55:52.527334 IP kafka1.test.36225 > kafka1.test.12100: UDP, length 1397
17:55:52.527357 IP kafka1.test.44406 > kafka1.test.12100: UDP, length 1397
17:55:52.527385 IP kafka1.test.39644 > kafka1.test.12100: UDP, length 1397
17:55:52.527453 IP kafka1.test.50262 > kafka1.test.12100: UDP, length 1397
17:55:52.529798 IP kafka1.test.60479 > kafka1.test.12100: UDP, length 546
17:55:52.529860 IP kafka1.test.33878 > kafka1.test.12100: UDP, length 1397
17:55:52.529880 IP kafka1.test.47957 > kafka1.test.12100: UDP, length 1397
17:55:52.529898 IP kafka1.test.52121 > kafka1.test.12100: UDP, length 1397
17:55:52.529915 IP kafka1.test.51377 > kafka1.test.12100: UDP, length 1397
17:55:52.529931 IP kafka1.test.41796 > kafka1.test.12100: UDP, length 1397
17:55:52.529949 IP kafka1.test.48370 > kafka1.test.12100: UDP, length 1397
17:55:52.529966 IP kafka1.test.35244 > kafka1.test.12100: UDP, length 1397
17:55:52.529984 IP kafka1.test.53852 > kafka1.test.12100: UDP, length 1397
17:55:52.530001 IP kafka1.test.36302 > kafka1.test.12100: UDP, length 1397
17:55:52.530017 IP kafka1.test.40269 > kafka1.test.12100: UDP, length 1397
17:55:52.530038 IP kafka1.test.nimgtw > kafka1.test.12100: UDP, length 1397
17:55:52.530061 IP kafka1.test.55384 > kafka1.test.12100: UDP, length 1397
17:55:52.530283 IP kafka1.test.51440 > kafka1.test.12100: UDP, length 1397
17:55:52.530500 IP kafka1.test.36340 > kafka1.test.12100: UDP, length 1397
17:55:52.530572 IP kafka1.test.34717 > kafka1.test.12100: UDP, length 1397
17:55:52.530806 IP kafka1.test.49588 > kafka1.test.12100: UDP, length 1397
danielnelson commented 7 years ago

It looks like statsrelay is sending multi-metric packets, and it doesn't seem that we support these currently.

17:55:28.053768 IP kafka1.test.41570 > kafka1.test.12100: UDP, length 1397
    0x0000:  4500 0591 b426 4000 4011 3718 c0a8 6466  E....&@.@.7...df
    0x0010:  c0a8 6466 a262 2f44 057d 4fac 3037 3332  ..df.b/D.}O.0732
    0x0020:  3737 3a38 7c63 0a30 3839 3932 333a 347c  77:8|c.089923:4|
    0x0030:  630a 3036 3435 3338 3a38 7c63 0a30 3232  c.064538:8|c.022
    0x0040:  3135 313a 377c 630a 3037 3036 3833 3a32  151:7|c.070683:2
    0x0050:  7c63 0a30 3032 3638 343a 357c 630a 3031  |c.002684:5|c.01
    0x0060:  3830 3637 3a38 7c63 0a30 3434 3033 363a  8067:8|c.044036:
danielnelson commented 7 years ago

@keyboardfann I looked back at this issue and actually I misread the dump. Additionally, I can confirm that multi-metric packets are accepted using:

echo -e "gorets:1|c\nglork:320|ms\ngaugor:333|g\nuniques:765|s" | nc -u localhost 8125

I read through the dump again and I don't see any problems with truncated packets, so we may need to dig deeper. Are you still able to reproduce?
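
For reference, here is a minimal Go sketch (not Telegraf's actual parser) of the newline-then-colon splitting that produces the errors above: each datagram is split on \n, and each line must contain a ':' separating the metric name from value|type, so a truncated fragment such as |c fails at the ':' split.

package main

import (
        "fmt"
        "strings"
)

// parseLine handles one statsd line of the form name:value|type.
// The error message mirrors the format seen in the telegraf log.
func parseLine(line string) error {
        name, rest, ok := strings.Cut(line, ":")
        if !ok || name == "" {
                return fmt.Errorf("splitting ':', Unable to parse metric: %s", line)
        }
        value, kind, ok := strings.Cut(rest, "|")
        if !ok {
                return fmt.Errorf("splitting '|', Unable to parse metric: %s", line)
        }
        fmt.Printf("name=%s value=%s type=%s\n", name, value, kind)
        return nil
}

func main() {
        // A multi-metric datagram; the last line simulates a truncated fragment.
        packet := "gorets:1|c\nglork:320|ms\n|c"
        for _, line := range strings.Split(packet, "\n") {
                if line == "" {
                        continue
                }
                if err := parseLine(line); err != nil {
                        fmt.Println("Error:", err)
                }
        }
}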

keyboardfann commented 6 years ago

Hi @danielnelson, sorry for the late reply. I have reproduced it.

Information:

192.168.100.103: telegraf 1.5.0, riemann 0.2.14
192.168.100.102: statsrelay (https://github.com/jjneely/statsrelay)

Telegraf config:

[global_tags]
[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 200
  metric_buffer_limit = 1000000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
  hostname = ""
  omit_hostname = true

[[outputs.riemann]]
  url = "tcp://localhost:5555"
  separator = "/"

[[inputs.statsd]]
  service_address = "192.168.100.102:12000"
  delete_gauges = true
  delete_counters = true
  delete_sets = true
  delete_timings = true
  percentiles = [90]
  metric_separator = "."
  parse_data_dog_tags = false
  allowed_pending_messages = 40000000
  percentile_limit = 1000

Riemann Config:

; -*- mode: clojure; -*-
; vim: filetype=clojure

(logging/init {:file "/var/log/riemann/riemann.log"})

; Listen on the local interface over TCP (5555), UDP (5555), and websockets
; (5556)
(let [host "0.0.0.0"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

; Expire old events from the index every 5 seconds.
(periodically-expire 5)

(instrumentation {:interval 5 :enabled? false})

(let [index (index)]
  ; Inbound events will be passed to these streams:
  (streams
    (default :ttl 60
      ; Index all events immediately.
      index
      #(info %)
      ; Log expired events.
      (expired
        (fn [event] (info "expired" event))))))

statsrelay startup command:

statsrelay -bind=192.168.100.103 -port=12000 -prefix="server1.statsrelay1" 192.168.100.102:12000:100 &

Test command:

for i in $(seq 1 1400);do echo "deploys.test.myservice$i:1|c" | nc -w 1 -u 192.168.100.103 12000;done

Telegraf log:

2017-12-18T04:02:51Z I! Starting Telegraf v1.5.0
2017-12-18T04:02:51Z I! Loaded outputs: riemann
2017-12-18T04:02:51Z I! Loaded inputs: inputs.statsd
2017-12-18T04:02:51Z I! Tags enabled:
2017-12-18T04:02:51Z I! Agent Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1m0s
2017-12-18T04:02:51Z I! Started the statsd service on 192.168.100.102:12000
2017-12-18T04:02:51Z I! Statsd UDP listener listening on:  192.168.100.102:12000
2017-12-18T04:03:03Z E! Error: splitting ':', Unable to parse metric: yservi
2017-12-18T04:03:07Z E! Error: splitting ':', Unable to parse metric: deploys.test.mys
2017-12-18T04:03:07Z E! Error: splitting ':', Unable to parse metric: deploys.test.mys
2017-12-18T04:03:07Z E! Error: splitting ':', Unable to parse metric: deploys.test.myservice10
2017-12-18T04:03:10Z E! Error: splitting ':', Unable to parse metric: st.myservice
2017-12-18T04:03:13Z E! Error: splitting ':', Unable to parse metric: deploys.test.myservice10
2017-12-18T04:03:13Z E! Error: splitting ':', Unable to parse metric: deploys.test.myservice10
2017-12-18T04:03:13Z E! Error: splitting ':', Unable to parse metric: deploys.test.myservice10
2017-12-18T04:03:13Z E! Error: splitting ':', Unable to parse metric: deploys.test.myservice10
2017-12-18T04:03:23Z E! Error: splitting ':', Unable to parse metric: 1|c
2017-12-18T04:03:26Z E! Error: splitting ':', Unable to parse metric: deploys.test.mys
2017-12-18T04:03:26Z E! Error: splitting ':', Unable to parse metric: deploys.test.mys
2017-12-18T04:03:26Z E! Error: splitting ':', Unable to parse metric: deploys.test.mys
2017-12-18T04:03:26Z E! Error: splitting ':', Unable to parse metric: deploys.test.myserv
danielnelson commented 6 years ago

I was able to reproduce; here are steps to reproduce it without Telegraf and on a single host.

Save this to a file udp.go and run it with go run udp.go. It prints one packet per line in %q format to stderr, so you may wish to redirect stderr to a file:

package main

import (
        "fmt"
        "net"
        "os"
)

func listen() error {
        conn, err := net.ListenPacket("udp", ":12000")
        if err != nil {
                return err
        }
        fmt.Println(conn.LocalAddr())

        buf := make([]byte, 64*1024)
        for {
                // Each ReadFrom returns exactly one UDP datagram.
                n, _, err := conn.ReadFrom(buf)
                if err != nil {
                        return err
                }

                // Print the raw payload, one packet per line, in %q format.
                fmt.Fprintf(os.Stderr, "%q\n", string(buf[:n]))
        }
}

func main() {
        err := listen()
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
        }
}
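
For example, to keep the packet dump for later inspection while still seeing the listen address on stdout (the file name is just an example):

go run udp.go 2> packets.txt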

Start statsrelay:

statsrelay -bind=127.0.0.1 -port=12001 -prefix="server1.statsrelay1" 127.0.0.1:12000:100

Now send the test data using this fancy bash UDP method, which avoids the 1 second pause per metric that the nc command introduces:

for i in $(seq 1 1400);do echo "deploys.test.myservice$i:1|c" >/dev/udp/localhost/12001; done

Here is a quick example of how the data looks at the end of a packet:

deploys.test.myservice1059:1|c\ndeploys.test.myservice1060:1|c\ndeploys.test.mys"

The packet ends mid-metric (deploys.test.mys), which suggests the relay flushed its buffer at a byte boundary rather than a metric boundary. So based on this I think that it is a bug in statsrelay.

keyboardfann commented 6 years ago

Yes, maybe. So I switched to Uber's statsrelay and it looks good. https://github.com/uber/statsrelay

jjneely commented 6 years ago

If the UDP packet payload is

deploys.test.myservice1059:1|c\ndeploys.test.myservice1060:1|c\ndeploys.test.mys

Then yes, my statsrelay, as well as Etsy's StatsD daemon, is going to fail to parse deploys.test.mys, or otherwise misunderstand the metric depending on what is cut off or partially attached to the front of the next payload. I don't do any state/connection handling, and the last time I looked neither did Etsy's StatsD, which I used as my reference implementation.

https://github.com/etsy/statsd/blob/master/docs/server.md

The docs for Etsy's StatsD UDP/TCP server imply this but don't outright say it. My interpretation was that splitting metrics across UDP packets is incorrect use of the protocol, and is only supported over TCP, where every metric must be terminated with \n and connection state must be tracked.

That's why I've been arguing that Telegraf's UDP implementation for sending StatsD metrics is flawed: it's not compatible with all StatsD server/proxy implementations. It looks like Uber's version corrects for this. I'm open to PRs, but my goal was to be fast, not to correct for clients' implementations.
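
For reference, a sender that follows this interpretation packs only whole metrics into each datagram and flushes before exceeding a payload budget. A minimal sketch, assuming a 1400-byte budget and an illustrative target address; this is not any particular client's implementation:

package main

// Hypothetical statsd UDP sender that never splits a metric across
// datagrams: whole "name:value|type" lines are packed into a buffer, and
// the buffer is flushed before it would exceed the payload budget.

import (
        "fmt"
        "net"
)

const maxPayload = 1400 // illustrative budget, chosen to stay under a typical MTU

func sendBatched(conn net.Conn, metrics []string) error {
        buf := make([]byte, 0, maxPayload)
        for _, m := range metrics {
                line := m + "\n"
                // Flush the pending datagram if this whole metric will not fit.
                // (A single metric longer than the budget is still sent whole.)
                if len(buf) > 0 && len(buf)+len(line) > maxPayload {
                        if _, err := conn.Write(buf); err != nil {
                                return err
                        }
                        buf = buf[:0]
                }
                buf = append(buf, line...)
        }
        if len(buf) > 0 {
                _, err := conn.Write(buf)
                return err
        }
        return nil
}

func main() {
        // Each conn.Write on a UDP connection sends exactly one datagram.
        conn, err := net.Dial("udp", "127.0.0.1:12000")
        if err != nil {
                fmt.Println(err)
                return
        }
        defer conn.Close()

        metrics := []string{"deploys.test.myservice1:1|c", "deploys.test.myservice2:1|c"}
        if err := sendBatched(conn, metrics); err != nil {
                fmt.Println(err)
        }
}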

danielnelson commented 6 years ago

This is the payload I am receiving from statsrelay, not a packet Telegraf is sending. I'm sending test data to statsrelay using this command, which I believe places each line in its own packet:

for i in $(seq 1 1400);do echo "deploys.test.myservice$i:1|c" >/dev/udp/localhost/12001; done

jjneely commented 6 years ago

Ah, I missed that detail. I thought that was the payload coming from your bash generator. Using tcpdump (packets never lie) also confirms that we get one metric per packet.
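
For anyone repeating the check, a capture along these lines prints each datagram's payload in ASCII (interface and port taken from the setup above):

tcpdump -i lo -A udp port 12001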

Standing corrected, let me take a closer look.

jjneely commented 6 years ago

Confirmed. Here are two of the corrupt packets I captured:

Packet #0 corrupt: "deploys.test.myservice97:1|c\ndeploys.test.myservice98:1|c\ndeploys.test.myservice99:1|c\ndeploys.test.myservice52:1|c\ndeploys.test.myservice53:1|c\ndeploys.test.myservice54:1|c\ndeploys.test.myservice55:1|c\ndeploys.test.myservice56:1|c\ndeploys.test.myservice57:1|c\ndeploys.test.myservice58:1|c\ndeploys.test.myservice59:1|c\ndeploys.test.myservice60:1|c\ndeploys.test.myservice61:1|c\ndeploys.test.myservice62:1|c\ndeploys.test.myservice63:1|c\ndeploys.test.myservice64:1|c\ndeploys.test.myservice65:1|c\ndeploys.test.myservice66:1|c\ndeploys.test.myservice67:1|c\ndeploys.test.myservice68:1|c\ndeploys.test.myservice69:1|c\ndeploys.test.myservice70:1|c\ndeploys.test.myservice71:1|c\ndeploys.test.myservice72:1|c\ndeploys.test.myservice73:1|c\ndeploys.test.myservice74:1|c\ndeploys.test.myservice75:1|c\ndeploys.test.myservice76:1|c\ndeploys.test.myservice77:1|c\ndeploys.test.myservice78:1|c\ndeploys.test.myservice79:1|c\ndeploys.test.myservice80:1|c\ndeploys.test.myservice81:1|c\ndeploys.test.myservice82:1|c\ndeploys.test.myservice83:1|c\ndeploys.test.myservice84:1|c\ndeploys.test.myservice85:1|c\ndeploys.test.myservice86:1|c\ndeploys.test.myservice87:1|c\ndeploys.test.myservice88:1|c\ndeploys.test.myservice89:1|c\ndeploys.test.myservice90:1|c\ndeploys.test.myservice91:1|c\ndeploys.test.myservice92:1|c\ndeploys.test.myservice93:1|c\ndeploys.test.myservice94:1|c\ndeploys.test.myservice95:1|c\ndeploys.test.myservi"
Packet #2 corrupt: "deploys.test.myservice189:1|c\ndeploys.test.myservice190:1|c\ndeploys.test.myservice191:1|c\ndeploys.test.myservice192:1|c\ndeploys.test.myservice193:1|c\ndeploys.test.myservice194:1|c\ndeploys.test.myservice195:1|c\ndeploys.test.myservice196:1|c\ndeploys.test.myservice197:1|c\ndeploys.test.myservice198:1|c\ndeploys.test.myservice199:1|c\ndeploys.test.myservice200:1|c\ndeploys.test.myservice201:1|c\ndeploys.test.myservice202:1|c\ndeploys.test.myservice203:1|c\ndeploys.test.myservice204:1|c\ndeploys.test.myservice205:1|c\ndeploys.test.myservice206:1|c\ndeploys.test.myservice207:1|c\ndeploys.test.myservice208:1|c\ndeploys.test.myservice209:1|c\ndeploys.test.myservice210:1|c\ndeploys.test.myservice211:1|c\ndeploys.test.myservice212:1|c\ndeploys.test.myservice213:1|c\ndeploys.test.myservice214:1|c\ndeploys.test.myservice215:1|c\ndeploys.test.myservice216:1|c\ndeploys.test.myservice217:1|c\ndeploys.test.myservice218:1|c\ndeploys.test.myservice173:1|c\ndeploys.test.myservice174:1|c\ndeploys.test.myservice175:1|c\ndeploys.test.myservice176:1|c\ndeploys.test.myservice177:1|c\ndeploys.test.myservice178:1|c\ndeploys.test.myservice179:1|c\ndeploys.test.myservice180:1|c\ndeploys.test.myservice181:1|c\ndeploys.test.myservice182:1|c\ndeploys.test.myservice183:1|c\ndeploys.test.myservice184:1|c\ndeploys.test.myservice185:1|c\ndeploys.test.myservice186:1|c\ndeploys.test.myservice187:1|c\ndeploys.test.myservice188:1"

And the detector that flagged them:

package main

// Attribution: https://github.com/influxdata/telegraf/issues/2938#issuecomment-355712864

import (
        "bytes"
        "fmt"
        "net"
        "os"
)

func listen() error {
        conn, err := net.ListenPacket("udp", ":12000")
        if err != nil {
                return err
        }

        buf := make([]byte, 64*1024)
        c := 0
        for {
                n, _, err := conn.ReadFrom(buf)
                if err != nil {
                        return err
                }

                // Every metric in this test stream ends in ":1|c" and metrics are
                // joined with "\n", so a well-formed packet must end with ":1|c\n".
                payload := buf[:n]
                if !bytes.HasSuffix(payload, []byte(":1|c\n")) {
                        // This packet looks like it sliced a metric in two...
                        fmt.Printf("Packet #%d corrupt: %q\n", c, string(payload))
                }
                c++
        }
}

func main() {
        err := listen()
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
        }
}

Okay, moving this back over to https://github.com/jjneely/statsrelay/issues/20

danielnelson commented 6 years ago

Okay, sorry about the initial confusion I caused.

jjneely commented 6 years ago

I believe this is now fixed on current master. Anyone able to confirm?

keyboardfann commented 6 years ago

Hi @jjneely & @danielnelson, after verification it works fine and no errors show up again. Thank you for fixing the issue.

Send metrics:

[root@kafka2 statsrelay]# for i in $(seq 1 1400);do echo "deploys.test.myservice$i:1|c" | nc -w 1 -u 192.168.100.103 12000;done
[root@kafka2 statsrelay]# 

Telegraf log:

[root@kafka1 ~]# tailf /var/log/telegraf/telegraf.log 
2018-01-11T04:13:00Z D! Output [riemann] wrote batch of 200 metrics in 21.354148ms
2018-01-11T04:13:19Z D! Attempting connection to output: riemann
2018-01-11T04:13:19Z D! Successfully connected to output: riemann
2018-01-11T04:13:19Z I! Starting Telegraf v1.5.1
2018-01-11T04:13:19Z I! Loaded outputs: riemann
2018-01-11T04:13:19Z I! Loaded inputs: inputs.statsd
2018-01-11T04:13:19Z I! Tags enabled: 
2018-01-11T04:13:19Z I! Agent Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1m0s 
2018-01-11T04:13:19Z I! Started the statsd service on 192.168.100.102:12000
2018-01-11T04:13:19Z I! Statsd UDP listener listening on:  192.168.100.102:12000
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 20.223706ms
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 14.787184ms
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 33.30283ms
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 40.731801ms
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 53.164813ms
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 38.367114ms
2018-01-11T04:14:00Z D! Output [riemann] wrote batch of 200 metrics in 42.405188ms