joshuar / pingbeat

DEPRECATED. Pingbeat sends ICMP packets and stores the RTT in Elasticsearch or other outputs supported by libbeat.
Apache License 2.0

Missing 'loss' field in pingbeat output #28

Open project-poodle opened 7 years ago

project-poodle commented 7 years ago

In 1.0-beta, there was a 'loss: boolean' field that captured packet loss. This field seems to be no longer present in 5.4.

This field is quite useful when detecting network instability. Could this field be added back?

joshuar commented 7 years ago

Hi @zach929, this field should still be there. Rather than adding loss: false to every good ping, pingbeat just adds loss: true to failed pings (along with the reason field which will eventually report what ICMP/network error was the cause). Are you no longer seeing loss where you previously saw it?
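For anyone consuming these events downstream, the convention described above (no `loss` field on a good ping, `loss: true` on a failed one) means a missing field should be read as "no loss". The following is a minimal sketch of that normalization, not pingbeat's own code. The function name `normalize_loss` is hypothetical, and the failed-ping event below is an assumed shape: only the `loss` and `reason` field names come from this thread, and the `reason` value is made up.

```python
from typing import Any, Dict


def normalize_loss(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with an explicit boolean 'loss' field."""
    normalized = dict(event)
    # Pingbeat omits 'loss' on successful pings, so absence is treated as False.
    normalized["loss"] = bool(event.get("loss", False))
    return normalized


# Successful ping, trimmed from the console output format used by pingbeat 5.4.
good = {"rtt": 1.389245, "target.addr": "8.8.8.8", "type": "pingbeat"}

# Hypothetical failed ping: the 'reason' value and overall layout are assumptions.
lost = {
    "loss": True,
    "reason": "timeout",
    "target.addr": "100.100.100.100",
    "type": "pingbeat",
}

print(normalize_loss(good)["loss"])
print(normalize_loss(lost)["loss"])
```

Normalizing on the consumer side keeps the stored events small while still letting a query or dashboard filter on an explicit boolean.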

project-poodle commented 7 years ago

Hi @joshuar, thanks for the reply. No 'loss: true' event was generated in 5.4 during my test. The following is my pingbeat.yml:

pingbeat:
  # Defines how often a ping is sent to a target
  period: "5s"
  # Whether to send pings over IPv4
  useipv4: true
  # Whether to send pings over IPv6
  useipv6: false
  # How long to wait for a target to respond to a ping request
  timeout: "10s"
  targets:
    - name: "100.100.100.100"
    - name: "8.8.8.8"

output:
  console:
    pretty: true

The following is the output of the program:

[es5]$ sudo /usr/bin/pingbeat -e -c pingbeat.yml -d publish
2017/04/30 23:36:02.094281 beat.go:285: INFO Home path: [/usr/bin] Config path: [/usr/bin] Data path: [/usr/bin/data] Logs path: [/usr/bin/logs]
2017/04/30 23:36:02.094362 beat.go:186: INFO Setup Beat: pingbeat; Version: 5.4.0
2017/04/30 23:36:02.094456 outputs.go:108: INFO Activated console as output plugin.
2017/04/30 23:36:02.094493 publish.go:238: DBG  Create output worker
2017/04/30 23:36:02.094658 publish.go:280: DBG  No output is defined to store the topology. The server fields might not be filled.
2017/04/30 23:36:02.094745 publish.go:295: INFO Publisher name: es5
2017/04/30 23:36:02.094920 metrics.go:23: INFO Metrics logging every 30s
2017/04/30 23:36:02.095120 async.go:63: INFO Flush Interval set to: 1s
2017/04/30 23:36:02.095153 async.go:64: INFO Max Bulk Size set to: 2048
2017/04/30 23:36:02.095175 async.go:72: DBG  create bulk processing worker (interval=1s, bulk size=2048)
2017/04/30 23:36:02.095592 beat.go:221: INFO pingbeat start running.
2017/04/30 23:36:02.095616 pingbeat.go:71: INFO pingbeat is running! Hit CTRL-C to stop it.
2017/04/30 23:36:02.096336 pingbeat.go:97: INFO Using ip4:icmp connection
2017/04/30 23:36:07.098339 client.go:214: DBG  Publish: {
  "@timestamp": "2017-04-30T23:36:07.097Z",
  "beat": {
    "hostname": "es5",
    "name": "es5",
    "version": "5.4.0"
  },
  "rtt": 1.389245,
  "target.addr": "8.8.8.8",
  "target.name": "8.8.8.8",
  "target.tags": null,
  "type": "pingbeat"
}
2017/04/30 23:36:08.096155 output.go:109: DBG  output worker: publish 1 events
{
  "@timestamp": "2017-04-30T23:36:07.097Z",
  "beat": {
    "hostname": "es5",
    "name": "es5",
    "version": "5.4.0"
  },
  "rtt": 1.389245,
  "target.addr": "8.8.8.8",
  "target.name": "8.8.8.8",
  "target.tags": null,
  "type": "pingbeat"
}
2017/04/30 23:36:12.097673 client.go:214: DBG  Publish: {
  "@timestamp": "2017-04-30T23:36:12.097Z",
  "beat": {
    "hostname": "es5",
    "name": "es5",
    "version": "5.4.0"
  },
  "rtt": 1.210513,
  "target.addr": "8.8.8.8",
  "target.name": "8.8.8.8",
  "target.tags": null,
  "type": "pingbeat"
}
2017/04/30 23:36:13.095671 output.go:109: DBG  output worker: publish 1 events
{
  "@timestamp": "2017-04-30T23:36:12.097Z",
  "beat": {
    "hostname": "es5",
    "name": "es5",
    "version": "5.4.0"
  },
  "rtt": 1.210513,
  "target.addr": "8.8.8.8",
  "target.name": "8.8.8.8",
  "target.tags": null,
  "type": "pingbeat"
}
2017/04/30 23:36:17.098026 client.go:214: DBG  Publish: {
  "@timestamp": "2017-04-30T23:36:17.097Z",
  "beat": {
    "hostname": "es5",
    "name": "es5",
    "version": "5.4.0"
  },
  "rtt": 1.462431,
  "target.addr": "8.8.8.8",
  "target.name": "8.8.8.8",
  "target.tags": null,
  "type": "pingbeat"
}

100.100.100.100 is an obviously non-pingable address. In the output, only the 8.8.8.8 address generates ping events. No 'loss: true' event was generated for '100.100.100.100'.

joshuar commented 7 years ago

Hi @zach929, okay, the loss processing is still there, but some refactoring of the code meant that some "loss" conditions were no longer being recorded. With 4f9c249696fcc20b615e3cd0619d8a85e67456ad:

The default timeout is relatively low (10 x interval), simply because I originally wanted to keep memory usage low when a large number of targets is defined and a low interval is used. This timeout parameter can be set in the config as needed, and I may opt for a higher default.

Can you try the master branch and see if it is better?
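The memory trade-off behind the timeout setting can be illustrated with some rough arithmetic. The sketch below assumes (this is not confirmed by the thread) that every sent ping is tracked until it is answered or times out, so the number of entries held at once grows with both the target count and the ratio of timeout to period. The function name `max_outstanding_pings` is hypothetical.

```python
import math


def max_outstanding_pings(targets: int, period_s: float, timeout_s: float) -> int:
    """Upper bound on pings awaiting a reply, assuming each one is tracked until timeout."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    # Pings sent to a single target that can still be pending within one timeout window.
    per_target = math.ceil(timeout_s / period_s)
    return targets * per_target


# Config from this thread: 2 targets, period "5s", timeout "10s".
print(max_outstanding_pings(2, 5, 10))

# Illustrative larger deployment: 1000 targets pinged every second, 10s timeout.
print(max_outstanding_pings(1000, 1, 10))
```

Under that assumption, raising the timeout relative to the period multiplies the pending state per target, which is consistent with preferring a low default when many targets are pinged at a short interval.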

jegade commented 7 years ago

Hi @joshuar, can you build a dev release? I'm getting no 'loss: true' with the latest version.

jegade commented 7 years ago

I'm getting a lot of errors for unreachable hosts with the latest release:

2017/05/03 16:20:30.548142 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548147 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548151 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548457 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548510 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548521 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548531 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548541 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548560 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548570 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548580 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548589 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548599 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548611 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548620 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548629 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548638 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548649 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548660 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548670 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548680 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548690 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout
2017/05/03 16:20:30.548707 pingbeat.go:180: ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout

atomicom commented 7 years ago

I'm also experiencing a continuous stream of 'ERR Couldn't read from connection: read ip4 0.0.0.0: i/o timeout' when an IP is unreachable - it filled up 7 log files in a second.

Running at release v5.4.0

joshuar commented 7 years ago

@atomicom @jegade @zach929 looks like I really made a mess of that last release. Can you try 5.4.1: https://github.com/joshuar/pingbeat/releases/tag/v5.4.1

This should fix both tracking of loss and also stop any unnecessary error messages.

jegade commented 7 years ago

@joshuar much better, now the losses are tracked. Thank you.